Spark-sql java.net.NoRouteToHostException on cluster reboot

We had an EMR cluster reboot and hit this error all of a sudden. The error is independent of EMR, so it is worth sharing.

Error:

Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    … 56 more
java.net.NoRouteToHostException: …


How to connect/query Hive metastore on EMR cluster

Just look for the Hive config file. On EMR emr-4.7.2 it is here:

less /etc/hive/conf/hive-site.xml

Look for the properties below in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://ip-xx-xx-xx-xx:3306/hive?createDatabaseIfNotExist=true</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>xxxxxxxxxxx</value>
  <description>password to use against metastore database</description>
</property>

Use …
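If you would rather not scan the file by eye, the same three values can be pulled out with a few lines of Python. This is only a minimal sketch: the function name `metastore_settings` and the `SAMPLE` document are made up for illustration, and it assumes the usual Hadoop-style layout of `<property>` elements under a `<configuration>` root.

```python
import xml.etree.ElementTree as ET

def metastore_settings(root):
    """Collect the javax.jdo.option.* properties from a parsed hive-site.xml root."""
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith("javax.jdo.option."):
            settings[name] = (prop.findtext("value") or "").strip()
    return settings

# Illustrative stand-in for the real file (placeholder host and password).
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ip-xx-xx-xx-xx:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>xxxxxxxxxxx</value>
  </property>
  <property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # On a cluster node you would parse the file instead:
    #   root = ET.parse("/etc/hive/conf/hive-site.xml").getroot()
    root = ET.fromstring(SAMPLE)
    for key, value in metastore_settings(root).items():
        print(key, "=", value)
```

The password is stored in plain text in this file, so avoid pasting the output anywhere public.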


How to get the Hive metastore version on EMR cluster

Quick note:

$ /usr/lib/hive/bin/schematool -dbType mysql -info
Metastore connection URL: jdbc:mysql://ip-XX.XX.XX.XX:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : org.mariadb.jdbc.Driver
Metastore connection User: hive
Hive distribution version: 0.14.0
Metastore schema version: 0.14.0
schemaTool completed

Yash Sharma is a Big Data & Machine Learning Engineer and a newbie open source contributor who plays guitar and enjoys teaching as a part-time hobby. …


Debugging : Hive Dynamic partition Error : [Fatal Error] total number of created files now is 100028, which exceeds 100000. Killing the job.

[Fatal Error] total number of created files now is 900320, which exceeds 900000. Killing the job.

tl;dr: a quick fix, though probably not always the right thing to do:

SET hive.exec.max.created.files=900000;

So my config raises the default limits on partitions and created files:

SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.created.files=900000;

Correct thing to do: …


Use Hive Serde for Fixed Length (index based) strings

The Hive fixed-length SerDe approach can be used in scenarios where we do not have any delimiters in our data file. Using RegexSerDe for fixed-length strings is pretty straightforward:

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2})")
LOCATION 'path/to/data';

The above query …
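To see how such a pattern slices a record before creating the table, it can be tried outside Hive. The sketch below uses Python's `re` module with three 10-character groups and one 2-character group, which is what the four columns of the table definition appear to call for. The sample record and the helper name `split_record` are invented for illustration, and using `fullmatch` rests on the assumption that the SerDe requires the whole line to match.

```python
import re

# Three 10-character fields followed by one 2-character field,
# one capture group per table column.
FIXED_WIDTH = re.compile(r"(.{10})(.{10})(.{10})(.{2})")

COLUMNS = ("userid", "fb_id", "twitter_id", "status")

def split_record(line):
    """Return the four fixed-width fields of a line, or None if it does not fit."""
    match = FIXED_WIDTH.fullmatch(line)
    if match is None:
        return None
    return match.groups()

if __name__ == "__main__":
    # Hypothetical 32-character record: 10 + 10 + 10 + 2.
    record = "user000001fb00000001tw00000001OK"
    fields = split_record(record)
    print(dict(zip(COLUMNS, fields)))
    # A line of the wrong length does not match at all.
    print(split_record("too short"))
```

Checking a handful of real lines this way catches off-by-one widths before they show up as unexpected values in query results.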


Hive – Selected data import/query – Files and folders (mapred.input.dir.recursive)

By default, a data import in Hive expects a directory name, given in the query with the LOCATION keyword. Hive picks up all the files in that directory and imports them. If the directory contains only subdirectories rather than files, Hive fails with the exception: java.io.IOException:java.io.IOException: Not a file: /path/to/data/* …


Integrating Hive 0.9.0 with HBase 0.94.3 – Identifying root cause for RuntimeException: Error while reading from task log url

The last post here was on integrating Hive 0.11.0 with HBase 0.94.2. But because of issue HIVE-4515, we are currently not able to query HBase with varied queries. While the contributors are fixing the issue, we can use HBase 0.94.3 for our experiments. The above post has details on configuring Hive with HBase and table …


HBase Hive integration – Querying HBase via Hive

There is a cool post here on the Apache wiki: HBase Hive integration. This post is a simplified compilation of the same.

Hive: 0.11.0
HBase: 0.94.2
Hadoop: 0.20.2

Create HBase table

create 'hivehbase', 'ratings'
put 'hivehbase', 'row1', 'ratings:userid', 'user1'
put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
put 'hivehbase', 'row1', 'ratings:rating', '1'
put 'hivehbase', 'row2', 'ratings:userid', 'user2'
put …
