Spark-sql java.net.NoRouteToHostException on cluster reboot

We had an EMR cluster reboot and hit this error all of a sudden. The error itself is independent of EMR, so it is worth sharing.

Error:

Caused by: java.net.NoRouteToHostException: No route to host
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
 at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
 at org.apache.hadoop.ipc.Client.call(Client.java:1451)
 ... 56 more
 java.net.NoRouteToHostException: No Route to Host from ip-XXX-XXX-XXX-XXX/XXX-XXX-XXX-XXX to ip-YYY-YYY-YYY-YYY:PORT failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
 at org.apache.hadoop.ipc.Client.call(Client.java:1479)
 at org.apache.hadoop.ipc.Client.call(Client.java:1412)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
 at com.sun.proxy.$Proxy14.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:573)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)

Note: ip-XXX-XXX-XXX-XXX was our new cluster master’s IP, while ip-YYY-YYY-YYY-YYY was the old cluster master’s IP (that cluster had since been terminated).

Root cause:

We had an external Metastore for the cluster so that we could tear the cluster down and spin up a new one at any time. However, the Hive Metastore keeps absolute HDFS locations for ‘MANAGED’ tables, so it still pointed at the old cluster’s NameNode, and queries against those tables kept trying to reach the terminated master.

Fixes:

  1. Drop all managed tables, since their data was lost with the old cluster.
  2. Remove/update references to the old cluster in the Metastore. This was not very useful in our case, but it was good to know it can be done.
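For step 1, a quick way to check whether a table is actually ‘MANAGED’ before dropping it is `DESCRIBE FORMATTED`, whose output includes a `Table Type:` line. A hedged sketch, assuming a hypothetical `products.orders` table (your database and table names will differ):

```shell
# Check whether the table is managed; "MANAGED_TABLE" means its data
# lived on the old cluster's HDFS and is gone with it.
hive -e "DESCRIBE FORMATTED products.orders;" | grep "Table Type"

# If it is managed, drop it so the Metastore stops pointing
# at the dead NameNode.
hive -e "DROP TABLE IF EXISTS products.orders;"
```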
$ hive --service metatool -listFSRoot

 Initializing HiveMetaTool..
 Listing FS Roots..
 hdfs://ip-YYY.YYY.YYY.YYY:PORT/user/hive/warehouse
 hdfs://ip-YYY.YYY.YYY.YYY:PORT/user/hive/warehouse/products.db

$ hive --config /etc/hive/conf/conf.server --service metatool -dryRun -updateLocation <new_value> <old_value>

$ hive --service metatool -updateLocation hdfs://ip-XXX.XXX.XXX.XXX:PORT/user/hive/warehouse hdfs://ip-YYY.YYY.YYY.YYY:PORT/user/hive/warehouse

Final notes:

This command is insanely slow and can take hours, depending on the number of tables and partitions in your Metastore: it walks every table and rewrites the references to the old locations.
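Once `-updateLocation` finishes, it is worth re-listing the FS roots to confirm nothing still points at the terminated master. A hedged sketch (the grep pattern stands in for the old master’s hostname from your own `-listFSRoot` output):

```shell
# List the FS roots again; any line still carrying the old master's host
# means that location was not covered by the update.
hive --service metatool -listFSRoot | grep "ip-YYY" \
  && echo "stale roots remain" \
  || echo "all roots updated"
```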

More info in this very useful post. Hope that helps.

Cheers

Yash Sharma is a Big Data & Machine Learning engineer and a newbie open-source contributor; he plays guitar and enjoys teaching as a part-time hobby.
Talk to Yash about distributed systems and data platform designs.
