How to connect/query Hive metastore on EMR cluster

Just Look for the hive config file –

On EMR emr-4.7.2 it is here –

less /etc/hive/conf/hive-site.xml

Look for the below properties in the hive-site

<property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://ip-xx-xx-xx-xx:3306/hive?createDatabaseIfNotExist=true</value>
 <description>username to use against metastore database</description>
</property>

<property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>hive</value>
 <description>username to use against metastore database</description>
</property>

<property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>xxxxxxxxxxx</value>
 <description>password to use against metastore database</description>
</property>

 

Use your fav sql query editor to connect to hive. Thats how it would look on Intellij IDEA.

screen-shot-2016-09-20-at-10-53-48-am

show databases;

use information_schema;
show tables;

use hive;
-- This shows the hive meta store version --
select * from VERSION;

1,0.14.0,Hive release version 0.14.0

Or to access mysql directly –

mysql -h ip-xx.xx.xx.xx -P 3306 -u hive -p


mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| innodb             |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
6 rows in set (0.23 sec)

mysql> use hive;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
....

mysql> select * from PARTITIONS limit 20;
+---------+-------------+------------------+---------------------------------------------+-------+--------+----------------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                                   | SD_ID | TBL_ID | LINK_TARGET_ID |
+---------+-------------+------------------+---------------------------------------------+-------+--------+----------------+
|       1 |  1474441844 |                0 | year=2016/month=2016-02/utc_date=2016-02-01 |     6 |      1 |           NULL |
|       2 |  1474441844 |                0 | year=2016/month=2016-02/utc_date=2016-02-02 |     7 |      1 |           NULL |
|       3 |  1474441844 |                0 | year=2016/month=2016-02/utc_date=2016-02-03 |     8 |      1 |           NULL |
|       4 |  1474441844 |                0 | year=2016/month=2016-02/utc_date=2016-02-04 |     9 |      1 |           NULL |
....

Setting up metastore outside the EMR cluster-

We need to set the above mentioned configs to hive-site.xml while creating the emr cluster and it should point to the new metastore.

 

Errors down the line –
1. emr creation Error hive bin schematool dbType mysql initSchema verbose returned 1 instead of one of [0]

Fix : Check the connectionUrl, It usually has junk characters or is missing the jdbc:mysql://

javax.jdo.option.ConnectionURL

2 thoughts on “How to connect/query Hive metastore on EMR cluster”

  1. javax.jdo.option.ConnectionURL
    jdbc:mysql://ip-xx-xx-xx-xx:3306/hive?createDatabaseIfNotExist=true
    username to use against metastore database

    this property in my master cluster contain a private ip… 10.10.10.X
    should i replace it with the public ip of the master ?
    should i do a restart of the cluster ?

Leave a Reply

Your email address will not be published. Required fields are marked *