HBase Hive integration – Querying HBase via Hive

There is a cool post here on Apache wiki : HBase Hive integration .This post is a simplified compilation of the same.

Hive: 0.11.0
HBase: 0.94.2
Hadoop: 0.20.2

Create HBase table

create 'hivehbase', 'ratings'
put 'hivehbase', 'row1', 'ratings:userid', 'user1'
put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
put 'hivehbase', 'row1', 'ratings:rating', '1'

put 'hivehbase', 'row2', 'ratings:userid', 'user2'
put 'hivehbase', 'row2', 'ratings:bookid', 'book1'
put 'hivehbase', 'row2', 'ratings:rating', '3'

put 'hivehbase', 'row3', 'ratings:userid', 'user2'
put 'hivehbase', 'row3', 'ratings:bookid', 'book2'
put 'hivehbase', 'row3', 'ratings:rating', '3'

put 'hivehbase', 'row4', 'ratings:userid', 'user2'
put 'hivehbase', 'row4', 'ratings:bookid', 'book4'
put 'hivehbase', 'row4', 'ratings:rating', '1'

Check table data by using: scan ‘tablename’.

Provide necessary jars to Hive

Create a folder named auxlib in Hive root directory and put the following jars in it. The jars can be found in HBase lib directory. Hive-HBase handler is present in Hive lib directory.

  • Guava
  • Hive-Hbase handler
  • HBase
  • Zookeeper

Hive-Hbase jars

Create Hive external table

Since we want the data to be fetched from HBase we would need to create an external table for Hive.

CREATE EXTERNAL TABLE hbasehive_table
(key string, userid string,bookid string,rating int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES 
("hbase.columns.mapping" = ":key,ratings:userid,ratings:bookid,ratings:rating")
TBLPROPERTIES ("hbase.table.name" = "hivehbase");

Note : the first column must be the key column which would also be same as the HBase’s key column. Also if you have some numeric data columns you can use the ratings:rating#b in the HBase column mappings.

Querying HBase via Hive

hive> select * from hbasehive_table;     
OK
row1	user1	book1	1
row2	user2	book1	3
row3	user2	book2	3
row4	user2	book4	1
Time taken: 0.254 seconds, Fetched: 4 row(s)

Few exceptions Along the path

  • java.lang.NoClassDefFoundError: com/google/protobuf/Message
  • java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.HbaseObjectWritable

The above two issues occur when Hive is unable to find required jars in its path. Adding above 4 jars in the auxlib directory fixes this issue.

  • java.lang.NegativeArraySizeException
  • SELECT COUNT(*) FROM hbasehive_table; fails

The above is an open issue for Hive 0.10 & 0.11 integration with HBase 0.94. Refer to the below JIRA’s for more information:

Thats all for this post. Drop in a comment in case you face any issues. Have Fun.

Cheers.

Yash Sharma is a Big Data & Machine Learning Engineer, A newbie OpenSource contributor, Plays guitar and enjoys teaching as part time hobby.
Talk to Yash about Distributed Systems and Data platform designs.

8 thoughts on “HBase Hive integration – Querying HBase via Hive

  1. Thank you for the above post, it helped a lot. We loaded data to hbase but wanted to use hive for joins. This post helped me find the solution real quick. Keep posting!
    Cheers!

  2. I can create an hbase table through hive and also create external tables mapped with hbase tables.
    In the case of external tables, I can also read the existing data.
    My problem is, whenever I try to insert new rows whether to a local or an external table I get the following exception:

    FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask. org/apache/hadoop/hbase/CompatibilityFactory

  3. Thanks for fantastic post.
    I am able to load data into Hive, but I have one query “after loading data into Hive. I have updated few column values using ‘Put’ and inserted new keys, but when I re-ran select * from Hive_table, its not reflecting”

    Could you please help how to hander it.

  4. Hi,
    I am facing some issue while making 2 different Hive tables on one single Hbase table having 2 column families. Both column families mapped to respective hive table. e.g. col-family1 mapped to hive-table1 and col-family2 mapped to hive-table2.
    Now whenever i execute a select count(*) query on hive-table1 it returns count of all the records present in Hbase table irrespective of which col-family it is mapped to.
    Can anyone help me resolve this problem?

  5. its a really great way to start with hive and hbase intergration …Thanks in bundle!!!!…looking forward for more such post.

Leave a Reply

Your email address will not be published. Required fields are marked *