Query S3 data via Hive on local box

In the last post we discussed how to generate synthetic data. Here we will look at how to query data stored in S3 via Hive.

Provide AWS configuration to Hadoop and Hive

We need to add the AWS configuration to the Hadoop and Hive config files.

hive-site.xml

You can find hive-site.xml in HIVE_HOME (typically under its conf directory).

You can find all of the following files in HADOOP_HOME (typically under etc/hadoop).

core-site.xml
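The original snippet for this file is not reproduced here. As a rough sketch only: the S3A connector from hadoop-aws reads its credentials from the `fs.s3a.access.key` and `fs.s3a.secret.key` properties, and the script below prints those two entries so they can be pasted inside the `<configuration>` element. It assumes the keys are exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and the helper names `xml_property` and `s3a_properties` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Hadoop <property> entry.
xml_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Assumption: credentials come from the standard AWS environment variables.
# Placeholders are printed when they are unset, so the script still runs.
s3a_properties() {
  xml_property "fs.s3a.access.key" "${AWS_ACCESS_KEY_ID:-YOUR_ACCESS_KEY}"
  xml_property "fs.s3a.secret.key" "${AWS_SECRET_ACCESS_KEY:-YOUR_SECRET_KEY}"
}

s3a_properties
```

The values are not XML-escaped, which is fine for typical AWS keys but would need care for values containing `&` or `<`. Keep real keys out of version control.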

mapred-site.xml

hdfs-site.xml

Add Hadoop environment variables
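The exact lines for this step are not shown, so the following is only a sketch of what might go into ~/.profile. The install paths are assumptions, and the `hstart` alias is one possible definition of the shortcut used in the Run Hive step, not necessarily the one from the original setup.

```shell
# Sketch of lines for ~/.profile. The install paths are assumptions;
# point them at wherever Hadoop and Hive are unpacked on your machine.
export HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
export HIVE_HOME="${HIVE_HOME:-/usr/local/hive}"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin"

# One possible definition of hstart: start HDFS, then YARN, using the
# scripts shipped in Hadoop's sbin directory.
alias hstart='"$HADOOP_HOME"/sbin/start-dfs.sh && "$HADOOP_HOME"/sbin/start-yarn.sh'
```

After editing the file, `source ~/.profile` picks up the changes in the current console.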

Run Hive

First we will run Hive on the local system from the console.

$ source ~/.profile
$ hstart
$ hive

While running Hive, make sure Hadoop is running in the background.
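One way to check this is with `jps`, the JDK tool that lists running Java processes; on a single-node setup a NameNode and a DataNode would normally show up once HDFS has started. The helper below is a hypothetical sketch (the name `hdfs_daemons_up` is not from any tool). It reads jps-style output on stdin, so the logic can be tried without a running cluster.

```shell
# Hypothetical helper: read `jps` output on stdin and succeed only if
# both HDFS daemons appear in it. The -w flag matches whole words, so
# SecondaryNameNode alone does not count as NameNode.
hdfs_daemons_up() {
  local listing
  listing="$(cat)"
  printf '%s\n' "$listing" | grep -qw 'NameNode' &&
    printf '%s\n' "$listing" | grep -qw 'DataNode'
}

# Typical use, assuming the JDK's jps tool is on PATH:
#   jps | hdfs_daemons_up && hive
```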

Create Hive Table

Here we will create a table using data stored in an S3 bucket.
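The original statement is not shown, so what follows is a minimal sketch under stated assumptions: comma-delimited text files, the `s3a://` scheme, and placeholder bucket, path, table, and column names. It writes the DDL to a file that can then be passed to Hive.

```shell
# Write a sketch of the DDL to a file. Bucket, path, table and column
# names are all placeholders, not the ones used in the original post.
cat > create_s3_table.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS sample_events (
  id STRING,
  event_time STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://your-bucket/path/to/data/';
EOF

# Then, with Hadoop running, execute it from the console:
#   hive -f create_s3_table.hql
```

Declaring the table as EXTERNAL means that dropping it later removes only the Hive metadata and leaves the files in S3 untouched.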

 

Check out my portfolio here: https://confusedcoders.com/nikita-sharma-greenhorn-data-science-student

I am a greenhorn Data Science student with interest in finding patterns in data. My language of choice is Python and I am starting to get my hands dirty with R.

I blog on Medium.com [1] and ConfusedCoders.com [2]. I share my code on Github.com [3].

  1. https://medium.com/@nikkisharma536
  2. https://confusedcoders.com/author/nikita
  3. https://github.com/nikkisharma536
