Apache Drill – Executing sample SQL Queries on JSON Data

Motivation for the post came from a friend who was having trouble finding way to use Drill with JSON Data source. So here comes a crisp post on the same.

Start Sqlline prompt

switch to the Drill root directory and start sqlline prompt by the given command:

$ cd incubator-drill
$ /incubator-drill$ ./sqlline
sqlline>

Provide donuts.json to donuts-model.json

Our JSON data is present in the file donuts.json, which is located in the location:

/incubator-drill/exec/ref/src/test/resources/donuts.json

 

We need to provide this file to the donuts-model.json file which would be used by optiq to connect to the json data source. This can be done by any of the two ways:

1. Copy the donuts.json file to /incubator-drill/
$ cp exec/ref/src/test/resources/donuts.json ./donuts.json
Or,
2. Provide the path to donuts.json inside donuts-model.json file. Edit the file and add the ‘path’ property.

Connect to JSON data model

We are done with the pre-requisites and now we can switch back to our sqlline rompt to fire some queries on out JSON Data. Below is the connect command to connect to the json data source:

sqlline> !connect jdbc:optiq:model=common/target/test-classes/donuts-model.json admin admin
 Connected to: Optiq (version 0.4.10)
 Driver: Optiq JDBC Driver (version 0.4.10)
 Autocommit status: true
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:optiq:model=common/target/test-classe>

Execute sample SQL queries

Lets play around with few SQL Queries on Drill:

0: jdbc:optiq:model=common/target/test-classe> select _MAP['name'] as name, _MAP['type'] as type from donuts;

 RunOutcome [outcome=SUCCESS, bytes=0, records=5, exception=null]
 +----------------+--------+
 | NAME           | TYPE   |
 +----------------+--------+
 | Cake           | donut  |
 | Raised         | donut  |
 | Old Fashioned  | donut  |
 | Filled         | donut  |
 | Apple Fritter  | donut  |
 +----------------+--------+
 5 rows selected (0.15 seconds)

 

And few more complex Queries:

SELECT ppu, count(1) AS cnt FROM (SELECT _MAP['ppu'] AS ppu, _MAP['type'] AS type FROM donuts)y GROUP BY ppu ORDER BY ppu DESC;
RunOutcome [outcome=SUCCESS, bytes=0, records=3, exception=null]
 +-------+------+
 | PPU   | CNT  |
 +-------+------+
 | 1.0   | 1    |
 | 0.69  | 2    |
 | 0.55  | 2    |
 +-------+------+
 3 rows selected (0.862 seconds)

NOTE: _MAP has to be used to reference fields of data for now.

Thats all for this post. Hope it was helpful. Do not forget to visit: http://incubator.apache.org/drill/index.html#get_involved. Cheers \m/

Yash Sharma is a Big Data & Machine Learning Engineer, A newbie OpenSource contributor, Plays guitar and enjoys teaching as part time hobby.
Talk to Yash about Distributed Systems and Data platform designs.

5 thoughts on “Apache Drill – Executing sample SQL Queries on JSON Data

  1. Hi,
    Thx for this article. I am new to Drill, got it installed and verified that I am able to query the sample parquet file.
    I am trying to query a json file now. I am not using optiq here.

    I added a new json file under sample-data dir:
    abishek@abishek-comp:/opt/drill/apache-drill-1.0.0-m1-incubating/sample-data$ cat test.json
    {“name”:”abishek”,”msg”:”hello world”}
    {“name”:”cp”,”msg”:”hello blr”}
    {“name”:”mr”,”msg”:”hello us”}

    I started my standalone dril like:
    /opt/drill/apache-drill-1.0.0-m1-incubating$ sudo ./sqlline -u jdbc:drill:schema=json-local -n admin -p admin

    I fired a query like:
    SELECT * FORM “sample-data/test.json”;

    However the json file is not getting detected. Looks like I have to register this source but I am not sure how. Not able to find documentation on how to do it on a embedded mode run without zk. Could you help find what am I missing here?

  2. Hi, I installed drill as per ur post…
    now when I try to run drill using “./sqlline” command am getting this error
    “bash: ./sqlline: No such file or directory”

    is there anything leftover I need to do.?

Leave a Reply

Your email address will not be published. Required fields are marked *