Motivation for the post came from a friend who was having trouble finding way to use Drill with JSON Data source. So here comes a crisp post on the same.
Start Sqlline prompt
switch to the Drill root directory and start sqlline prompt by the given command:
$ cd incubator-drill
$ /incubator-drill$ ./sqlline
sqlline>
Provide donuts.json to donuts-model.json
Our JSON data is present in the file donuts.json, which is located in the location:
/incubator-drill/exec/ref/src/test/resources/donuts.json
We need to provide this file to the donuts-model.json file which would be used by optiq to connect to the json data source. This can be done by any of the two ways:
1. Copy the donuts.json file to /incubator-drill/
$ cp exec/ref/src/test/resources/donuts.json ./donuts.json
Or,
2. Provide the path to donuts.json inside donuts-model.json file. Edit the file and add the ‘path’ property.
Connect to JSON data model
We are done with the pre-requisites and now we can switch back to our sqlline rompt to fire some queries on out JSON Data. Below is the connect command to connect to the json data source:
sqlline> !connect jdbc:optiq:model=common/target/test-classes/donuts-model.json admin admin Connected to: Optiq (version 0.4.10) Driver: Optiq JDBC Driver (version 0.4.10) Autocommit status: true Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:optiq:model=common/target/test-classe>
Execute sample SQL queries
Lets play around with few SQL Queries on Drill:
0: jdbc:optiq:model=common/target/test-classe> select _MAP['name'] as name, _MAP['type'] as type from donuts; RunOutcome [outcome=SUCCESS, bytes=0, records=5, exception=null] +----------------+--------+ | NAME | TYPE | +----------------+--------+ | Cake | donut | | Raised | donut | | Old Fashioned | donut | | Filled | donut | | Apple Fritter | donut | +----------------+--------+ 5 rows selected (0.15 seconds)
And few more complex Queries:
SELECT ppu, count(1) AS cnt FROM (SELECT _MAP['ppu'] AS ppu, _MAP['type'] AS type FROM donuts)y GROUP BY ppu ORDER BY ppu DESC;
RunOutcome [outcome=SUCCESS, bytes=0, records=3, exception=null] +-------+------+ | PPU | CNT | +-------+------+ | 1.0 | 1 | | 0.69 | 2 | | 0.55 | 2 | +-------+------+ 3 rows selected (0.862 seconds)
NOTE: _MAP has to be used to reference fields of data for now.
Thats all for this post. Hope it was helpful. Do not forget to visit: http://incubator.apache.org/drill/index.html#get_involved. Cheers \m/
Hi,
Thx for this article. I am new to Drill, got it installed and verified that I am able to query the sample parquet file.
I am trying to query a json file now. I am not using optiq here.
I added a new json file under sample-data dir:
abishek@abishek-comp:/opt/drill/apache-drill-1.0.0-m1-incubating/sample-data$ cat test.json
{“name”:”abishek”,”msg”:”hello world”}
{“name”:”cp”,”msg”:”hello blr”}
{“name”:”mr”,”msg”:”hello us”}
I started my standalone dril like:
/opt/drill/apache-drill-1.0.0-m1-incubating$ sudo ./sqlline -u jdbc:drill:schema=json-local -n admin -p admin
I fired a query like:
SELECT * FORM “sample-data/test.json”;
However the json file is not getting detected. Looks like I have to register this source but I am not sure how. Not able to find documentation on how to do it on a embedded mode run without zk. Could you help find what am I missing here?
Hi Abishek,
The blog gives out an older version of querying JSON Data.
Please try these steps here and let me know if it worked for you:
https://www.mail-archive.com/drill-user@incubator.apache.org/msg00416.html
Yep, using it worked. Thanks!
Hi, I installed drill as per ur post…
now when I try to run drill using “./sqlline” command am getting this error
“bash: ./sqlline: No such file or directory”
is there anything leftover I need to do.?
I am just wondering how could I access Drill via JDBC from the web application? Just like accessing RDBMS.