Motivation for the post came from a friend who was having trouble finding way to use Drill with JSON Data source. So here comes a crisp post on the same.
Start Sqlline prompt
switch to the Drill root directory and start sqlline prompt by the given command:
$ cd incubator-drill $ /incubator-drill$ ./sqlline sqlline>
Provide donuts.json to donuts-model.json
Our JSON data is present in the file donuts.json, which is located in the location:
We need to provide this file to the donuts-model.json file which would be used by optiq to connect to the json data source. This can be done by any of the two ways:
1. Copy the donuts.json file to /incubator-drill/
$ cp exec/ref/src/test/resources/donuts.json ./donuts.json
2. Provide the path to donuts.json inside donuts-model.json file. Edit the file and add the ‘path’ property.
Connect to JSON data model
We are done with the pre-requisites and now we can switch back to our sqlline rompt to fire some queries on out JSON Data. Below is the connect command to connect to the json data source:
sqlline> !connect jdbc:optiq:model=common/target/test-classes/donuts-model.json admin admin Connected to: Optiq (version 0.4.10) Driver: Optiq JDBC Driver (version 0.4.10) Autocommit status: true Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:optiq:model=common/target/test-classe>
Execute sample SQL queries
Lets play around with few SQL Queries on Drill:
0: jdbc:optiq:model=common/target/test-classe> select _MAP['name'] as name, _MAP['type'] as type from donuts; RunOutcome [outcome=SUCCESS, bytes=0, records=5, exception=null] +----------------+--------+ | NAME | TYPE | +----------------+--------+ | Cake | donut | | Raised | donut | | Old Fashioned | donut | | Filled | donut | | Apple Fritter | donut | +----------------+--------+ 5 rows selected (0.15 seconds)
And few more complex Queries:
SELECT ppu, count(1) AS cnt FROM (SELECT _MAP['ppu'] AS ppu, _MAP['type'] AS type FROM donuts)y GROUP BY ppu ORDER BY ppu DESC;
RunOutcome [outcome=SUCCESS, bytes=0, records=3, exception=null] +-------+------+ | PPU | CNT | +-------+------+ | 1.0 | 1 | | 0.69 | 2 | | 0.55 | 2 | +-------+------+ 3 rows selected (0.862 seconds)
NOTE: _MAP has to be used to reference fields of data for now.
Thats all for this post. Hope it was helpful. Do not forget to visit: http://incubator.apache.org/drill/index.html#get_involved. Cheers \m/
Yash Sharma is a Big Data & Machine Learning Engineer, A newbie OpenSource contributor, Plays guitar and enjoys teaching as part time hobby.
Talk to Yash about Distributed Systems and Data platform designs.