SQL on Cassandra: Querying Cassandra via Apache Drill

Note for readers: I wrote this patch a year ago and it no longer works. Please treat this post as reference code only.
Last Tested with:
Drill 1.2.0
Cassandra 2.2.0

In this short post I will talk about Drill's Cassandra storage plugin, which lets us query Cassandra through Apache Drill. That means we can issue ANSI SQL queries against Cassandra, something Cassandra does not support natively.

All the code:
https://github.com/yssharma/drill/tree/cassandra-storage
Patch: https://gist.github.com/yssharma/2581ae8a97c559b2677f

There are a couple of steps needed to set up Cassandra storage before we can start playing with Cassandra and Drill. Download the patch and save it to a file (here: DRILL-92-CassandraStorage.patch).

1. Get Drill: Let's get the Drill source

$> git clone https://github.com/apache/drill.git

2. Get Cassandra Storage patch:

Download the patch file from

https://reviews.apache.org/r/29816/diff/raw/

3. Apply the patch on top of Drill

$> cd drill
$> git apply --check ~/Downloads/DRILL-92-CassandraStorage.patch
$> git apply ~/Downloads/DRILL-92-CassandraStorage.patch

4. Build Drill with Cassandra Storage & export distribution to /opt/drill

$> mvn clean install -DskipTests
$> mkdir -p /opt/drill
$> tar xvzf distribution/target/*.tar.gz --strip-components=1 -C /opt/drill

5. Start Sqlline.

That's it. We have finished the Drill build and installation, and it's time to start using Drill.

$> cd /opt/drill
$> bin/sqlline -u jdbc:drill:zk=local -n admin -p admin

[Screenshot: Drill-Sqlline]

Run 'show schemas;' to view the existing schemas.

[Screenshot: Drill-Sqlline-schemas]

6. Drill Web interface

Now we should be able to see the Drill web interface at localhost:8047.

[Screenshot: Drill-Webinterface]

7. Configure the Cassandra plugin:

Now it's time to configure Cassandra in Drill.

Go to the Storage page from the top navigation bar and add a new plugin named 'cassandra'. On the next page, provide the details of your Cassandra installation.

[Screenshot: Drill-Cassandra-Storage]

Here is the config I used:

New Storage plugin format: 
{
  "type": "cassandra",
  "config": {
    "cassandra.hosts": [
      "127.0.0.1",
      "127.0.0.2"
    ],
    "cassandra.port": 9042
  },
  "enabled": true
}
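
The storage page only accepts well-formed JSON, so it can help to check the config before pasting it in. The following is a minimal sketch, not part of the patch: `check_plugin_config` is a hypothetical helper, and the checks simply mirror the keys used in the config above. One thing it looks for is typographic quotes, which can sneak in when a config is copied from a web page and which a JSON parser will reject.

```python
import json

# Quote characters that blog engines and word processors substitute for plain quotes.
TYPOGRAPHIC_QUOTES = "\u201c\u201d\u2018\u2019"

# The config from this post, used here as sample input.
SAMPLE_CONFIG = """
{
  "type": "cassandra",
  "config": {
    "cassandra.hosts": ["127.0.0.1", "127.0.0.2"],
    "cassandra.port": 9042
  },
  "enabled": true
}
"""


def check_plugin_config(text):
    """Return a list of problems found in a storage plugin config string.

    Hypothetical helper: the expected keys are taken from the config shown
    in this post, not from any official schema.
    """
    problems = []
    if any(ch in text for ch in TYPOGRAPHIC_QUOTES):
        problems.append("contains typographic quotes; JSON needs plain double quotes")
    try:
        config = json.loads(text)
    except ValueError as exc:
        problems.append("not valid JSON: %s" % exc)
        return problems
    if not isinstance(config, dict):
        return problems + ["top-level value should be a JSON object"]
    if config.get("type") != "cassandra":
        problems.append('"type" should be "cassandra"')
    inner = config.get("config")
    if not isinstance(inner, dict):
        return problems + ['"config" should be a JSON object']
    hosts = inner.get("cassandra.hosts")
    if not isinstance(hosts, list) or not hosts:
        problems.append('"cassandra.hosts" should be a non-empty list')
    if not isinstance(inner.get("cassandra.port"), int):
        problems.append('"cassandra.port" should be an integer')
    return problems


print(check_plugin_config(SAMPLE_CONFIG))  # prints []
```

An empty list means the text parses and has the expected keys; it does not prove that Drill will accept it or that the hosts are reachable.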

That's it. Enough work. It's playtime now.

8. Query Cassandra.

It's time to start querying Cassandra via Drill.

Go to the Query page from the top navigation menu and fire your SQL query at existing Cassandra tables.

Note: Make sure Cassandra is up and running.
The general query format is:
SELECT * FROM cassandra.<keyspace_name>.<table_name> LIMIT 10;
Stop Sqlline and restart it if required.
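
The same query format can also be sent from a script instead of the Query page. Below is a rough sketch under a few assumptions: the Drill web server is on localhost:8047 as in step 6, the build exposes Drill's REST query endpoint at /query.json, and `build_query` and `run_query` are hypothetical helper names of my own. The keyspace and table names are placeholders.

```python
import json
import re
import urllib.request

# Assumption: Drill's web server runs on the default port shown in step 6.
DRILL_QUERY_URL = "http://localhost:8047/query.json"

# Only allow simple identifiers so the names can be spliced into SQL safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_query(keyspace, table, limit=10):
    """Build a query in the cassandra.<keyspace_name>.<table_name> format."""
    for name in (keyspace, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError("unexpected identifier: %r" % name)
    return "SELECT * FROM cassandra.%s.%s LIMIT %d" % (keyspace, table, int(limit))


def run_query(sql, url=DRILL_QUERY_URL):
    """POST a SQL string to Drill's REST endpoint and return the parsed JSON reply."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Drill and Cassandra both running, `run_query(build_query("my_keyspace", "my_table"))` should return a dictionary whose "rows" entry holds the result rows. Note that the query string is sent without a trailing semicolon.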

[Screenshot: Cassandra-storage-query]

[Screenshot: Cassandra-storage-query-result]

Cool. Try some complex SQL now. Play around.

We can also explore the existing schemas via Sqlline:

[Screenshot: Cassandra-Sqlline]

[Screenshot: Cassandra-Query]

That's all for this post. Hope it was helpful.

See ya all soon. Cheers \m/

Yash Sharma is a Big Data & Machine Learning Engineer and a newbie open source contributor who plays guitar and enjoys teaching as a part-time hobby.
Talk to Yash about Distributed Systems and Data platform designs.

6 thoughts on “SQL on Cassandra: Querying Cassandra via Apache Drill”

  1. Hi,
    We have an urgent requirement to query Cassandra through Drill. I'm following the steps above, but when running
    “git apply --check ~/Downloads/DRILL-92-CassandraStorage.patch” I get the error below:
    “fatal: corrupt patch at line 3308”.
    Please help me get Cassandra querying through Drill working. Your help would be greatly appreciated.

    Thanks in advance.

  2. Hello,

    I tried to post the Cassandra configuration as you did:

    {
    “type”: “cassandra”,
    “config”: {
    “cassandra.hosts”: [
    “127.0.0.1”,
    “127.0.0.2”
    ],
    “cassandra.port”: 9042
    },
    “enabled”: true
    }

    I also tried changing cassandra.hosts to node9 (our server name) or its IP (192.168.168.29). But each time I click create, it says
    please retry: error (invalid JSON mapping)

    Would you mind pointing out what the problem is?

    thanks

  3. I have another question, about applying the patch to Drill.

    Here is what I got by applying patch:

    [root@node9 drill]# git apply --check /opt/DRILL-92-CassandraStorage.patch
    error: patch failed: contrib/pom.xml:37
    error: contrib/pom.xml: patch does not apply
    error: patch failed: distribution/pom.xml:160
    error: distribution/pom.xml: patch does not apply
    error: patch failed: distribution/src/assemble/bin.xml:92
    error: distribution/src/assemble/bin.xml: patch does not apply

    Could you please point out the solution for the above errors?

    thanks

    1. I’m having the exact same problem and I would like some help. Did this ever get resolved for you?

      thanks

  4. We could not compile it.
    mvn clean install

    [INFO] Scanning for projects…

    [ERROR] The build could not read 1 project -> [Help 1]

    [ERROR]

    [ERROR] The project org.apache.drill.contrib:drill-storage-cassandra:0.9.0-SNAPSHOT (/root/drill/contrib/storage-cassandra/pom.xml) has 1 error

    [ERROR] Non-resolvable parent POM: Could not find artifact org.apache.drill.contrib:drill-contrib-parent:pom:0.9.0-SNAPSHOT and ‘parent.relativePath’ points at wrong local POM @ line 22, column 13 -> [Help 2]
