How to Run Pig Latin Scripts on Apache Drill

This is initial work on supporting Pig scripts on Drill. It extends PigServer to parse a Pig Latin script and obtain the corresponding Pig logical plan, then converts that Pig logical plan into a Drill logical plan. The code is not complete and supports only a limited number of Pig operators, such as LOAD, STORE, FILTER, UNION, JOIN, DISTINCT, and LIMIT. It serves as a starting point for the concept.
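The Pig-to-Drill conversion can be pictured as a walk over the Pig logical plan that emits one Drill logical operator per Pig operator and wires the operators together by id. The Python sketch below is a toy model under stated assumptions, not the actual Java code in the repository: `PigNode`, `PIG_TO_DRILL`, `to_drill_plan`, and the Drill-side operator names and fields (`@id`, `input`, `inputs`) are hypothetical simplifications, not Drill's exact logical-plan schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PigNode:
    """One operator in a simplified (hypothetical) Pig logical plan."""
    alias: str                                        # e.g. "A" in: A = LOAD ...
    op: str                                           # Pig operator, e.g. "LOAD"
    inputs: List[str] = field(default_factory=list)   # aliases this node reads
    args: Dict[str, object] = field(default_factory=dict)


# Hypothetical mapping from Pig operators to Drill-style operator names.
PIG_TO_DRILL = {
    "LOAD": "scan",
    "FILTER": "filter",
    "UNION": "union",
    "JOIN": "join",
    "DISTINCT": "distinct",
    "LIMIT": "limit",
    "STORE": "store",
}


def to_drill_plan(pig_nodes: List[PigNode]) -> List[dict]:
    """Translate Pig nodes (given in topological order) into Drill-style
    operators that reference their inputs by numeric id."""
    ids: Dict[str, int] = {}
    drill_ops: List[dict] = []
    for node in pig_nodes:
        if node.op not in PIG_TO_DRILL:
            raise NotImplementedError(f"Pig operator {node.op} is not supported")
        op_id = len(drill_ops) + 1
        ids[node.alias] = op_id
        drill_op = {"@id": op_id, "op": PIG_TO_DRILL[node.op], **node.args}
        # Raises KeyError if an input alias was not defined earlier.
        input_ids = [ids[a] for a in node.inputs]
        if len(input_ids) == 1:
            drill_op["input"] = input_ids[0]
        elif input_ids:
            drill_op["inputs"] = input_ids   # e.g. UNION or JOIN
        drill_ops.append(drill_op)
    return drill_ops


if __name__ == "__main__":
    plan = to_drill_plan([
        PigNode("A", "LOAD", args={"path": "/tmp/a.tsv"}),
        PigNode("B", "FILTER", ["A"], {"expr": "x > 10"}),
        PigNode("C", "LIMIT", ["B"], {"n": 5}),
        PigNode("D", "STORE", ["C"]),
    ])
    for op in plan:
        print(op)
```

Each emitted operator points at the id of the operator that produced its input, so the filter gets `'input': 1` (the scan), and an unsupported operator such as FOREACH stops the translation with an error.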

[Figure: Architecture Diagram]

Code: https://github.com/yssharma/pig-on-drill
Review Board: https://reviews.apache.org/r/26769/

Operators Supported: LOAD, STORE, FILTER, UNION, JOIN, DISTINCT, LIMIT.

Future work: FOREACH and GROUP are not supported yet.
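Since only the operators listed above are handled, it can help to check a script before submitting it. The sketch below is a naive, hypothetical pre-flight check (the function `unsupported_operators` and the example file names are illustrative, not part of pig-on-drill). It splits the script on `;` and reads the operator keyword from each statement, so it does not handle comments or semicolons inside quoted strings.

```python
import re
from typing import List

# Operators the prototype handles, per the list above.
SUPPORTED = {"LOAD", "STORE", "FILTER", "UNION", "JOIN", "DISTINCT", "LIMIT"}

# Optional "alias =" prefix, then the operator keyword we want to capture.
_STATEMENT = re.compile(r"^\s*(?:[A-Za-z_]\w*\s*=\s*)?([A-Za-z]+)")


def unsupported_operators(script: str) -> List[str]:
    """Return, in order of first appearance, the operator keywords in a
    Pig Latin script that are not in SUPPORTED (e.g. FOREACH, GROUP).

    Naive: splits on ';', so comments and quoted semicolons are not handled.
    """
    found: List[str] = []
    for statement in script.split(";"):
        match = _STATEMENT.match(statement)
        if not match:
            continue  # blank or unparseable fragment
        op = match.group(1).upper()  # Pig keywords are case-insensitive
        if op not in SUPPORTED and op not in found:
            found.append(op)
    return found


if __name__ == "__main__":
    script = """
    A = LOAD 'input.tsv' USING PigStorage('\\t') AS (id:int, name:chararray);
    B = FILTER A BY id > 10;
    C = FOREACH B GENERATE name;
    STORE C INTO 'output';
    """
    print(unsupported_operators(script))  # ['FOREACH']
```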

Test cases: org.apache.drill.exec.pigparser.TestPigLatinOperators.

Pig scripts can also be tested through Drill’s web interface (http://localhost:8047/query).

Fact check:

  • LOAD: Supports delimited text files only. Uses the delimiter provided in PigStorage() (default: \t). Currently reads data only from the local filesystem (as with pig -x local).
  • STORE: Only dumps output to --SCREEN-- for now.
  • JOIN: Supports Inner, LeftOuter, RightOuter, and FullOuter joins (although Drill does not currently support FullOuter). Only alias-based joins are supported, not index-based ones ($0, etc.).
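The LOAD and JOIN restrictions above can be expressed as two small checks. This is a hypothetical Python sketch, not code from pig-on-drill: `pigstorage_delimiter`, `split_record`, `check_join`, and the lowercase join-type names it returns are assumptions made for illustration. Rejecting FULL OUTER up front is a design choice of this sketch, reflecting the note that Drill could not run it at the time.

```python
import re
from typing import List, Optional

# Escape written inside PigStorage('...') mapped to the real character.
# Only '\t' is handled in this sketch.
_ESCAPES = {"\\t": "\t"}

# Hypothetical mapping from Pig join types to lowercase Drill-side names.
_JOIN_TYPES = {
    "INNER": "inner",
    "LEFT OUTER": "left",
    "RIGHT OUTER": "right",
    "FULL OUTER": "full",
}


def pigstorage_delimiter(using_clause: Optional[str]) -> str:
    """Return the delimiter given to PigStorage('<d>'), or tab if none."""
    if using_clause:
        m = re.search(r"PigStorage\(\s*'([^']+)'\s*\)", using_clause)
        if m:
            return _ESCAPES.get(m.group(1), m.group(1))
    return "\t"  # PigStorage() with no argument defaults to tab


def split_record(line: str, delimiter: str) -> List[str]:
    """Split one line of a delimited text file into fields."""
    return line.rstrip("\n").split(delimiter)


def check_join(join_type: str, keys: List[str]) -> str:
    """Validate a JOIN against the restrictions above and return the
    hypothetical Drill join type name."""
    normalized = " ".join(join_type.upper().split())  # "left  outer" -> "LEFT OUTER"
    if normalized not in _JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type}")
    if normalized == "FULL OUTER":
        # Drill could not execute FULL OUTER joins yet, so fail early.
        raise NotImplementedError("FULL OUTER joins are not supported by Drill yet")
    for key in keys:
        if re.fullmatch(r"\$\d+", key.strip()):
            raise NotImplementedError(
                f"positional join key {key!r} is not supported; use a field alias"
            )
    return _JOIN_TYPES[normalized]


if __name__ == "__main__":
    delim = pigstorage_delimiter("USING PigStorage(',')")
    print(split_record("1,alice\n", delim))   # ['1', 'alice']
    print(check_join("left outer", ["id"]))   # left
```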

Cheers \m/

Yash Sharma is a Big Data & Machine Learning Engineer and a newbie open source contributor. Yash plays guitar and enjoys teaching as a part-time hobby.
Talk to Yash about Distributed Systems and Data platform designs.
