data engineering

Indexing csv data in Solr via Python – PySolr

Yash Sharma
October 11, 2015
solr

A short post on indexing data in Solr using Python. 1. Install the prerequisites: pip, PySolr. 2. Python script: #!/usr/bin/python import sys,…
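
The excerpt above cuts off before the script itself, so here is a minimal sketch of the general idea rather than the post's own code. It assumes the CSV has a header row whose column names match fields in the Solr schema; the helper names `csv_to_docs` and `index_csv` are hypothetical, and the Solr URL is whatever core you are running.

```python
import csv
import io


def csv_to_docs(csv_text):
    """Turn CSV text (header row plus data rows) into one dict per Solr document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        # Skip empty or missing values so Solr does not receive blank fields.
        docs.append({k: v for k, v in row.items()
                     if k is not None and v not in (None, "")})
    return docs


def index_csv(csv_path, solr_url):
    """Read a CSV file and send its rows to Solr. Needs a running Solr core."""
    import pysolr  # imported lazily so csv_to_docs works without PySolr installed

    with open(csv_path, newline="", encoding="utf-8") as handle:
        docs = csv_to_docs(handle.read())
    solr = pysolr.Solr(solr_url, timeout=10)
    solr.add(docs)   # send the documents
    solr.commit()    # make them visible to searches
    return len(docs)
```

Keeping the CSV parsing separate from the Solr call makes it possible to check the documents before anything is sent to the server.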

How to get Pig Logical plan (Execution DAG) from Pig Latin script

Yash Sharma
October 11, 2015
pig

TL;DR: A Pig logical plan is the plan DAG used to execute the chain of jobs on Hadoop. Here is the code snippet…

PySolr : How to boost a field for Solr document

Yash Sharma
October 11, 2015
solr

A quick note on boosting a field for a Solr document with PySolr. Index-time boosting: conn.add(docs, boost={'author': '2.0',}) Query-time boosting: qf=title^5 content^2…
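
As a rough illustration of the query-time variant, the sketch below builds the `qf` string from a dict of weights and passes it to PySolr's `search` as an extra parameter. It assumes the eDisMax query parser is in use (`qf` only applies to the DisMax family); `build_qf` and `boosted_search` are hypothetical helper names, not part of PySolr.

```python
def build_qf(boosts):
    """Format {'field': weight} as a Solr qf string, e.g. 'title^5 content^2'."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


def boosted_search(solr_url, query, boosts):
    """Run a query with per-field boosts. Needs a running Solr core."""
    import pysolr  # imported lazily so build_qf works without PySolr installed

    solr = pysolr.Solr(solr_url, timeout=10)
    # Extra keyword arguments are passed through to Solr as query parameters.
    return solr.search(query, **{"defType": "edismax", "qf": build_qf(boosts)})
```

Query-time boosts can be tuned without reindexing, which is one reason to prefer them. As far as I know, index-time boosts were also removed in Solr 7.0, so the `boost=` argument to `add` is likely to matter only on older servers.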

JSolr Exception – Exception in thread “main” org.apache.solr.common.SolrException: Bad Request

Yash Sharma
October 11, 2015
solr

Exception in thread “main” org.apache.solr.common.SolrException: Bad Request Bad Request request: http://54.254.192.149:8983/solr/feeddata/update?wt=javabin&version=2 Solution: Check Solr logs. INFO – 2014-11-07 07:04:42.985; org.apache.solr.update.processor.LogUpdateProcessor; [feeddata] webapp=/solr path=/update params={wt=javabin&version=2} {}…

Indexing CSV data file in Solr – Using annotated Java POJOs

Yash Sharma
October 11, 2015
solr

1. Java POJO: add the Java POJO with the required fields: import org.apache.solr.client.solrj.beans.Field; /** * Created by yash on 18/11/14. */ public class ProductBean {…

Mahout Exception : java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver

Yash Sharma
October 11, 2015
hadoop, machine learning

Another annoying Mahout error when running Mahout jobs. It has the same cause as the one already discussed: Mahout is not built explicitly…

Unable to view EMR web interfaces: Tunnel all cluster ports on local port

Yash Sharma
October 11, 2015 (updated October 28, 2018)
hadoop, random

We have to create a tunnel from our local dev boxes to the EMR cluster. A couple of steps for tunneling the Hadoop box's ports…
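
The steps themselves are truncated in the excerpt, so the following is only a sketch of one common approach: SSH dynamic port forwarding (`ssh -D`), which exposes every cluster port through a single local SOCKS port. The key file, host name, local port 8157, and the `hadoop` user are all assumptions for illustration, and `build_tunnel_command` is a hypothetical helper.

```python
import shlex


def build_tunnel_command(key_file, master_dns, local_port=8157, user="hadoop"):
    """Build an ssh command that opens a SOCKS proxy on local_port.

    -N : do not run a remote command, just forward
    -D : dynamic (SOCKS) port forwarding on the given local port
    """
    return ["ssh", "-i", key_file, "-N", "-D", str(local_port),
            f"{user}@{master_dns}"]


if __name__ == "__main__":
    # Hypothetical key file and master node address.
    command = build_tunnel_command("mykey.pem", "master.example.com")
    print(shlex.join(command))
    # To actually open the tunnel: subprocess.run(command)
```

With the tunnel open, the browser still has to be pointed at the local SOCKS port (for example through a proxy extension) before the cluster's web interfaces load.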

Minimal Hadoop and Yarn installation

Yash Sharma
October 11, 2015
hadoop

The best tutorial I have found so far; keeping a note of it 🙂 Check it out, you might love it too. https://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/

Minimal Spark hello world

Yash Sharma
October 11, 2015
spark

1. Build with sbt: create a build.sbt file. It manages all the dependencies and other settings that would otherwise have been in your pom file: import AssemblyKeys._ import sbtassembly.Plugin._…

Apache Drill access via Java JDBC API

Yash Sharma
October 11, 2015
apache drill

A quick draft on accessing Apache Drill via Java JDBC. 1. Add the Drill dependency: <dependency> <groupId>org.apache.drill.exec</groupId> <artifactId>drill-jdbc</artifactId> <version>1.1.0</version> </dependency> 2. Java…

Create / Update Drill storage plugin without Drill browser UI – via REST API and curl request

Yash Sharma
February 24, 2015
apache drill

Drill has good REST support, and we can use the REST interface to create or update Drill storage plugins via curl requests. Request to create…
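
The curl request is cut off in the excerpt; as a stand-in, here is a minimal Python sketch of the same kind of call using only the standard library. It assumes Drill's web server is reachable at its usual port 8047 and that plugins are created or updated by POSTing JSON to `/storage/<name>.json`. The plugin name and config in the usage section are made up for illustration, and `build_plugin_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_plugin_request(drill_url, name, config):
    """Build (but do not send) a POST request that creates or updates a plugin."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{drill_url.rstrip('/')}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical plugin name and config; adjust to your own storage setup.
    request = build_plugin_request(
        "http://localhost:8047",
        "mydfs",
        {"type": "file", "enabled": True, "connection": "file:///"},
    )
    # Sending requires a running Drillbit.
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the URL and payload easy to inspect before anything touches the cluster.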

SQL on Cassandra : Querying Cassandra via Apache Drill

Yash Sharma
January 18, 2015 (updated October 15, 2016)
6 Comments
apache drill

Note for readers: I wrote this patch a year ago and it no longer works. Please treat it only as reference code. Last…