Indexing csv data in Solr via Python – PySolr
Here is a crisp post on indexing data in Solr using Python. 1. Install prerequisites – pip – PySolr 2. Python script #!/usr/bin/python import sys,… Read More »Indexing csv data in Solr via Python – PySolr
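For a rough idea of the CSV-to-Solr step, here is a minimal sketch (not the post's own script) that turns CSV text into the list of dicts PySolr's add() takes. It assumes the CSV has a header row with an "id" column; the core URL in the comments is a placeholder.

```python
import csv
import io


def csv_to_docs(csv_text, id_field="id"):
    """Turn CSV text (header row plus data rows) into a list of dicts,
    one per Solr document, which is the shape pysolr's Solr.add() accepts."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader uses the key None for surplus values; skip those and
        # treat missing values (None) as empty strings.
        doc = {key.strip(): (value or "").strip()
               for key, value in row.items() if key is not None}
        if not doc.get(id_field):
            raise ValueError("row is missing a value for %r: %r" % (id_field, row))
        docs.append(doc)
    return docs


docs = csv_to_docs("id,title\n1,Solr basics\n2,PySolr tips\n")
# With pysolr installed and Solr running (core URL is a placeholder):
# import pysolr
# solr = pysolr.Solr("http://localhost:8983/solr/feeddata", timeout=10)
# solr.add(docs)
# solr.commit()
```

Keeping the parsing separate from the Solr call makes it easy to check the documents before anything is sent to the server.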
TL;DR: A Pig logical plan is the plan DAG used to execute the chain of jobs on Hadoop. Here is the code snippet… Read More »How to get Pig Logical plan (Execution DAG) from Pig Latin script
A quick note – PySolr: how to boost a field for a Solr document. Index-time boosting: conn.add(docs, boost={'author': '2.0',}) Query-time boosting: qf=title^5 content^2… Read More »PySolr : How to boost a field for Solr document
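Query-time boosting can be sketched as building the qf string from a dict of field weights. The field names and weights here are only examples. qf is read by the dismax/edismax parsers, so the sketch sets defType explicitly. Index-time boosts were removed in Solr 7, which makes query-time boosting the more portable option.

```python
def build_qf(weights):
    """Build a Solr qf string, e.g. {'title': 5, 'content': 2} -> 'title^5 content^2'."""
    return " ".join("%s^%s" % (field, weight) for field, weight in weights.items())


def boosted_params(weights):
    # qf is honoured by the dismax / edismax query parsers, so select one explicitly.
    return {"defType": "edismax", "qf": build_qf(weights)}


params = boosted_params({"title": 5, "content": 2})
# pysolr sends extra keyword arguments to search() as request parameters:
# results = solr.search("hadoop", **params)
```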
Exception in thread “main” org.apache.solr.common.SolrException: Bad Request Bad Request request: http://54.254.192.149:8983/solr/feeddata/update?wt=javabin&version=2 Solution: Check Solr logs. INFO – 2014-11-07 07:04:42.985; org.apache.solr.update.processor.LogUpdateProcessor; [feeddata] webapp=/solr path=/update params={wt=javabin&version=2} {}… Read More »JSolr Exception – Exception in thread “main” org.apache.solr.common.SolrException: Bad Request
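Since the fix starts with reading the Solr logs, a small filter that keeps only ERROR/WARN lines can help with a noisy log. This is a hypothetical helper, not part of Solr. It assumes lines start with "LEVEL - timestamp; …" as in the snippet above, and the log path in the comment is a placeholder.

```python
import re

# Assumed line shape, based on the snippet above: "LEVEL - timestamp; logger; message".
# Both a plain hyphen and an en dash are accepted as the separator.
LOG_LINE = re.compile(r"^(?P<level>[A-Z]+)\s+[-\u2013]\s+(?P<rest>.*)$")


def problem_lines(log_text, levels=("ERROR", "WARN")):
    """Return the log lines whose level is in `levels`, in their original order."""
    hits = []
    for line in log_text.splitlines():
        line = line.strip()
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            hits.append(line)
    return hits


# Placeholder path; point it at your Solr instance's log file:
# with open("logs/solr.log", encoding="utf-8") as f:
#     for line in problem_lines(f.read()):
#         print(line)
```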
1. Java POJO: add a Java POJO with the required fields: import org.apache.solr.client.solrj.beans.Field; /** * Created by yash on 18/11/14. */ public class ProductBean {… Read More »Indexing CSV data file in Solr – Using annotated java pojo’s
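A rough Python counterpart to the annotated-bean approach is a typed dataclass per CSV row, flattened to a dict before indexing. This swaps Python dataclasses in for SolrJ's @Field beans. The ProductDoc fields (id, name, price) are hypothetical because the post's field list is cut off, and they would need to match your Solr schema.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class ProductDoc:
    # Hypothetical fields; they must match field names in your Solr schema.
    id: str
    name: str
    price: float

    @classmethod
    def from_row(cls, row):
        # Convert the raw CSV strings into typed values.
        return cls(id=row["id"], name=row["name"], price=float(row["price"]))


def load_products(csv_text):
    return [ProductDoc.from_row(r) for r in csv.DictReader(io.StringIO(csv_text))]


products = load_products("id,name,price\np1,Laptop,999.5\n")
docs = [asdict(p) for p in products]  # plain dicts, ready for pysolr's solr.add(docs)
```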
Another annoying Mahout error when running Mahout jobs. This one is caused by the reason already discussed: Mahout is not built explicitly… Read More »Mahout Exception : java.lang.NoSuchMethodError: org.apache.hadoop.util.ProgramDriver.driver
We will have to create a tunnel from our local dev boxes to the EMR cluster. Here are a couple of steps for tunneling the Hadoop box's ports… Read More »Unable to view EMR web interfaces: Tunnel all cluster ports on local port
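One way to tunnel several cluster ports at once is SSH local port forwarding, with one -L per port. This sketch (not necessarily the post's exact steps) only builds the command. The master host name, key file, and port numbers are placeholders, and "hadoop" is the default EMR login user.

```python
import shlex


def tunnel_command(master_host, ports, key_file, user="hadoop"):
    """Build an ssh command that forwards each listed port on the master
    node to the same port number on the local machine.

    -N keeps the connection open without running a remote command;
    -L local_port:host:remote_port sets up one forward per port.
    """
    cmd = ["ssh", "-i", key_file, "-N"]
    for port in ports:
        cmd += ["-L", "%d:%s:%d" % (port, master_host, port)]
    cmd.append("%s@%s" % (user, master_host))
    return cmd


# Placeholder host and example web UI ports (they vary by Hadoop version):
cmd = tunnel_command("ec2-master.example.com", [8088, 50070], "mykey.pem")
print(" ".join(shlex.quote(part) for part in cmd))
# subprocess.run(cmd) would start the tunnel; then browse http://localhost:8088
```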
The best new tutorial around; keeping a note of it 🙂 Check it out, you might love it too. https://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/
1. Build SBT: create a build.sbt file. This manages all the dependencies and other settings that would otherwise have been in your pom file: import AssemblyKeys._ import sbtassembly.Plugin._… Read More »Minimal Spark hello world
Here is a quick draft on accessing Apache Drill via the Java JDBC API. 1. Add the Drill dependency: <dependency> <groupId>org.apache.drill.exec</groupId> <artifactId>drill-jdbc</artifactId> <version>1.1.0</version> </dependency> 2. Java… Read More »Apache Drill access via Java JDBC API
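Python has no JDBC, so this sketch uses a different technique: Drill's REST endpoint (POST /query.json with a queryType/query JSON body). It only builds the request. The host and port (8047 is Drill's default web port) are assumptions, and the query uses Drill's bundled cp.`employee.json` sample.

```python
import json
import urllib.request


def drill_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST to Drill's /query.json REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = drill_query_request("SELECT * FROM cp.`employee.json` LIMIT 3")
# Against a running Drillbit, the JSON response carries the result rows:
# with urllib.request.urlopen(req) as resp:
#     rows = json.load(resp)["rows"]
```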
Drill has great REST support, and we can leverage the REST interface to create/update Drill storage plugins via curl requests. Request to create… Read More »Create / Update Drill storage plugin without drill browser UI – via rest api and curl request
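The same create/update call can also be built from Python. Drill accepts a POST to /storage/<name>.json with a body of {"name": …, "config": {…}}. In this sketch the plugin name "myfiles" and its file-system config are hypothetical examples, and the server URL assumes a local Drillbit on port 8047.

```python
import json
import urllib.request


def storage_plugin_request(name, config, base_url="http://localhost:8047"):
    """Build (but do not send) a POST that creates or updates a Drill storage plugin."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        "%s/storage/%s.json" % (base_url.rstrip("/"), name),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file-system plugin reading CSV files under /tmp.
config = {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "workspaces": {"root": {"location": "/tmp", "writable": False, "defaultInputFormat": None}},
    "formats": {"csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}},
}
req = storage_plugin_request("myfiles", config)
# urllib.request.urlopen(req) would send it, like the curl request in the post.
```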
Note for readers – I wrote this patch a year back and it no longer works. Please treat this just as a reference code. Last… Read More »SQL on Cassandra : Querying Cassandra via Apache Drill