Indexing CSV data in Solr via Python – PySolr

Here is a quick post on indexing CSV data in Solr from Python using PySolr.

1. Install prerequisites

– pip
– PySolr (pip install pysolr)
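
Before running the script it can help to confirm that the modules it imports are actually visible to your Python interpreter. The sketch below is only a convenience check; the helper name is_installed is made up for this post and is not part of PySolr.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# "pysolr" is the import name of the PySolr package; "csv" ships with Python.
for name in ("csv", "pysolr"):
    if is_installed(name):
        print("%s: found" % name)
    else:
        print("%s: missing (try: pip install %s)" % (name, name))
```

If pysolr shows up as missing, the pip that installed it probably belongs to a different Python than the one running the script.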

2. Python Script

#!/usr/bin/python

import sys, getopt
import csv
import pysolr

#SOLR_URL=http://54.254.192.149:8983/solr/feeddata/

USAGE = 'index_data.py -i INPUT_CSV -u SOLR_URL'

def main(args):
  solrurl=''
  inputfile=''
  try:
    opts, args = getopt.getopt(args,"hi:u:")
  except getopt.GetoptError:
    print(USAGE)
    sys.exit(2)

  for opt, arg in opts:
    if opt == '-h':
      print(USAGE)
      sys.exit()
    elif opt == '-i':
      inputfile = arg
    elif opt == '-u':
      solrurl = arg

  # both options are required
  if not inputfile or not solrurl:
    print(USAGE)
    sys.exit(2)

  # create a connection to a solr server
  s = pysolr.Solr(solrurl, timeout=10)
  keys=("rank", "pogid", "cat", "subcat", "question_bucketid", "brand", "discount", "age_grp", "gender", "inventory", "last_updated")
  items=[]
  record_count=0
  # csv.reader strips the trailing newline and copes with quoted commas
  with open(inputfile, 'r') as f:
    for row in csv.reader(f):
      if len(row) < len(keys):
        continue  # skip blank or incomplete lines
      record_count += 1
      # build the record and queue it for indexing
      doc = dict(zip(keys, row))
      doc["id"] = record_count
      items.append(doc)

  #s.delete(q='*:*')
  # send all records in one request and commit
  s.add(items, commit=True)
  print('Done !!')

if __name__ == "__main__":
  main(sys.argv[1:])

NOTE: Python is indentation-sensitive. If the indentation gets mangled when you copy the script, re-indent it before running.
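
For a large CSV file you may not want to hold every record in memory and push it all in a single request. Here is a rough sketch of sending the records in batches and committing once at the end. It assumes the Solr client exposes add(docs, commit=...) and commit(), as pysolr.Solr does. The names rows_to_docs, index_in_batches and FakeSolr are made up for this post; FakeSolr only stands in for a real connection so the example runs without a Solr server, and the sample rows are invented.

```python
import csv
import io

KEYS = ("rank", "pogid", "cat", "subcat", "question_bucketid", "brand",
        "discount", "age_grp", "gender", "inventory", "last_updated")


def rows_to_docs(rows, keys=KEYS):
    """Yield one Solr document per CSV row, with a running integer id."""
    for record_count, row in enumerate(rows, start=1):
        doc = dict(zip(keys, row))
        doc["id"] = record_count
        yield doc


def index_in_batches(solr, docs, batch_size=500):
    """Send docs to Solr in chunks of batch_size and commit once at the end."""
    batch, total = [], 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            solr.add(batch, commit=False)
            total += len(batch)
            batch = []
    if batch:
        # flush whatever is left over after the last full batch
        solr.add(batch, commit=False)
        total += len(batch)
    solr.commit()
    return total


class FakeSolr:
    """Hypothetical stand-in for pysolr.Solr so the sketch runs offline."""

    def __init__(self):
        self.batches = []
        self.commits = 0

    def add(self, docs, commit=False):
        self.batches.append(list(docs))

    def commit(self):
        self.commits += 1


# Made-up sample rows in the same column order as KEYS.
sample = io.StringIO(
    "1,p1,c,s,q,b,10,18-25,M,5,2014-01-01\n"
    "2,p2,c,s,q,b,20,26-35,F,7,2014-01-02\n"
    "3,p3,c,s,q,b,30,36-45,M,9,2014-01-03\n"
)
fake = FakeSolr()
indexed = index_in_batches(fake, rows_to_docs(csv.reader(sample)), batch_size=2)
print(indexed, [len(b) for b in fake.batches], fake.commits)
```

To use it for real, pass a pysolr.Solr connection instead of FakeSolr and a csv.reader over your input file. The batch size of 500 is an arbitrary starting point, not a tuned value.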

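3. Troubleshooting

You might hit a couple of errors like the ones below. Check the Solr logs for the root cause and the fix.
– IP address error
– Undefined field error (this usually means a document contains a field that is not defined in the Solr schema)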
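
For the undefined field case, one cheap check is to compare the field names in your documents against the fields you expect the schema to have before sending anything. This is just a sketch: find_undefined_fields and SCHEMA_FIELDS are made up for this post, and the field set is a hand-maintained assumption rather than something read from Solr.

```python
def find_undefined_fields(docs, schema_fields):
    """Return the sorted names of document fields missing from schema_fields."""
    allowed = set(schema_fields)
    unknown = set()
    for doc in docs:
        unknown.update(set(doc) - allowed)
    return sorted(unknown)


# Assumed list of fields; keep it in sync with your own Solr schema.
SCHEMA_FIELDS = {"id", "rank", "pogid", "cat", "subcat", "question_bucketid",
                 "brand", "discount", "age_grp", "gender", "inventory",
                 "last_updated"}

docs = [
    {"id": 1, "rank": "1", "brand": "acme"},
    {"id": 2, "rank": "2", "colour": "red"},  # "colour" is not in the set
]
print(find_undefined_fields(docs, SCHEMA_FIELDS))
```

An empty list means every field in the documents is in the expected set. The Solr logs are still the place to confirm what the server actually rejected.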

Yash Sharma is a Big Data & Machine Learning Engineer and a newbie open-source contributor who plays guitar and enjoys teaching as a part-time hobby.
Talk to Yash about distributed systems and data platform designs.