Here is a crisp post on indexing data in Solr using Python.
1. Install Prerequisites
– pip
– pysolr (pip install pysolr)
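pysolr is installed from PyPI with `pip install pysolr`. Before running the script, you may want to confirm that the package is importable from the Python interpreter you plan to use. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
import importlib.metadata
import importlib.util


def installed_version(package, module=None):
    """Return the installed version of `package`, or None if it is not importable.

    `module` is the import name when it differs from the PyPI name
    (for pysolr both are "pysolr").
    """
    if importlib.util.find_spec(module or package) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pysolr")
    if version is None:
        print("pysolr not found; install it with: pip install pysolr")
    else:
        print("pysolr", version)
```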
2. Python Script
#!/usr/bin/env python3
"""Index a comma-separated file into a Solr core with pysolr.

Usage: index_data.py -i <inputfile> -u <solr_url>
"""
import csv
import getopt
import sys

import pysolr

# Example core URL: http://54.254.192.149:8983/solr/feeddata/

# Column order of the input file; each column becomes one Solr field.
KEYS = ("rank", "pogid", "cat", "subcat", "question_bucketid", "brand",
        "discount", "age_grp", "gender", "inventory", "last_updated")

USAGE = "index_data.py -i <inputfile> -u <solr_url>"


def main(argv):
    solrurl = ""
    inputfile = ""
    try:
        opts, _ = getopt.getopt(argv, "hi:u:")
    except getopt.GetoptError:
        print(USAGE)
        sys.exit(2)
    for opt, arg in opts:
        if opt == "-h":
            print(USAGE)
            sys.exit()
        elif opt == "-i":
            inputfile = arg
        elif opt == "-u":
            solrurl = arg
    if not inputfile or not solrurl:
        print(USAGE)
        sys.exit(2)

    # Create a connection to the Solr core.
    s = pysolr.Solr(solrurl, timeout=10)

    docs = []
    # csv.reader handles quoted commas and drops the trailing newline,
    # which line.split(',') would leave on the last_updated value.
    with open(inputfile, newline="") as f:
        for record_count, row in enumerate(csv.reader(f), start=1):
            # Line number as the unique id, then one field per column.
            doc = {"id": str(record_count)}
            doc.update(zip(KEYS, row))
            docs.append(doc)

    # s.delete(q='*:*')  # uncomment to clear the core before indexing
    # Send all records in one request and commit once, not once per line.
    s.add(docs, commit=True)
    print("Done !! Indexed %d records." % len(docs))


if __name__ == "__main__":
    main(sys.argv[1:])
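The loading loop can also be split into small functions that are easy to test without a running Solr server. This is a minimal sketch, assuming the same column order as the script's `keys` tuple: `rows_to_docs` parses with `csv.reader`, so a quoted value such as "running, trail" stays one field (plain `line.split(',')` would break it apart), and `index_in_batches` sends documents in fixed-size chunks with `commit=False`, committing once at the end. The helper names, the batch size of 1000, and the sample row are hypothetical.

```python
import csv
import io
from itertools import islice

# Column order of the input file (hypothetical copy of the script's keys tuple).
KEYS = ("rank", "pogid", "cat", "subcat", "question_bucketid", "brand",
        "discount", "age_grp", "gender", "inventory", "last_updated")


def rows_to_docs(lines, keys=KEYS):
    """Yield one Solr document (a dict) per CSV row, with ids numbered from 1."""
    for record_count, row in enumerate(csv.reader(lines), start=1):
        doc = {"id": str(record_count)}
        # zip stops at the shorter sequence: extra columns are ignored,
        # and missing trailing columns are simply absent from the doc.
        doc.update(zip(keys, row))
        yield doc


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_batches(solr, docs, batch_size=1000):
    """Add docs in batches without committing, then commit once.

    `solr` is expected to be a pysolr.Solr instance; any object with
    add(docs, commit=...) and commit() methods behaves the same way.
    Returns the number of documents sent.
    """
    total = 0
    for chunk in batched(docs, batch_size):
        solr.add(chunk, commit=False)
        total += len(chunk)
    solr.commit()
    return total


if __name__ == "__main__":
    # Hypothetical sample row with a quoted comma in the subcat column.
    sample = io.StringIO('1,P100,shoes,"running, trail",Q1,Acme,10,18-25,M,5,2016-01-01\n')
    for doc in rows_to_docs(sample):
        print(doc["subcat"])  # prints: running, trail
```

With pysolr, you would open the input inside `with open(inputfile, newline="") as f:` and call `index_in_batches(pysolr.Solr(solrurl, timeout=10), rows_to_docs(f))`. Batching keeps memory bounded for large files, and a single final commit avoids asking Solr to commit after every line.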
3. Troubleshooting
You might run into a couple of errors like the ones below. Check the Solr logs for the root cause and solution.
– IP Address Error
– Undefined field error
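An undefined field error usually means a document contains a field name the core's schema does not define. One way to check before indexing is to ask Solr which fields exist through its Schema API (`<core_url>/schema/fields`) and compare them with the column names. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it only sees explicitly defined fields, so a column that Solr would accept through a dynamic field pattern (for example `*_s`) would still be reported as missing.

```python
import json
from urllib.request import urlopen


def fetch_field_names(core_url, timeout=10):
    """Return the names of fields explicitly defined in a core's schema.

    Uses the Solr Schema API endpoint <core_url>/schema/fields.
    Dynamic field patterns (e.g. *_s) are not included.
    """
    url = core_url.rstrip("/") + "/schema/fields?wt=json"
    with urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    return {field["name"] for field in data["fields"]}


def missing_fields(keys, defined):
    """Return the keys (in input order) that are not in the set `defined`."""
    return [key for key in keys if key not in defined]
```

For example, `missing_fields(("id",) + keys, fetch_field_names("http://54.254.192.149:8983/solr/feeddata/"))`, where `keys` is the column tuple from the script, lists the names that still need a schema definition. If `fetch_field_names` itself fails with a `urllib.error.URLError`, the host, IP address, or port in the URL is likely wrong or unreachable, which points at the IP address error above.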