Here is a quick post on indexing data in Solr using Python.
1. Install prerequisites
– pip
– PySolr
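Before running the script, it can help to confirm that the library is importable. The snippet below is a minimal sketch, assuming Python 3 and that the package is installed under the module name `pysolr` (normally via `pip install pysolr`); `is_installed` is a helper name made up for this post.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

if not is_installed("pysolr"):
    # Only a hint for the reader; nothing is installed automatically.
    print("pysolr is missing; install it with: pip install pysolr")
```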
2. Python Script
#!/usr/bin/python
import sys, getopt
import pysolr

#SOLR_URL=http://54.254.192.149:8983/solr/feeddata/

def main(args):
    solrurl = ''
    inputfile = ''
    try:
        opts, args = getopt.getopt(args, "hi:u:")
    except getopt.GetoptError:
        print('index_data.py -i INPUTFILE -u SOLR_URL')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('index_data.py -i INPUTFILE -u SOLR_URL')
            sys.exit()
        elif opt == '-i':
            inputfile = arg
        elif opt == '-u':
            solrurl = arg

    # create a connection to a solr server
    s = pysolr.Solr(solrurl, timeout=10)

    record_count = 0
    with open(inputfile, 'r') as f:
        for line in f:
            # strip the line ending so it does not end up in last_updated
            splits = line.rstrip('\r\n').split(',')
            record_count += 1
            # add record for indexing
            items = [{"id": record_count,
                      "rank": splits[0],
                      "pogid": splits[1],
                      "cat": splits[2],
                      "subcat": splits[3],
                      "question_bucketid": splits[4],
                      "brand": splits[5],
                      "discount": splits[6],
                      "age_grp": splits[7],
                      "gender": splits[8],
                      "inventory": splits[9],
                      "last_updated": splits[10]}]
            #s.delete(q='*:*')
            s.add(items, commit=True)
    s.commit()
    print('Done !!')

if __name__ == "__main__":
    main(sys.argv[1:])
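The script splits each line on commas by hand and sends one record per request. As a possible variation (not what the script above does), the record-building step could use the standard `csv` module, which also copes with quoted values that contain commas, and group the records into batches. This is a minimal sketch assuming Python 3 and the same column order as the script; `rows_to_docs`, `batches` and the sample row are made up for illustration.

```python
import csv
import io

# Column order taken from the script; "id" is added separately.
FIELDS = ("rank", "pogid", "cat", "subcat", "question_bucketid", "brand",
          "discount", "age_grp", "gender", "inventory", "last_updated")

def rows_to_docs(lines):
    """Yield one Solr document (a dict) per well-formed CSV row."""
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != len(FIELDS):
            # Skip malformed rows instead of failing with IndexError.
            continue
        doc = {"id": row_number}  # 1-based row number, as in the script
        doc.update(zip(FIELDS, (value.strip() for value in row)))
        yield doc

def batches(docs, size):
    """Group documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Made-up sample row; note the quoted value containing a comma.
sample = io.StringIO(
    '1,P100,toys,"puzzles, 3D",7,Acme,10,kids,unisex,25,2014-01-01\n')
docs = list(rows_to_docs(sample))
```

Each batch could then be passed to `s.add(batch)`, with a single `s.commit()` at the end, which usually means far fewer round trips than committing after every record.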
3. Troubleshooting
You might run into a couple of errors like the ones below. Check the Solr logs for the root cause and solution.
– IP address error
– Undefined field error
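An undefined field error generally means a document contains a field name that the Solr schema does not define. One way to catch this before indexing is to compare the document keys against the field names you expect the schema to have. This is a minimal sketch; `find_undefined_fields` and the `schema_fields` list are hypothetical and must be adapted to your own schema.

```python
def find_undefined_fields(docs, schema_fields):
    """Return the sorted field names used in docs but absent from schema_fields."""
    known = set(schema_fields)
    unknown = set()
    for doc in docs:
        unknown.update(name for name in doc if name not in known)
    return sorted(unknown)

# Hypothetical schema: replace with the fields defined in your collection.
schema_fields = ["id", "rank", "pogid", "cat"]
docs = [{"id": 1, "rank": "3", "pogid": "P100", "brand": "Acme"}]

missing = find_undefined_fields(docs, schema_fields)
print(missing)  # ['brand']
```

Any name reported here either needs to be added to the schema or removed from the documents before they are sent.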