Friday, November 23, 2012

Solr - Indexing Data Using SolrJ

I think I found one of the slowest ways possible to index data into Solr.  I'm looking into various ways to index data into Solr:
  • indexing text files local to the server that Solr is running on using the update handler
  • indexing data using an app using SolrJ that is running on the same server as Solr
  • indexing data using an app using SolrJ that is on a different machine on the same network that the Solr server is on
I was able to index 100000 items of data into Solr using the update handler to process a CSV file in about 17 to 18 seconds.  Next I tried indexing the same data using SolrJ.  It took about 6 minutes!  I'm sure that the reason it took so long is the way that I wrote the method to index the data.  

The method looks like this:

    public static void IndexValues(TestRecord[] testRecords
        throws IOException, SolrServerException {

        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        for(int i = 0; i < testRecords.length; ++i) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", testRecords[i].getId());
            for (Integer value : testRecords[i].getLookupIds()) {
                doc.addField("lookupids", value);
            }
            server.add(doc);
            if(i%100==0) server.commit();  // periodically flush
        }
        server.commit();

    }

I'll have to try something similar, but using beans.  It seems like it could be a bit faster if I used the addBeans method to add multiple documents at once.

No comments:

Post a Comment