Friday, November 23, 2012

Solr - Indexing Data Using SolrJ and addBeans

So far, indexing data with SolrJ looks considerably slower than indexing it through the update handler from a local CSV file. Indexing 100,000 documents with SolrServer.addBeans() took about 36 to 40 seconds, compared with about 17 to 18 seconds through the update handler and a local CSV file. In other words, the SolrJ route took roughly twice as long.
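Putting the reported timings into the same units makes the gap easier to see. Below is a small sketch that converts the measured times into documents per second. The class and method names are illustrative only and do not come from the original code.

```java
// Illustrative helper (not from the original post): converts the
// timings reported above into documents indexed per second.
final class IndexingThroughput {
    private IndexingThroughput() {}

    /** Documents indexed per second for a given document count and elapsed time. */
    static double docsPerSecond(long documents, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return documents / seconds;
    }

    public static void main(String[] args) {
        long docs = 100_000;
        // SolrJ addBeans(): about 36 to 40 seconds
        System.out.printf("addBeans: %.0f to %.0f docs/s%n",
                docsPerSecond(docs, 40), docsPerSecond(docs, 36));
        // Update handler with a local CSV file: about 17 to 18 seconds
        System.out.printf("CSV:      %.0f to %.0f docs/s%n",
                docsPerSecond(docs, 18), docsPerSecond(docs, 17));
    }
}
```

With the numbers above, addBeans() works out to roughly 2,500 to 2,800 documents per second, versus roughly 5,600 to 5,900 for the CSV route.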

The code using SolrJ, listed below, was running on the same machine as Solr.

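import java.io.IOException;
import java.util.List;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sends all beans to Solr in one addBeans() request, then commits.
public static void indexBeanValues(List<?> testRecords)
    throws IOException, SolrServerException {

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.addBeans(testRecords);  // beans need SolrJ @Field annotations
    server.commit();               // make the new documents searchable
}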
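The post does not say how the timings were taken. Assuming that wall-clock time around addBeans() plus commit() is what matters, one way to measure it is a small helper like the hypothetical `IndexTimer` below. It accepts a step that may throw checked exceptions, as the SolrJ calls do.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not from the original post): times one indexing
// step that may throw checked exceptions, as addBeans() and commit() do.
final class IndexTimer {
    private IndexTimer() {}

    /** An action that can throw, e.g. addBeans() followed by commit(). */
    @FunctionalInterface
    interface IndexingStep {
        void run() throws Exception;
    }

    /** Runs the step once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(IndexingStep step) throws Exception {
        long start = System.nanoTime();
        step.run(); // any exception from the step propagates to the caller
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Given an HttpSolrServer named `server` and a list of annotated beans named `records`, the call would look like `IndexTimer.timeMillis(() -> { server.addBeans(records); server.commit(); })`.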

I also tried passing in an existing SolrServer instance instead of creating a new one inside the method, but it made no noticeable difference in timing. Whether the client is created once or once per batch might matter more when the Java code using SolrJ runs on a different machine from the Solr server it targets.
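To make the batching idea concrete, here is a sketch that splits the records into fixed-size batches and sends each one through a single client that is created once and passed in. `BatchSink` is a hypothetical stand-in for SolrServer so the example runs without a Solr instance. The batch size is left to the caller because the post does not give one.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split records into fixed-size batches and send every batch
// through ONE sink that the caller creates once and passes in.
final class BatchIndexer {
    private BatchIndexer() {}

    /** Hypothetical stand-in for the Solr client; in SolrJ it would wrap addBeans(batch). */
    interface BatchSink<T> {
        void send(List<T> batch) throws Exception;
    }

    /** Splits records into consecutive lists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < records.size(); from += batchSize) {
            int to = Math.min(from + batchSize, records.size());
            // copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(records.subList(from, to)));
        }
        return batches;
    }

    /** Sends every batch through the same sink and returns the number of batches sent. */
    static <T> int indexInBatches(List<T> records, int batchSize, BatchSink<T> sink)
            throws Exception {
        int sent = 0;
        for (List<T> batch : partition(records, batchSize)) {
            sink.send(batch);
            sent++;
        }
        return sent;
    }
}
```

Against a real Solr server, the sink could be `batch -> server.addBeans(batch)` on one HttpSolrServer created before the loop, followed by a single `server.commit()` afterwards.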

Refer to this post for a more detailed code example using SolrJ and addBeans.

2 comments:

  1. Hey... I want to implement something similar for indexing HTML files. Can I get a link to download this source code? Thanks

    1. I updated the post above to include a link to a new post with a more detailed code example. I hope that you find it helpful.
