Sunday, November 18, 2012

Solr - Indexing documents

We're using Solr 4.0 at work, so I decided that I should spend some time messing around with the gears and levers to make sure that I really understand what I'm doing.

I made a schema that included a field called "id" and a multi-valued field called "lookupids". I created a file with a header row of "id<tab>lookupids", followed by data rows that each had a GUID, a tab, and a comma-separated list of random ints. For example:

269d8a33-0fd6-4877-b631-dccc4146cf90<tab>11507,25964,118430,306825,315793,348797,349191
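
A minimal sketch of how a test file in this shape could be generated. The function names, the upper bound on the ints, the number of ints per row, and the sorted order are all assumptions made for illustration; only the header row, the tab separator, and the comma-separated ints come from the description above.

```python
import random
import uuid


def make_row(rng, max_id=1_000_000, min_ids=1, max_ids=50):
    """Build one data row: a GUID, a tab, then comma-separated random ints."""
    # Derive a version-4 GUID from the seeded generator so runs are repeatable.
    doc_id = uuid.UUID(int=rng.getrandbits(128), version=4)
    count = rng.randint(min_ids, max_ids)
    # Distinct ints, sorted ascending (an assumed convention, not a requirement).
    lookup_ids = sorted(rng.sample(range(1, max_id), count))
    return f"{doc_id}\t{','.join(map(str, lookup_ids))}"


def write_test_file(path, rows=100_000, seed=None):
    """Write the header row followed by `rows` generated data rows."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("id\tlookupids\n")
        for _ in range(rows):
            handle.write(make_row(rng) + "\n")
```

Calling write_test_file("test_with_lookupids.txt") would produce a file of the same size as the one used here.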

The file contained 100,000 entries, and I was able to index it using a URL like this:

http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=exampledocs/test_with_lookupids.txt
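
The %09 in that URL is just a percent-encoded tab. As a small sketch, the same URL can be assembled with Python's standard library so the encoding is handled automatically; build_index_url is a hypothetical helper name, not part of Solr.

```python
from urllib.parse import urlencode

# Base URL of the CSV update handler, as used in this post.
UPDATE_CSV_URL = "http://localhost:8983/solr/update/csv"


def build_index_url(stream_file, separator="\t"):
    """Return the update URL with its query string percent-encoded."""
    params = [
        ("commit", "true"),
        ("separator", separator),  # a tab is encoded as %09
        ("stream.file", stream_file),
    ]
    # safe="/" leaves the slash in the file path unescaped, as in the URL above.
    return UPDATE_CSV_URL + "?" + urlencode(params, safe="/")


url = build_index_url("exampledocs/test_with_lookupids.txt")
print(url)
```

The resulting string can be pasted into a browser or fetched with urllib.request.urlopen against a running Solr instance.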

I was expecting the results to return the lookupids as an array. Instead, the lookupids field values are returned exactly as they appeared in the source file:


<result name="response" numFound="1" start="0">
<doc>
<str name="id">
e09d8f38-c1ef-4a97-a832-a4bdc0b18bc5
</str>
<str name="lookupids">
2,16481,38485,50205,101885,107642,110903,142770,174184,193689,204770,223341,225669,242335,253654,278519,284132,333735,352163,372383,377816,401338,420851,443967,500899,575204,593052,645555,667294,742558,757738,804361,826200,828540,839016,859782,875115,877853,893658,915890,945398,954502,969859,971992,989172
</str>
<long name="_version_">
1419020904549056527
</long>
</doc>
</result>


The reason I was expecting the lookupids to be returned as an array is that the lookupids field was defined as follows:

<field name="lookupids" type="commaDelimited" indexed="true" stored="true" multivalued="true"/>

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
  </analyzer>
</fieldType>
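
To make the pattern concrete, here is a rough Python imitation of what that regular expression does when it is used to split a value into tokens. It only illustrates the regex itself and is not Solr's implementation; the sample value is the row shown earlier.

```python
import re

# Same regular expression as the tokenizer's pattern attribute.
PATTERN = re.compile(r",\s*")

stored_value = "11507,25964,118430,306825,315793,348797,349191"

# Splitting on the pattern yields one token per integer.
tokens = PATTERN.split(stored_value)
print(tokens)
# ['11507', '25964', '118430', '306825', '315793', '348797', '349191']
```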

I figured that defining the field as multivalued, and giving the commaDelimited type a PatternTokenizer whose pattern splits tokens on commas, would produce the array response.
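
One avenue that might be worth trying: as far as I understand it, an analyzer only shapes the terms that get indexed, while the stored value comes back as it was submitted. Solr's CSV update handler documents per-field f.<fieldname>.split and f.<fieldname>.separator parameters for splitting a column into multiple values at load time. The sketch below only builds such a URL; I have not run it against this schema, so treat it as an assumption rather than a confirmed fix. Separately, the Solr documentation spells the schema attribute multiValued, so the lowercase spelling in the field definition above may be worth double-checking.

```python
from urllib.parse import urlencode

params = [
    ("commit", "true"),
    ("separator", "\t"),              # column separator for the file (tab)
    ("f.lookupids.split", "true"),    # ask the loader to split this column
    ("f.lookupids.separator", ","),   # split the column's value on commas
    ("stream.file", "exampledocs/test_with_lookupids.txt"),
]

# safe="/" keeps the slash in the file path readable.
split_url = "http://localhost:8983/solr/update/csv?" + urlencode(params, safe="/")
print(split_url)
```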

I'll update this post once I figure out how to get the results as an array.

