Monday, December 7, 2015

Sitecore Lucene index with integers

The situation

We recently discovered an issue when using a facet on an integer field in a Sitecore (8.1) Lucene index. We had a number of articles (items) with a date field. We had to query these items, order them by date and determine the number of items in each year.

The code

We created a ComputedField "year" and filled it with the year part of the date:
var dateTime = ((DateField)publicationDateField).DateTime;
return dateTime.Year;
We added the field to a custom index, and created an entry in the fieldmap to mark it as System.Int32. We rebuild the index, check the contents with Luke and all is fine. So we create a class based on SearchResultItem to use for the query:

class NewsItem : SearchResultItem
{
    [IndexField("title")]
    public string Title { get; set; }

    [IndexField("publication date")]
    public DateTime Date { get; set; }

    [IndexField("category")]
    public Guid Category { get; set; }

    [IndexField("year")]
    public int PublicationYear { get; set; }
}

The query

When we use this class for querying, we get not results when filtering on the year.. apparently integer fields need to be tokenized to be used in searches (indexType="TOKENIZED"). Sounds weird as this is surely not true for text fields, but the NumericField constructor makes it clear:

Lucene.Net.Documents.NumericField.NumericField(string name, int precisionStep, Field.Store store, bool index) : base(name, store, index ? Field.Index.ANALYZED_NO_NORMS : Field.Index.NO, Field.TermVector.NO)

So, we changed the field in the fieldmap and set it tokenized. We add an analyzer to prevent the integer being cut in parts (Lucene.Net.Analysis.KeywordAnalyzer or Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer).

Success?

Yeah! We have results! We got the news items for 2015! And 2014..  But... there is always a but or this post would be too easy. We still needed a facet. And there it went wrong. The facet resulted in this:


Not what we expected actually...

So back to our query and index..  Sitecore Support found out that this happens because of the specific way the numeric fields are indexed by Lucene, they are indexed not just as simple tokens but as a tree structure (http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/NumericField.html).

Unfortunately, Sitecore cannot do faceting on such fields at this moment - this is now logged as a bug.

The Solution

The solution was actually very simple. We threw out the field from the fieldmap and changed the int in our NewsItem to string. If we want to use them as an integer we need to cast them afterwards, but for now we don't even need that.
Luckily for us, even the sorting doesn't care as our int's are years. So we were set.. queries are working and facets are fine.

No comments:

Post a Comment