Wednesday, December 23, 2015

Sitecore Lucene index and DateTime fields

[Sitecore 8.1]

DateTime field in Lucene index

I was trying to create an index search for an event calendar that would give me items (from a template etc..)  that have a datefield:
  • from today onwards (today included) 
  • up until today
The field is Sitecore is a date field (so no time indication), but our query seemed to have issues with the time indications. The code to create the predicate looks like this:

private Expression<Func<EventItem, bool>> GetDatePredicate(OverviewMode mode)
{
  var predicate = PredicateBuilder.True<EventItem>();
  switch (mode)
  {
 case OverviewMode.Future:
 {
  var minDate = DateTime.Today.ToUniversalTime();
  predicate = predicate.And(n => n.StartDate > minDate);
  break;
 }
 case OverviewMode.Past:
 {
  var maxDate = DateTime.Today.ToUniversalTime();
  var minDate = DateTime.MinValue.ToUniversalTime();
  predicate = predicate.And(n => n.StartDate < maxDate).And(n => n.StartDate > minDate);
  break;
 }
 default:
 {
  return null;
 }
  }
  return predicate;
}


This did not work correctly with events "today". We had to add "AddDays(-1)" after the Today before we set it to UTC. So why?

The first reason is that Sitecore stores its DateTimes in UTC which was an hour difference with our local time. So, our dates shifted a day back: "12/12/2015" becomes "12/11/2015 23:00". This is known and should be no issue as we also shift to UTC in our predicate.

But still.. we did not get the correct results.

The logs

So we look at the logs. Sitecore logs all requests in the Search log file. We saw that our predicate was translated into something like this:
"+(+date_from:[* TO 20151111t230000000z} +date_from:{00010101t000000000z TO *])"

Looks fine, but note that the "t" in the dates is lowercase. In my index however they are all uppercase. If I try the query with Luke it does give me the wrong results indeed.. When I alter the query in Luke to use uppercase T it works correctly..

Support, here we come!


Solution(s)

Support gave us 2 possible solutions, next to the one we already had (skipping a day).

1. Format

We could alter our index to use a format attribute:
<field fieldName="datefrom" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" 
format="yyyyMMdd" type="System.DateTime" 
settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>

After rebuilding our index, the "DateFrom" field values, stored in the index, will contain only dates (like "20151209"), so search by dates should return results as expected (since there are no "T" and "Z" symbols). 
This works if you really don't need the times..

2. Custom Converter

Another solution is to override the "Sitecore.ContentSearch.Converters.IndexFieldUtcDateTimeValueConverter" class to store dates in lower case to the index.

Add your converter to the index config:
<converters hint="raw:AddConverter">
  ...
  <converter handlesType="System.DateTime" 
         typeConverter="YourNamespace.LowerCaseIndexFieldUtcDateTimeValueConverter, YourAssembly" />
  ...
</converters>

As a result, all dates should be stored to the index in lower case. As the search query is in lower case, all expected results should be found.


Future solution

Since currently search queries are always generated in lower case and this behavior is currently not configurable (the "LowercaseExpandedTerms" property of the "Lucene.Net.QueryParsers.QueryParser" class is always set to true, which lowers parameters in a search query string), a feature request for the product was made so that it can be considered for future implementations. That should make these tweaks unnecessary..

No comments:

Post a Comment