
Tuesday, April 23, 2019

Unit testing Sitecore index queries in 9.1

Unit testing Sitecore index queries

I will make a few assumptions in this post:
  • You know what unit testing is
  • You understand the benefits of unit testing
  • You are using Sitecore indexes in your code 
  • and so you would like to unit test that code 
If you don't see the benefits yet, consider this:
You have some code that needs to be tested with a number of variables (a date in the past or in the future, positive or negative numbers, ...). To test this properly you might need to create several items in Sitecore, run several manual tests and, best of all, restart your Sitecore environment after every deploy with a bugfix if your code wasn't perfect from the start (and be honest, is it ever?).

Unit testing queries in 9.1

On a lot of projects I have used code based on a solution by Vivian Roberts to test code that queries indexes. It works fine (as long as I don't need facets and such) and has already saved me quite a bit of time.

And then came Sitecore 9.1, and it didn't work anymore. Sitecore introduced a new internal interface that is checked in the QueryableExtensions, and that check makes our test code fail:
if (!QueryableExtensions.IsIContentSearchQueryable<TSource>(source))
    return (SearchResults<TSource>) null;
The solution was to inherit from Sitecore.ContentSearch.Linq.Parsing.GenericQueryable, which is public and implements the mentioned interface. Thanks to my colleague Alex Dhaenens and Sitecore Support for helping us out on this one.

Let's recap with a complete overview of the code I'm using to test queries.

The code

Step 1: make the code testable

To make it all testable I create a SearchContextBuilder:
public class SearchContextBuilder : ISearchContextBuilder
{
  public virtual IProviderSearchContext GetSearchContext(string index)
  {
    return ContentSearchManager.GetIndex(index).CreateSearchContext();
  }
}
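The matching ISearchContextBuilder interface is not shown in the post, but it is nothing more than the single method used above (a minimal sketch):
public interface ISearchContextBuilder
{
  IProviderSearchContext GetSearchContext(string index);
}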
This builder will be the default that is injected in all our repositories that query indexes. In the test version however, we will be able to inject another one.

Generic Search repository

You could use the context builder in a generic search repository, something like:
public class SearchRepository : ISearchRepository
{
  private readonly ISearchContextBuilder searchContextBuilder;

  public SearchRepository(ISearchContextBuilder searchContextBuilder)
  {
    this.searchContextBuilder = searchContextBuilder;
  }

  public virtual SearchResults<T> GetResults<T>(Expression<Func<T, bool>> predicate, string index) where T : SearchResultItem
  {
    using (var context = searchContextBuilder.GetSearchContext(index))
    {
      var query = context.GetQueryable<T>().Where(predicate);
      return query.GetResults();
    }
  }

  public virtual IEnumerable<T> GetResultItems<T>(Expression<Func<T, bool>> predicate, string index) where T : SearchResultItem
  {
    var results = GetResults(predicate, index);
    foreach (var hit in results.Hits)
    {
      yield return hit.Document;
    }
  }
}
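A consumer could then query an index without knowing anything about the context creation. A hedged example (the service class, index name and template GUID are placeholders of my own):
public class NewsService
{
  private readonly ISearchRepository searchRepository;

  public NewsService(ISearchRepository searchRepository)
  {
    this.searchRepository = searchRepository;
  }

  public IEnumerable<SearchResultItem> GetEnglishNews()
  {
    // hypothetical template id - use your own
    var newsTemplateId = new ID("{11111111-1111-1111-1111-111111111111}");
    return searchRepository.GetResultItems<SearchResultItem>(
      x => x.TemplateId == newsTemplateId && x.Language == "en",
      "sitecore_web_index");
  }
}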

Step 2: create a custom QueryableCollection

This is the part where the code from Vivian Roberts is used. We added the item count to make the result count work, and adapted the class to work with Sitecore 9.1. The QueryableCollection will be used in the test version of the search context.
public class SearchProviderQueryableCollection<TElement> : GenericQueryable<TElement,Query>, IOrderedQueryable<TElement>, IQueryProvider
{
  private readonly EnumerableQuery<TElement> innerQueryable;

  public SearchProviderQueryableCollection(IEnumerable<TElement> enumerable):base(null,null,null,null,null,null,null)
  {
    innerQueryable = new EnumerableQuery<TElement>(enumerable);
  }

  public SearchProviderQueryableCollection(Expression expression) : base(null, null, null, null, null, null, null)
  {
    innerQueryable = new EnumerableQuery<TElement>(expression);
  }

  public new Type ElementType => ((IQueryable)innerQueryable).ElementType;
  public new Expression Expression => ((IQueryable)innerQueryable).Expression;
  public new IQueryProvider Provider => this;

  public new IEnumerator<TElement> GetEnumerator()
  {
    return ((IEnumerable<TElement>)innerQueryable).GetEnumerator();
  }

  IEnumerator IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }

  public IQueryable CreateQuery(Expression expression)
  {
    return new SearchProviderQueryableCollection<TElement>((IEnumerable<TElement>)((IQueryProvider)innerQueryable).CreateQuery(expression));
  }

  public new IQueryable<TElement1> CreateQuery<TElement1>(Expression expression)
  {
    return (IQueryable<TElement1>)new SearchProviderQueryableCollection<TElement>((IEnumerable<TElement>)((IQueryProvider)innerQueryable).CreateQuery(expression));
  }

  public object Execute(Expression expression)
  {
    throw new NotImplementedException();
  }

  public new TResult Execute<TResult>(Expression expression)
  {
    var items = this.ToList();
    object results = new SearchResults<TElement>(items.Select(s => new SearchHit<TElement>(0, s)), items.Count);
    return (TResult)results;
  }
}

Step 3: create our custom testable search builder

The testable search builder will be different for each type of test, as it holds the actual data to test against. The important part, however, is the creation of the context itself. Note that I am using Moq as the mocking framework - you can use another one if you want. An example could be:
public class TestableSearchBuilder : ISearchContextBuilder
{
  private readonly IList<SearchResultItem> items;

  public TestableSearchBuilder()
  {
    var templateId1 = new ID("{04fd3a5b-af21-49e3-9d88-25355301ab91}");
    var root = new ID("{0cbba84d-f2cd-4adb-912e-36d97cb22fe9}");
    var rootPath = new List<ID> { root };
    
    items = new List<SearchResultItem>
    {
      new SearchResultItem { Name = "Item1", Language = "en", ItemId = new ID(Guid.NewGuid()), TemplateId = templateId1, Paths = rootPath },
      new SearchResultItem { Name = "Item2", Language = "nl", ItemId = new ID(Guid.NewGuid()), TemplateId = templateId1, Paths = rootPath }
    };
  }

  public IProviderSearchContext GetSearchContext(string index)
  {
    // create the mock context
    var searchContext = new Mock<IProviderSearchContext>();
    var queryable = new SearchProviderQueryableCollection<SearchResultItem>(items);
    searchContext.Setup(x => x.GetQueryable<SearchResultItem>()).Returns(queryable);
    searchContext.Setup(x => x.GetQueryable<SearchResultItem>(It.IsAny<IExecutionContext>())).Returns(queryable);
    return searchContext.Object;
  }
}
Note that we are using our custom SearchProviderQueryableCollection in the context. The constructor adds the data to the list of items in our fake index. You can put as many items in there as you need, with whatever properties you need. For the example I used SearchResultItem, but if you inherit from that class you can use your own class here as well.

Step 4: start testing

We now have everything in place to start writing actual tests. I'm using xUnit for the tests, as it works fine with FakeDB, which I sometimes use in combination with this code.

In the constructor of the test class (or the setup method if you are not using xUnit) you create an instance of the TestableSearchBuilder. You need to get this instance into the class you are testing - see step 1 on making the code testable. So create an instance of the class under test (e.g. the SearchRepository) with the TestableSearchBuilder: new SearchRepository(new TestableSearchBuilder()). Depending on your solution, you might also need to tweak the IoC container to use these instances in the tests.
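A test could then look something like this (a sketch using xUnit and the classes above; the assertions match the two items defined in the TestableSearchBuilder):
public class SearchRepositoryTests
{
  private readonly SearchRepository repository;

  public SearchRepositoryTests()
  {
    // inject the testable context builder instead of the real one
    repository = new SearchRepository(new TestableSearchBuilder());
  }

  [Fact]
  public void GetResultItems_FiltersOnLanguage()
  {
    // the fake index contains one "en" and one "nl" item
    var results = repository.GetResultItems<SearchResultItem>(x => x.Language == "en", "fake_index").ToList();

    Assert.Single(results);
    Assert.Equal("Item1", results[0].Name);
  }
}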

And that's all - you should now be able to unit test queries and verify the results.


Tuesday, January 9, 2018

Custom Sitecore DocumentOptions with Solr

Almost 2 years ago I wrote a post about using custom indexes in a Helix environment. That post is still accurate, but the code was based on Lucene. As we are now all moving towards Solr with our (non-PaaS) Sitecore setups, I thought it might be a good idea to bring this topic back to the table, this time with a Solr example.

(custom) indexes

I am assuming that you know about Helix and about custom indexes. If you ever created a custom index you have probably used the documentOptions configuration section - maybe without noticing. It is used to include and/or exclude fields and templates, and to define computed fields. So you probably used it :)

And it wouldn't be Sitecore if we couldn't customize this...

Our own documentOptions

Why? Because we can. No, seriously: we might have a good reason, like making our custom index definitions (more) Helix compliant. Normally your feature will not have a clue about "page" templates. But what if you want to define the included templates in your index? Those could be page templates, or at least templates that inherit from your feature template. That is why I built my own documentOptions - to add a way to include templates derived from template X.

Configuration

So the idea now is to create a custom document options class by inheriting from the SolrDocumentBuilderOptions. We add a new method to allow adding templates in a new section with included base templates. This will not break any other existing configuration sections.

An example config looks like:
<documentOptions type="YourNamespace.TestOptions, YourAssembly">
    <indexAllFields>true</indexAllFields>
    <include hint="list:AddIncludedBaseTemplate">
        <BaseTemplate1>{B6FADEA4-61EE-435F-A9EF-B6C9C3B9CB2E}</BaseTemplate1>
    </include>
</documentOptions>
This looks very familiar - as intended. We create a new include section with the hint "list:AddIncludedBaseTemplate". The name 'AddIncludedBaseTemplate' will come back later in our code.

Code

AddIncludedBaseTemplate

public virtual void AddIncludedBaseTemplate(string templateId)
{
  Assert.ArgumentNotNull(templateId, "templateId");
  ID id;
  Assert.IsTrue(ID.TryParse(templateId, out id), "Configuration: AddIncludedBaseTemplate entry is not a valid GUID. Template ID Value: " + templateId);
  foreach (var linkedId in GetLinkedTemplates(id))
  {
    AddTemplateFilter(linkedId, true);
  }
}
For the rest of the code I refer to the original post, as nothing else has to be changed to make it work on Solr (instead of Lucene).
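Put together, the options class is little more than a subclass with that method (a sketch; the class name matches the config example above):
// the only real change compared to the Lucene version is the base class
public class TestOptions : SolrDocumentBuilderOptions
{
  // AddIncludedBaseTemplate (above) and the GetLinkedTemplates helpers
  // from the original post go in here, unchanged
}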

Conclusion

To change the code from the Lucene example to a Solr one, we just had to change the base class to SolrDocumentBuilderOptions. 
We are now again able to configure our index to only use templates that inherit from our base templates. Still cool. And remember you can easily re-use this logic to create other document options to tweak your index behavior. 

Wednesday, October 5, 2016

Custom Sitecore index crawler

Why? - The case

We needed to find items in the master database quickly, based on some criteria. Language didn't matter and we didn't need the actual item - a few properties would do. So we decided to create a custom index with the fields we needed indexed (the criteria) and/or stored (the properties). All fine, but as we were using the master database we had multiple versions of each item in the index. As we needed only the latest version in only one language, we thought we could optimize the index to contain only those versions.

First attempt

We had to override the SitecoreItemCrawler - that was clear. The first attempt was creating a custom IsExcludedFromIndex function that would stop all entries that were not in English or not the latest version. Pretty simple, but it does not work.
First of all, all entries were in English, and this function is only called per item and not per version - so we could not actually use it. Furthermore, we did not take into account that when adding a new version, we have to remove the previously indexed one.

Don't index multiple versions

I started searching the internet and found this post by Martin English on (not) indexing multiple versions. A great post, which also points towards a solution with inbound filters. But as those filters work for every index, that was no solution here: I needed it configurable per index. So back to Martin's post: we had to override DoAdd and DoUpdate.

A custom index crawler

The result was a bit different, as I was using Sitecore 8.1 and also wanted to include a language filter. I checked the code of the original SitecoreItemCrawler, created a class overriding it and adapted it where needed.

Language

I made the language configurable by putting it into a property:
private string indexLanguage;

public string Language
{
  get { return !string.IsNullOrEmpty(indexLanguage) ? indexLanguage : null; }
  set { indexLanguage = value; }
}

DoAdd

The DoAdd method was changed by adding an early check in the language loop to skip any language other than the requested one. I also replaced the version loop with a request for the latest version, so that only that version gets sent to the index.
protected override void DoAdd(IProviderUpdateContext context, SitecoreIndexableItem indexable)
{
  Assert.ArgumentNotNull(context, "context");
  Assert.ArgumentNotNull(indexable, "indexable");
  using (new LanguageFallbackItemSwitcher(context.Index.EnableItemLanguageFallback))
  {
    Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:adding", context.Index.Name, indexable.UniqueId, indexable.AbsolutePath);
    if (!IsExcludedFromIndex(indexable, false))
    {
      foreach (var language in indexable.Item.Languages)
      {
        // only include English
        if (!language.Name.Equals(indexLanguage, StringComparison.OrdinalIgnoreCase))
        {
          continue;
        }

        Item item;
        using (new WriteCachesDisabler())
        {
          item = indexable.Item.Database.GetItem(indexable.Item.ID, language, Version.Latest);
        }

        if (item == null)
        {
          CrawlingLog.Log.Warn(string.Format(CultureInfo.InvariantCulture, "SitecoreItemCrawler : AddItem : Could not build document data {0} - Latest version could not be found. Skipping.", indexable.Item.Uri));
        }
        else
        {
          SitecoreIndexableItem sitecoreIndexableItem;
          using (new WriteCachesDisabler())
          {
            // only latest version
            sitecoreIndexableItem = item.Versions.GetLatestVersion();
          }

          if (sitecoreIndexableItem != null)
          {
            IIndexableBuiltinFields indexableBuiltinFields = sitecoreIndexableItem;
            indexableBuiltinFields.IsLatestVersion = indexableBuiltinFields.Version == item.Version.Number;
            sitecoreIndexableItem.IndexFieldStorageValueFormatter = context.Index.Configuration.IndexFieldStorageValueFormatter;
            Operations.Add(sitecoreIndexableItem, context, index.Configuration);
          }
        }
      }
    }

    Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:added", context.Index.Name, indexable.UniqueId, indexable.AbsolutePath);
  }
}


DoUpdate

For the DoUpdate method I did something similar, although I had to change a bit more here.
protected override void DoUpdate(IProviderUpdateContext context, SitecoreIndexableItem indexable, IndexEntryOperationContext operationContext)
{
  Assert.ArgumentNotNull(context, "context");
  Assert.ArgumentNotNull(indexable, "indexable");
  using (new LanguageFallbackItemSwitcher(Index.EnableItemLanguageFallback))
  {
    if (IndexUpdateNeedDelete(indexable))
    {
      Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:deleteitem", index.Name, indexable.UniqueId, indexable.AbsolutePath);
      Operations.Delete(indexable, context);
    }
    else
    {
      Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updatingitem", index.Name, indexable.UniqueId, indexable.AbsolutePath);
      if (!IsExcludedFromIndex(indexable, true))
      {
        if (operationContext != null && !operationContext.NeedUpdateAllLanguages)
        {
          if (!indexable.Item.Language.Name.Equals(indexLanguage, StringComparison.OrdinalIgnoreCase))
          {
            CrawlingLog.Log.Debug(string.Format(CultureInfo.InvariantCulture, "SitecoreItemCrawler : Update : English not requested {0}. Skipping.", indexable.Item.Uri));
            return;
          }
        }

        Item item;
        var languageItem = LanguageManager.GetLanguage(indexLanguage);
        using (new WriteCachesDisabler())
        {
          item = indexable.Item.Database.GetItem(indexable.Item.ID, languageItem, Version.Latest);
        }

        if (item == null)
        {
          CrawlingLog.Log.Warn(string.Format(CultureInfo.InvariantCulture, "SitecoreItemCrawler : Update : Latest version not found for item {0}. Skipping.", indexable.Item.Uri));
        }
        else
        {
          Item[] versions;
          using (new SitecoreCachesDisabler())
          {
            versions = item.Versions.GetVersions(false);
          }

          foreach (var version in versions)
          {
            if (version.Version.Equals(item.Version))
            {
              UpdateItemVersion(context, version, operationContext);
            }
            else
            {
              Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:deleteitem", index.Name, indexable.UniqueId, indexable.AbsolutePath);
              Delete(context, ((SitecoreIndexableItem)version).UniqueId);
            }
          }
        }

        Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updateditem", index.Name, indexable.UniqueId, indexable.AbsolutePath);
      }

      if (!DocumentOptions.ProcessDependencies)
      {
        return;
      }

      if (indexable.Item.Language.Name.Equals(indexLanguage, StringComparison.OrdinalIgnoreCase))
      {
        Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updatedependents", index.Name, indexable.UniqueId, indexable.AbsolutePath);
        UpdateDependents(context, indexable);
      }
    }
  }
}

I did a few things here:
  • if the operationContext is not asking to update all languages, I check the language and bail out if it is not the index language
  • I get all versions, loop through them and update the latest - the other versions get a delete instruction
    • not sure if this is really needed, as it might be sufficient to delete only the previous one
  • the call to update the dependent items was put inside a language condition, so that it is only executed when the requested language is the index language

Testing

And I started testing. Rebuild. Add versions. Update items. Constantly using Luke to investigate the index. It all seemed to work. 
Until I tried to add a new version in a language that was not supposed to be in the index. The new version was not sent to the index, but its previous version was. I tried to figure out what was happening, and by following the flow through the existing SitecoreItemCrawler I found some options in the "IndexEntryOperationContext" that were used in the base Update function.

Update

So we also override the Update method:
public override void Update(IProviderUpdateContext context, IIndexableUniqueId indexableUniqueId, IndexEntryOperationContext operationContext, IndexingOptions indexingOptions = IndexingOptions.Default)
{
  operationContext.NeedUpdatePreviousVersion = false;
  base.Update(context, indexableUniqueId, operationContext, indexingOptions);
}

What I'm doing here is actually quite simple: I tell the crawler that it does not need to update previous versions, no matter what. As I am already handling all versions in DoUpdate, this seemed OK to do. This fixed the problem and I did not have to copy too much code anymore.

Conclusion

The custom crawler works and does what it is supposed to do. It would have been nice though if the functions in the crawler provided by Sitecore were cut into smaller pieces to make it easier to override the pieces we want to change. I remember reading somewhere that Pavel Veller already managed to get this on a roadmap, so I hope that is true...

But for now, this worked for me. Glad to hear any remarks, suggestions, ...

Thursday, June 16, 2016

Sitecore Index dependencies

I recently stumbled upon a question on how to trigger re-indexing of related content in a Sitecore (Lucene) index. Different answers were given and I got the feeling that not everyone knows about the getDependencies pipeline yet. So: time for a blog post...

Re-index related content

As I mentioned, there are other solutions that could do the trick. 
  • Custom update strategy

    You could write your own update strategy and include your dependency logic in there. This approach has the benefit that you can use it in one index only without affecting others.
  • Custom save handler

    With a custom save handler you could detect save actions, get the dependent items and register them for index updating as well. I'm not convinced that this will work in all update strategy scenarios, but if you have working code, feel free to share ;)
These are probably also valid solutions, but I'll leave those to others as I want to show the Sitecore pipeline that looks like the ideal candidate for the job.

getDependencies pipeline

There is a pipeline - there always is. One drawback I'll mention right away is that the pipeline runs for all indexes, and so far I have not found a way to trigger it for one index only (see the note below on disabling it per index). I also tried to get the index (name or anything) in the code, but that didn't work out either. We could get the name of the job, but that was only relevant for the first batch of items - after that, multiple jobs were started and the name became meaningless.

Anyway, the pipeline. In the Sitecore.ContentSearch.config you'll find this:
<!-- INDEXING GET DEPENDENCIES
  This pipeline fetches dependant items when one item is being index. Useful for fetching related or connected items that also
  need to be updated in the indexes.
  Arguments: (IQueryable) Open session to the search index, (Item) The item being indexed.
  Examples: Update clone references.
  Update the data sources that are used in the presentation components for the item being indexed.
-->

<indexing.getDependencies help="Processors should derive from Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor">
  <!-- When indexing an item, make sure its clones get re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetCloningDependencies, Sitecore.ContentSearch"/>-->
  <!-- When indexing an item, make sure its datasources that are used in the presentation details gets re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetDatasourceDependencies, Sitecore.ContentSearch"/>-->
</indexing.getDependencies>

As you can see, some processors are in the box, but in comments. You can simply enable them if you want your clones and/or datasources to be indexed with the main items.

And you can write your own processor of course. An example:
public class GetPageDependencies : Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor
{
    public override void Process(GetDependenciesArgs context)
    {
        Assert.IsNotNull(context.IndexedItem, "indexed item");
        Assert.IsNotNull(context.Dependencies, "dependencies");
            
        var scIndexable = context.IndexedItem as SitecoreIndexableItem;
        if (scIndexable == null) return;
            
        var item = scIndexable.Item;
        if (item == null) return;
            
        // optimization to reduce indexing time by skipping this logic for items not in the Web database
        if (!string.Equals(item.Database.Name, "web", StringComparison.OrdinalIgnoreCase)) return;
            
        if (!item.Paths.IsContentItem) return;
            
        if (item.Name.Equals("__Standard Values", StringComparison.OrdinalIgnoreCase)) return;
            
        if (Sitecore.Context.Job == null) return;
            
        // logic here - example = get first child
        if (!item.HasChildren) return;
            
        var dependency = item.Children[0];
        var id = (SitecoreItemUniqueId)dependency.Uri;
        if (!context.Dependencies.Contains(id))
        {
            context.Dependencies.Add(id);
        }
    }
}

In the example here we keep it simple and just add the first child (if any); that logic can contain anything though. The processor is registered in the indexing.getDependencies section shown above, just like the commented-out Sitecore ones (with a <processor type="YourNamespace.GetPageDependencies, YourAssembly" /> element - namespace and assembly being your own).

As you can see we try to get out of the processor as fast as possible. You can add even more checks based on template and so on. Getting out fast if you don't want the dependencies is important!

The benefit of this solution is that the pipeline is executed when the indexing starts but before the list of items to index is finalized - which is the best moment for this task. All "extra" items are added to the original list, so they are processed (indexed) by the same job and we let Sitecore handle them the way it was meant to.

Performance might not seem an issue, but when you have quite a few items and dependencies that get updated frequently, it will be. You might be pushing way too many items towards the index, so be careful (no matter which solution you go for). Indexing is a background job, but if it goes berserk you will notice.
Note that it is a good thing that your dependencies don't have to go through all kinds of processes before being added - they are simply added to the list.

I found this pipeline solution very useful in scenarios where the number of dependent items that actually got added was not too big. Don't forget you can also disable the pipeline processor temporarily (and perform a rebuild) if needed.

How to Enable/Disable 

(from the Sitecore Search and Indexing on SDN) - thx jammykam for the info

The pipeline is executed from within each crawler if the crawler’s ProcessDependencies property is set to true, which is the default. To disable this feature, add the following parameter to the appropriate index under the <Configuration /> section.
<index id="content" ...>
  ...
  <Configuration type="...">
    ...
    <ProcessDependencies>false</ProcessDependencies>
Alternatively, if the indexes don’t override default configuration with a local one, you can also globally change this setting in the DefaultIndexConfiguration.

Known issues with the indexing.getDependencies pipeline

https://kb.sitecore.net/articles/116076

Tuesday, March 22, 2016

Custom index update strategy

ComputedIndexField with dependencies to other items

Ever had a ComputedIndexField that gathered data from items other than the item currently being indexed? For example from its children, or from items referred to. I just had a situation where we needed (a property of) the children to be included in our ComputedIndexField.
But what happens if you update a child item? The child is re-indexed, but the parent is not, as it was not changed or published. We were using the onPublishEndAsync update strategy and didn't want a solution that needed a periodic rebuild just to keep the index up to date.

There is a GetDependencies pipeline that can be used to add extra items to the indexable list, but that was no option as this pipeline is for all indexes and we wanted it just for our custom index and preferably configurable per index (thinking about performance as well..)

Extend the onPublishEnd update strategy

So we started thinking about extending the update strategy. We found some examples on the internet, but none for our Sitecore version (we are using Sitecore 8.1), and the examples didn't go far enough.

What we wanted was:
  • the base onPublishEnd strategy (that would still check for rebuilds and so on)
  • an extension that would add the first ascendant of the item of a defined template to the list of indexable items

I had a look at the code of the OnPublishEndAsynchronousStrategy with DotPeek and noticed that this was extendable indeed. 

Let's start by creating our class by doing what a developer is good at: copy/paste :)

[DataContract]
public class OnPublishEndWithAncestorAsynchronousStrategy : OnPublishEndAsynchronousStrategy
{
    public string ParentTemplateId { get; set; }
    public string ChildTemplateId { get; set; }

    public OnPublishEndWithAncestorAsynchronousStrategy(string database) : base(database)
    {
    }
}

We created a class that extends OnPublishEndAsynchronousStrategy and gave it a constructor that takes the database name (which will be passed in the config). We also defined two properties to identify the templates that are affected - both the parent (the ancestor template to look for) and the child (the template of the item to start from).

Performance

The child item template(s) are required because our strategy code is executed before the crawler's root path is checked and before the 'DocumentOptions' (like 'IncludeTemplates') are checked. As this extended strategy is already heavier than the original one, we wanted to avoid even more performance hits for items we don't need to check. This will become clear later on...

Configuration


<strategies hint="list:AddStrategy">
  <onPublishEndWithAncestorAsync type="Xx.OnPublishEndWithAncestorAsynchronousStrategy, Xx">
    <param desc="database">web</param>
    <ParentTemplateId>{0F5141D6-F264-4D03-B5D2-3505E6F308E7}</ParentTemplateId>
    <ChildTemplateId>{2A993FF2-5F17-4EEA-AD53-5343794F86BB}{066DEA00-31D7-4838-94A6-8D05A7FC690E}</ChildTemplateId>
  </onPublishEndWithAncestorAsync>
</strategies>

In the strategies section - where you normally add strategies by pointing towards the one(s) defined in the default Sitecore index configurations - we define our custom strategy by providing its type. We pass the database (web) as a parameter and define the GUIDs for the templates. As the example shows, multiple child templates can be passed by concatenating their GUIDs.

The index run

After some investigation, it turned out we only had to override one method: "Run". 
We started by copy/pasting the original code and checked the extension points:
  • if the item queue is empty: we leave the original code 
  • if the item queue is so big a rebuild is suggested: we keep the original code as a rebuild will also update the ancestor we might add
  • else..
We kept the original code to fetch the list of items to refresh. We don't actually get the items as "Item" but as "IndexableInfo" objects. For each entry in this list we call our GetAncestor function. The result is checked for null and added to the original list only if it wasn't already in there.


protected override void Run(List<QueuedEvent> queue, ISearchIndex index)
{
    CrawlingLog.Log.Debug($"[Index={index.Name}] {GetType().Name} executing.");
    if (Database == null)
    {
        CrawlingLog.Log.Fatal($"[Index={index.Name}] OperationMonitor has invalid parameters. Index Update cancelled.");
    }
    else
    {
        queue = queue.Where(q => q.Timestamp > (index.Summary.LastUpdatedTimestamp ?? 0L)).ToList();
        if (queue.Count <= 0)
        {
            CrawlingLog.Log.Debug($"[Index={index.Name}] Event Queue is empty. Incremental update returns");
        }
        else if (CheckForThreshold && queue.Count > ContentSearchSettings.FullRebuildItemCountThreshold())
        {
            CrawlingLog.Log.Warn($"[Index={index.Name}] The number of changes exceeded maximum threshold of '{ContentSearchSettings.FullRebuildItemCountThreshold()}'.");
            if (RaiseRemoteEvents)
            {
                IndexCustodian.FullRebuild(index).Wait();
            }
            else
            {
                IndexCustodian.FullRebuildRemote(index).Wait();
            }
        }
        else
        {
            var list = ExtractIndexableInfoFromQueue(queue).ToList();
            // custom code start here...
            CrawlingLog.Log.Info($"[Index={index.Name}] Found '{list.Count}' items from Event Queue.");
            var result = new List<IndexableInfo>();
            CrawlingLog.Log.Info($"[Index={index.Name}] OnPublishEndWithAncestorAsynchronousStrategy executing.");
            foreach (var itemInfo in list)
            {
                var ancestor = GetAncestor(itemInfo);
                if (ancestor != null)
                {
                    // check both the original list and the ancestors gathered so far to avoid duplicates
                    if (list.Concat(result).Any(i => i.IndexableId.Equals(ancestor.IndexableId, StringComparison.OrdinalIgnoreCase)))
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Ancestor already in list '{ancestor.IndexableId}'.");
                    }
                    else
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Adding ancestor '{ancestor.IndexableId}'.");
                        result.Add(ancestor);
                    }
                }
            }

            list.AddRange(result);
            CrawlingLog.Log.Info($"[Index={index.Name}] Updating '{list.Count}' items.");
            IndexCustodian.IncrementalUpdate(index, list).Wait();
        }
    }
}

Job(s)

One notable thing here is that we add the extra indexable items to the existing list that is passed to the incremental update. We could also call the Refresh method on the IndexCustodian, but that would create extra (background) jobs, so this way seems more efficient.

The ancestor check

The last thing to do is the ancestor check itself. For our requirements we needed to find an ancestor of a defined template, but this function could actually do anything. Just keep performance in mind, as this function will be called a lot (any ideas on how to further improve this are welcome).

private IndexableInfo GetAncestor(IndexableInfo info)
{
    try
    {
        var childTemplateId = ChildTemplateId.ToLowerInvariant();
        var item = Database.GetItem(((ItemUri)info.IndexableUniqueId.Value).ItemID);
        if (item != null && childTemplateId.Contains(item.TemplateID.Guid.ToString("B")))
        {
            var ancestor = item.Axes.GetAncestors().ToList().FindLast(i => i.TemplateID.Guid.ToString("B").Equals(ParentTemplateId, StringComparison.OrdinalIgnoreCase));
            if (ancestor != null)
            {
                return new IndexableInfo(
                    new SitecoreItemUniqueId(
                        new ItemUri(ancestor.ID, ancestor.Language, ancestor.Version, Database)),
                    info.Timestamp)
                {
                    IsSharedFieldChanged = info.IsSharedFieldChanged
                };
            }
        }
    }
    catch (Exception e)
    {
        CrawlingLog.Log.Error($"[Index] Error getting ancestor for '{info.IndexableId}'.", e);
    }

    return null;
}


Using the child template in the config as well might seem like a limitation, but it gives us a good performance gain because it greatly limits the number of (slow) ancestor look-ups. We still need that first lookup of the actual item to detect the template, though.
We catch all exceptions - OK, that might be bad practice - just to make sure that one failure doesn't break the whole run.

Conclusion

As usual, we managed to tweak Sitecore in a fairly easy manner. This example can hopefully lead you towards more optimizations and other implementations of custom index features. Suggestions and/or improvements are welcome...


Friday, February 26, 2016

Custom indexes in a Sitecore Helix architecture

Sitecore Helix/Habitat

Most Sitecore developers probably know what Sitecore Habitat and Sitecore Helix are about by now, especially since the 2016 Hackathon. Several interesting blog posts have already been written about the architecture (e.g. this one by Anders Laub) and videos have been posted on YouTube by Thomas Eldblom.

Custom indexes

I use custom indexes very frequently. So I started thinking about how I could use custom indexes in a "Helix" architecture. When creating a feature that uses such a custom index, the configuration for that index has to be in the feature. That is perfectly possible as we can create a separate index config file. But what are the things we define in that config file?

  • index
    • a name
    • the strategy
    • crawler(s)
    • ...
  • index configuration
    • field map
    • document options
      • computed index fields
      • include templates
      • ...
    • ...

Some of these settings can be defined in our feature without issue: the name (obviously) and the strategy (e.g. onPublishEndAsync). A crawler might be the first difficulty, but its root can be set to a very high level (e.g. <Root>/sitecore/content</Root>).

In the index configuration we can also define the fieldMap and such. In the documentOptions section we can (must) define our computed index fields. But then we should also define our included templates. And that is where I got stuck: in a feature I don't know my templates, just base templates.

Patching

A first thought was to use the patching mechanism from Sitecore. We could define our index and its configuration in the feature and patch the included templates and/or crawlers in the project layer.
Sounds like a plan, but especially for the included templates it didn't feel quite right.

For the index itself, patching will be necessary in some cases, e.g. to enable or disable item/field language fallback. If needed, it is also possible to patch the content root at the project level.

Included templates? Included base templates!

For the included templates in the document options I was searching for another solution, so I threw my question into the Sitecore Slack Helix/Habitat channel and ended up in a discussion with Thomas Eldblom and Sitecore junkie Mike Reynolds. Thomas came up with the idea to hook into the indexing process so it could include base templates, Mike kept pushing me to do it, and so I wrote an extension to configure your index based on base templates.

The code is a proof of concept - it can probably still be improved, but let this be a start.


Custom document options

I started by taking a look at one of my custom indexes to see what Sitecore was doing with the documentOptions section, and looked at their code in Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilderOptions. As you can guess, the PoC is done with Lucene.

Configuration

The idea was to create a custom document options class by inheriting from the LuceneDocumentBuilderOptions. I could add a new method to allow adding templates in a new section with included base templates. This will not break any other configuration sections.

An example config looks like:
<documentOptions type="YourNamespace.TestOptions, YourAssembly">
    <indexAllFields>true</indexAllFields>
    <include hint="list:AddIncludedBaseTemplate">
        <BaseTemplate1>{B6FADEA4-61EE-435F-A9EF-B6C9C3B9CB2E}</BaseTemplate1>
    </include>
</documentOptions>
This looks very familiar - as intended. We create a new include section with the hint "list:AddIncludedBaseTemplate". The name 'AddIncludedBaseTemplate' will come back later in our code.

Code

Related templates

The first function we created was to get all templates that relate to our base template:

private IEnumerable<Item> GetLinkedTemplates(Item item)
{
  var links = Globals.LinkDatabase.GetReferrers(item, new ID("{12C33F3F-86C5-43A5-AEB4-5598CEC45116}"));
  if (links == null)
  {
    return new List<Item>();
  }

  var items = links.Select(i => i.GetSourceItem()).Where(i => i != null).ToList();
  var result = new List<Item>();
  foreach (var linkItem in items)
  {
    result.AddRange(GetLinkedTemplates(linkItem));
  }

  items.AddRange(result);
  return items;
}

We use the link database to get the referrers and use the Guid of the "Base template" field of a template to make sure that we get references in that field only - which also makes sure that all results are actual Template items.
The function is recursive because a template using your base template can again be a base template for another template (which will by design also include your original base template). The result is a list of items.

A second function will use our first one to generate a list of Guids from the ID of the original base template:
public IEnumerable<string> GetLinkedTemplates(ID id)
{
  var item = Factory.GetDatabase("web").GetItem(id);
  Assert.IsNotNull(item, "Configuration : templateId cannot be found");

  var linkedItems = GetLinkedTemplates(item);
  return linkedItems.Select(l => l.ID.Guid.ToString("B").ToUpperInvariant()).Distinct();
}

As you can see, what we do here is fetch the item from the ID and call our GetLinkedTemplates function. From the results we take the distinct list of GUID strings - in uppercase.

Context database

One big remark here: I don't know which database to use - if somebody knows how to determine that, please let me (and everybody else) know. The context database in my tests was 'core'. I tried to find the database defined in the crawler, because that is the one you would need, but no luck so far.

And finally...

AddIncludedBaseTemplate

public virtual void AddIncludedBaseTemplate(string templateId)
{
  Assert.ArgumentNotNull(templateId, "templateId");
  ID id;
  Assert.IsTrue(ID.TryParse(templateId, out id), "Configuration: AddIncludedBaseTemplate entry is not a valid GUID. Template ID Value: " + templateId);
  foreach (var linkedId in GetLinkedTemplates(id))
  {
    AddTemplateFilter(linkedId, true);
  }
}

Our main function is called "AddIncludedBaseTemplate" - this is consistent with the name used in the configuration. In the end we want to use the "AddTemplateFilter" function from the base DocumentBuilderOptions - the 'true' parameter is telling the function that the templates are included (false is excluded). So we convert the template guid coming in to an ID to validate it and use it in the functions we created to get all related templates.

Performance

Determining your included templates is apparently only done once at startup. So if you have a lot of base templates to include which have lots of templates using them, don't worry about this code being called on every index update. Which of course doesn't mean we shouldn't think about performance here ;)

Conclusion

So we are now able to configure our index to only use templates that inherit from our base templates. Cool. Does it end here? No.. you can re-use this logic to create other document options as well to tweak your index behavior. 

And once more: thanks to Thomas & Mike for the good chat that lead to this.. The community works :)

Wednesday, December 23, 2015

Sitecore Lucene index and DateTime fields

[Sitecore 8.1]

DateTime field in Lucene index

I was trying to create an index search for an event calendar that would give me items (from a certain template, etc.) that have a date field:
  • from today onwards (today included) 
  • up until today
The field in Sitecore is a date field (so no time indication), but our query seemed to have issues with the time part. The code to create the predicate looks like this:

private Expression<Func<EventItem, bool>> GetDatePredicate(OverviewMode mode)
{
  var predicate = PredicateBuilder.True<EventItem>();
  switch (mode)
  {
    case OverviewMode.Future:
    {
      var minDate = DateTime.Today.ToUniversalTime();
      predicate = predicate.And(n => n.StartDate > minDate);
      break;
    }
    case OverviewMode.Past:
    {
      var maxDate = DateTime.Today.ToUniversalTime();
      var minDate = DateTime.MinValue.ToUniversalTime();
      predicate = predicate.And(n => n.StartDate < maxDate).And(n => n.StartDate > minDate);
      break;
    }
    default:
    {
      return null;
    }
  }
  return predicate;
}
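For context, the predicate is applied to the index queryable roughly like this (a sketch; the index name is a placeholder):
using (var context = ContentSearchManager.GetIndex("events_index").CreateSearchContext())
{
  var predicate = GetDatePredicate(OverviewMode.Future);
  var results = context.GetQueryable<EventItem>()
    .Where(predicate)
    .OrderBy(e => e.StartDate)
    .GetResults();
}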


This did not work correctly with events "today". We had to add "AddDays(-1)" after the Today before we set it to UTC. So why?

The first reason is that Sitecore stores its DateTimes in UTC, which was an hour off from our local time. So our dates shifted to the previous day: "12/12/2015" becomes "12/11/2015 23:00". This is known and should be no issue, as we also shift to UTC in our predicate.

But still.. we did not get the correct results.

The logs

So we look at the logs. Sitecore logs all requests in the Search log file. We saw that our predicate was translated into something like this:
"+(+date_from:[* TO 20151111t230000000z} +date_from:{00010101t000000000z TO *])"

Looks fine, but note that the "t" in the dates is lowercase. In my index, however, they are all uppercase. When I try the query in Luke it indeed gives me the wrong results, and when I alter the query in Luke to use an uppercase T it works correctly.

Support, here we come!


Solution(s)

Support gave us 2 possible solutions, next to the one we already had (skipping a day).

1. Format

We could alter our index to use a format attribute:
<field fieldName="datefrom" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" 
format="yyyyMMdd" type="System.DateTime" 
settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>

After rebuilding our index, the "DateFrom" field values stored in the index will contain only dates (like "20151209"), so searching by date should return the expected results (since there are no "T" and "Z" symbols).
This works if you really don't need the time part.

2. Custom Converter

Another solution is to override the "Sitecore.ContentSearch.Converters.IndexFieldUtcDateTimeValueConverter" class to store dates in lower case to the index.

Add your converter to the index config:
<converters hint="raw:AddConverter">
  ...
  <converter handlesType="System.DateTime" 
         typeConverter="YourNamespace.LowerCaseIndexFieldUtcDateTimeValueConverter, YourAssembly" />
  ...
</converters>

As a result, all dates should be stored to the index in lower case. As the search query is in lower case, all expected results should be found.


Future solution

Since search queries are currently always generated in lower case and this behavior is not configurable (the "LowercaseExpandedTerms" property of the "Lucene.Net.QueryParsers.QueryParser" class is always set to true, which lower-cases the parameters in a search query string), a feature request was made so that this can be considered for future implementations. That should make these tweaks unnecessary.

Monday, December 7, 2015

Sitecore Lucene index with integers

The situation

We recently discovered an issue when using a facet on an integer field in a Sitecore (8.1) Lucene index. We had a number of articles (items) with a date field. We had to query these items, order them by date and determine the number of items in each year.

The code

We created a ComputedField "year" and filled it with the year part of the date:
var dateTime = ((DateField)publicationDateField).DateTime;
return dateTime.Year;
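For reference, the full computed field looks roughly like this (a sketch: only the two lines above come from the post; the class name, field name and null checks are mine):
public class PublicationYear : IComputedIndexField
{
  // both properties are set from the computed field configuration entry
  public string FieldName { get; set; }
  public string ReturnType { get; set; }

  public object ComputeFieldValue(IIndexable indexable)
  {
    var indexableItem = indexable as SitecoreIndexableItem;
    if (indexableItem == null || indexableItem.Item == null)
    {
      return null;
    }

    // field name assumed from the IndexField attribute in the class below
    var publicationDateField = indexableItem.Item.Fields["publication date"];
    if (publicationDateField == null || string.IsNullOrEmpty(publicationDateField.Value))
    {
      return null;
    }

    var dateTime = ((DateField)publicationDateField).DateTime;
    return dateTime.Year;
  }
}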
We added the field to a custom index, and created an entry in the fieldmap to mark it as System.Int32. We rebuild the index, check the contents with Luke and all is fine. So we create a class based on SearchResultItem to use for the query:

class NewsItem : SearchResultItem
{
    [IndexField("title")]
    public string Title { get; set; }

    [IndexField("publication date")]
    public DateTime Date { get; set; }

    [IndexField("category")]
    public Guid Category { get; set; }

    [IndexField("year")]
    public int PublicationYear { get; set; }
}

The query

When we use this class for querying, we get no results when filtering on the year. Apparently integer fields need to be tokenized to be used in searches (indexType="TOKENIZED"). Sounds weird, as this is surely not true for text fields, but the NumericField constructor makes it clear:

Lucene.Net.Documents.NumericField.NumericField(string name, int precisionStep, Field.Store store, bool index) : base(name, store, index ? Field.Index.ANALYZED_NO_NORMS : Field.Index.NO, Field.TermVector.NO)

So we changed the field in the fieldmap and set it to tokenized. We added an analyzer to prevent the integer from being cut into parts (Lucene.Net.Analysis.KeywordAnalyzer or Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer).

Success?

Yeah! We have results! We got the news items for 2015. And 2014. But... there is always a "but", or this post would be too easy. We still needed a facet, and that is where it went wrong. The facet resulted in this:


Not what we expected actually...

So back to our query and index. Sitecore Support found out that this happens because of the specific way numeric fields are indexed by Lucene: they are indexed not just as simple tokens but as a tree structure (http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/NumericField.html).

Unfortunately, Sitecore cannot do faceting on such fields at this moment - this is now logged as a bug.

The Solution

The solution was actually very simple. We threw the field out of the fieldmap and changed the int in our NewsItem to a string. If we want to use the values as integers we need to convert them afterwards, but for now we don't even need that.
Luckily for us, even the sorting doesn't care, as our ints are years and thus all have the same number of digits. So we were set: queries are working and facets are fine.
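The (now working) facet query looks roughly like this (a sketch; the index name and template ID are placeholders, and PublicationYear is the string property after the change above):
using (var context = ContentSearchManager.GetIndex("news_index").CreateSearchContext())
{
  // hypothetical template id - use your own
  var newsTemplateId = new ID("{11111111-1111-1111-1111-111111111111}");
  var results = context.GetQueryable<NewsItem>()
    .Where(n => n.TemplateId == newsTemplateId)
    .OrderByDescending(n => n.Date)
    .FacetOn(n => n.PublicationYear)
    .GetResults();

  foreach (var category in results.Facets.Categories)
  {
    foreach (var value in category.Values)
    {
      // value.Name is the year, value.AggregateCount the number of items in that year
    }
  }
}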

Friday, November 13, 2015

Sitecore indexes

There are already a lot of blog posts describing the use of Sitecore indexes, especially since Sitecore 7 introduced the ContentSearchManager and made them easy to use.
And still I see lots of people writing queries that walk through lots of items. So: yet another post to promote the use of indexes.

Sitecore indexes, indexes, indexes!

Sitecore has some built-in indexes (since Sitecore 8 even more). The best known are probably the sitecore_master and sitecore_web indexes. Personally I never use those - I always create a custom index. Why?
  • I don't want to mess with the indexes that Sitecore uses
  • I want my indexes small and lean
    • faster (re)build
    • easier to check

When to use?

It's hard to say exactly when to use an index, but I'll try to give some common real-life examples of requests where I almost always think "index":
  • fetch all news items from year x
  • fetch all products from category x
  • fetch all events happening in the future
  • fetch the latest news items
  • get the last 3 blog posts written by x
  • ...
Too often developers write queries that are fast with the test data, but after a while the real data has outgrown the solution and it gets slow. So it's better to think ahead and make more use of those indexes. In many cases the result will be faster than even a fast query.

Create a custom index

As there are already a lot of examples out there, this is just a quick introduction to creating (and using) a custom index. For more information on all the possibilities, check the Sitecore docs (or the default index config files, which include examples and comments).

Configuration

I usually create a separate config file where I put the index definition and configuration together. 

Example index definition:

<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
  <indexes hint="list:AddIndex">
    <index id="MyCustom_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
      <param desc="name">$(id)</param>
      <param desc="folder">$(id)</param>
      <!-- This initializes index property store. Id has to be set to the index id -->
      <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
      <configuration ref="contentSearch/indexConfigurations/myCustomIndexConfiguration" />
      <strategies hint="list:AddStrategy">
       <!-- NOTE: order of these is controls the execution order -->
       <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      </strategies>
      <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
        <policies hint="list:AddCommitPolicy">
          <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
        </policies>
      </commitPolicyExecutor>
      <locations hint="list:AddCrawler">
        <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
          <Database>web</Database>
          <Root>/sitecore/content/Corporate</Root>
        </crawler>
      </locations>
    </index>
  </indexes>
</configuration>

In this example I used the LuceneProvider, the onPublishEndAsync update strategy (info on update strategies by John West here) and refer to my custom configuration.
Note that I added a crawler for the web database and gave it a root path (can also be an ID).


Example index configuration:

<indexConfigurations>
  <myCustomIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <initializeOnAdd>true</initializeOnAdd>
    <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
    <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
    <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
      <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="__sortorder" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Int32" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="title" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="date" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="sequence" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Int32" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="topics" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Guid" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="applications" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
      </fieldNames>
    </fieldMap>
    <include hint="list:IncludeTemplate">
      <NewsTemplate>{5CD362B8-C129-437A-A0D4-4EE58E71FEB1}</NewsTemplate>
      <ProductTemplate>{18D5467C-79F9-405B-AA87-2BA4B7CDB443}</ProductTemplate>
      <EventTemplate>{6CA9AC2A-1A9D-429B-870C-FC9417D3A1C7}</EventTemplate>
    </include>
    <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
    <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
    <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
  </myCustomIndexConfiguration>
</indexConfigurations>


Notice here:

  • the "indexAllFields" : if not true, you need to define the fields in an include (just like the IncludeTemplate, but then with includeField)
  • the "fieldMap": here we define the field options: storageType, type (can be string, int, guid, date, ...) and if needed an analyzer (all information on analyzers by Adam Conn here)
  • the IncludeTemplate section where we define the templates of the items to include in the index (the guid is important, the name is useful for understanding the config)

Standard Sitecore fields
Most of the standard Sitecore fields are included automatically. 
Some are not, but you can include them (in the example above the SortOrder field is included).


After creating the configuration you should see your index in the Index Manager in Sitecore and you can rebuild it. Check your data in the index with a tool like Luke. This way you are sure your config is good before you start using it. Luke is also handy later on to check your queries.

Computed fields

Computed fields are fields that are added to the index through custom code (the value is computed instead of just fetched from a field). Of course, computed fields can also be added to a custom index.

Querying your index

Sitecore has a class SearchResultItem that can be used to fetch results from the index, but in most cases you will want to extend this class.

Example SearchResultItem:


public class EventItem : SearchResultItem
{
  [IndexField("title")]
  public string Title { get; set; }
  
  [IndexField("startdate")]
  public DateTime StartDate { get; set; }
  
  [IndexField("profile")]
  public ID Profile { get; set; }
}
We use the SearchContext to do the actual query. Example code:

private IEnumerable<EventItem> GetEventItems()
{
  var templateRestrictions = new List<ID>
  {
    new ID(applicationSettings.EventsTemplateId)
  };

  using (var context = ContentSearchManager.GetIndex("MyCustom_index").CreateSearchContext())
  {
    var templatePredicate = PredicateBuilder.False<EventItem>();
    templatePredicate = templateRestrictions.Aggregate(templatePredicate, (current, template) => current.Or(p => p.TemplateId == template));
    var datePredicate = PredicateBuilder.True<EventItem>();
    datePredicate = datePredicate.And(p => p.StartDate >= DateTime.Today);
    var predicate = PredicateBuilder.True<EventItem>();
    predicate = predicate.And(templatePredicate);
    predicate = predicate.And(datePredicate);
    predicate = predicate.And(p => p.Language == Sitecore.Context.Language.Name);
    var query = context.GetQueryable<EventItem>(new CultureExecutionContext(Sitecore.Context.Language.CultureInfo)).Where(predicate).OrderBy(p => p.StartDate);
    var queryResults = query.GetResults();
    foreach (var hit in queryResults.Hits)
    {
      if (string.IsNullOrEmpty(hit.Document.Title))
      {
        continue;
      }

      yield return hit.Document;
    }
  }
}
We use "predicates" to define our query. I find them useful for creating reusable code (not shown here), especially when combined with generics. Predicates are created with the PredicateBuilder (start from PredicateBuilder.True when combining conditions with "and" and from PredicateBuilder.False when combining with "or").

First we define a predicate to check the template ID (against a list of possibilities). We also check a date field, and finally we add a predicate for the language.

In the example we sort (OrderBy), but the queryable also has options for paging, faceting, and so on. The result set includes not only the list of results, but also the facets, the total number of results (important when paging), etc.

Make sure that you use the correct types when sorting. Sorting numbers as strings will give you unexpected results.

Fetching Sitecore items
It is also important to know that the results are not yet Sitecore items - we get the items we defined (our SearchResultItems). It is, however, quite easy to fetch the actual Sitecore items here, also using Glass if you want. Be aware, though, that the part after the index is sometimes the performance bottleneck: you wouldn't be the first to lose all performance benefits of the index by fetching too many Sitecore items or by writing a slow Linq query after the search.
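Fetching the underlying item from a hit is a one-liner (GetItem() on the SearchResultItem), but as said it is also where the time goes:
foreach (var hit in queryResults.Hits)
{
  // every GetItem() call goes to the database/cache - only do this when you really need the item
  var item = hit.Document.GetItem();
  if (item == null)
  {
    continue;
  }

  // use the item (or map it with Glass) here
}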

Before fetching the real Sitecore items (or Glass-mapped-classes), consider if you really need them. In lots of cases you will, but sometimes the information from the index can be sufficient and you can save even more time not retrieving actual items.


Logs

If your query returns unexpected results, a good place to start looking is the search log file. All queries that are performed are logged there, and if you are using Luke you can copy/paste the query into Luke and test it.


Issues

There are some known issues - and some unknown ones as well. I have a few open tickets with Sitecore Support regarding indexes at the moment, so maybe more posts will follow...