Tuesday, March 22, 2016

Custom index update strategy

ComputedIndexField with dependencies to other items

Ever had a ComputedIndexField that gathered data from items other than the one currently being indexed? For example, from its children, or from items it refers to. I just had a situation where we needed (a property of) the children to be included in our ComputedIndexField.
But what happens if you update a child item? The child is re-indexed, but the parent is not, because it was neither changed nor published. We were using the onPublishEndAsync update strategy and didn't want a solution that needed a periodic rebuild just to keep the index up to date.

There is a GetDependencies pipeline that can be used to add extra items to the indexable list, but that was not an option: this pipeline runs for all indexes, and we wanted the behavior only for our custom index, preferably configurable per index (with performance in mind as well).

Extend the onPublishEnd update strategy

So we started thinking about extending the update strategy. We found some examples on the internet, but not for our Sitecore version (we are using Sitecore 8.1), and the examples didn't go far enough.

What we wanted was:
  • the base onPublishEnd strategy (that would still check for rebuilds and so on)
  • an extension that would add the item's nearest ancestor of a defined template to the list of indexable items

I had a look at the code of the OnPublishEndAsynchronousStrategy with dotPeek and noticed that it was indeed extensible.

Let's start by creating our class by doing what a developer is good at: copy/paste :)

[DataContract]
public class OnPublishEndWithAncestorAsynchronousStrategy : OnPublishEndAsynchronousStrategy
{
    public string ParentTemplateId { get; set; }
    public string ChildTemplateId { get; set; }

    public OnPublishEndWithAncestorAsynchronousStrategy(string database) : base(database)
    {
    }
}

We created a class that extends OnPublishEndAsynchronousStrategy and gave it a constructor that takes the database name (which will be passed in from the config). We also defined two properties to identify the templates involved: the parent (the ancestor item to look for) and the child (the item to start from).

Performance

The child item template(s) are required because our strategy code is executed before the crawler's root path is checked and before the 'DocumentOptions' (like 'IncludeTemplates') are checked. As this extended strategy is already heavier than the original one, we wanted to avoid taking even more performance hits for items we don't need to check. This will become clear later on.

Configuration


<strategies hint="list:AddStrategy">
  <onPublishEndWithAncestorAsync type="Xx.OnPublishEndWithAncestorAsynchronousStrategy, Xx">
    <param desc="database">web</param>
    <ParentTemplateId>{0F5141D6-F264-4D03-B5D2-3505E6F308E7}</ParentTemplateId>
    <ChildTemplateId>{2A993FF2-5F17-4EEA-AD53-5343794F86BB}{066DEA00-31D7-4838-94A6-8D05A7FC690E}</ChildTemplateId>
  </onPublishEndWithAncestorAsync>
</strategies>

In the strategies section, where you normally add strategies by referencing the one(s) defined in the default Sitecore index configurations, we define our custom strategy by providing the type. We pass the database (web) as a parameter and define the GUIDs of the templates. As this example shows, multiple child templates can be listed.

The index run

After some investigation, it turned out we only had to override one method: "Run".
We started by copy/pasting the original code and checked the extension points:
  • if the item queue is empty: we keep the original code
  • if the item queue is so big that a rebuild is suggested: we keep the original code, as a rebuild will also update the ancestor we might add
  • otherwise: this is where our custom code goes
We kept the original code that fetches the list of items to refresh. We don't actually get the items as "Item" but as "IndexableInfo" objects. For each entry in this list we call our GetAncestor function. The result is checked for null and added to the original list only if it wasn't already in there.


protected override void Run(List<QueuedEvent> queue, ISearchIndex index)
{
    CrawlingLog.Log.Debug($"[Index={index.Name}] {GetType().Name} executing.");
    if (Database == null)
    {
        CrawlingLog.Log.Fatal($"[Index={index.Name}] OperationMonitor has invalid parameters. Index Update cancelled.");
    }
    else
    {
        queue = queue.Where(q => q.Timestamp > (index.Summary.LastUpdatedTimestamp ?? 0L)).ToList();
        if (queue.Count <= 0)
        {
            CrawlingLog.Log.Debug($"[Index={index.Name}] Event Queue is empty. Incremental update returns");
        }
        else if (CheckForThreshold && queue.Count > ContentSearchSettings.FullRebuildItemCountThreshold())
        {
            CrawlingLog.Log.Warn($"[Index={index.Name}] The number of changes exceeded maximum threshold of '{ContentSearchSettings.FullRebuildItemCountThreshold()}'.");
            if (RaiseRemoteEvents)
            {
                IndexCustodian.FullRebuild(index).Wait();
            }
            else
            {
                IndexCustodian.FullRebuildRemote(index).Wait();
            }
        }
        else
        {
            var list = ExtractIndexableInfoFromQueue(queue).ToList();
            // custom code start here...
            CrawlingLog.Log.Info($"[Index={index.Name}] Found '{list.Count}' items from Event Queue.");
            var result = new List<IndexableInfo>();
            CrawlingLog.Log.Info($"[Index={index.Name}] OnPublishEndWithAncestorAsynchronousStrategy executing.");
            foreach (var itemInfo in list)
            {
                var ancestor = GetAncestor(itemInfo);
                if (ancestor != null)
                {
                    if (list.Any(i => i.IndexableId.Equals(ancestor.IndexableId, StringComparison.OrdinalIgnoreCase)))
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Ancestor already in list '{ancestor.IndexableId}'.");
                    }
                    else
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Adding ancestor '{ancestor.IndexableId}'.");
                        result.Add(ancestor);
                    }
                }
            }

            list.AddRange(result);
            CrawlingLog.Log.Info($"[Index={index.Name}] Updating '{list.Count}' items.");
            IndexCustodian.IncrementalUpdate(index, list).Wait();
        }
    }
}

Job(s)

One noticeable thing here is that we add the extra indexable items to the existing list that is passed to the incremental update. We could also call the Refresh method on the IndexCustodian, but that would create extra (background) jobs, so this way seems more efficient.

The ancestor check

The last thing to do is the ancestor check itself. For our requirements we needed to find an ancestor of a defined template, but this function could actually do anything. Just keep performance in mind, as this function will be called a lot (any ideas on how to improve this further are welcome).

private IndexableInfo GetAncestor(IndexableInfo info)
{
    try
    {
        var childTemplateId = ChildTemplateId.ToLowerInvariant();
        var item = Database.GetItem(((ItemUri)info.IndexableUniqueId.Value).ItemID);
        if (item != null && childTemplateId.Contains(item.TemplateID.Guid.ToString("B")))
        {
            var ancestor = item.Axes.GetAncestors().ToList().FindLast(i => i.TemplateID.Guid.ToString("B").Equals(ParentTemplateId, StringComparison.OrdinalIgnoreCase));
            if (ancestor != null)
            {
                return new IndexableInfo(
                    new SitecoreItemUniqueId(
                        new ItemUri(ancestor.ID, ancestor.Language, ancestor.Version, Database)),
                    info.Timestamp)
                {
                    IsSharedFieldChanged = info.IsSharedFieldChanged
                };
            }
        }
    }
    catch (Exception e)
    {
        CrawlingLog.Log.Error($"[Index] Error getting ancestor for '{info.IndexableId}'.", e);
    }

    return null;
}


Requiring the child template in the config as well might seem like a limitation, but here it gives us a good performance gain because it greatly limits the number of (slow) ancestor look-ups. We still need to do that first look-up of the actual item to detect its template, though.
We catch all exceptions (ok, that might be bad practice) just to make sure in our test that one failure doesn't break everything.
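One possible refinement of the child template check, offered as a sketch and not as what we actually run: instead of lower-casing the configured string and calling Contains on every item, the configured value could be parsed once into a set of GUIDs. The TemplateIdParser class and its Parse method are hypothetical names introduced for this illustration; the snippet uses only the .NET base library, so it can be tried outside Sitecore.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Sitecore): parses a config value that holds
// one or more brace-wrapped GUIDs, written back to back as in the example config.
public static class TemplateIdParser
{
    // Matches a single GUID in "B" format, e.g. {2A993FF2-5F17-4EEA-AD53-5343794F86BB}.
    private static readonly Regex BracedGuid =
        new Regex(@"\{[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}");

    public static HashSet<Guid> Parse(string configuredIds)
    {
        var ids = new HashSet<Guid>();
        if (string.IsNullOrWhiteSpace(configuredIds))
        {
            return ids;
        }

        foreach (Match match in BracedGuid.Matches(configuredIds))
        {
            // Guid.Parse accepts the brace-wrapped format directly.
            ids.Add(Guid.Parse(match.Value));
        }

        return ids;
    }
}

public static class Program
{
    public static void Main()
    {
        var ids = TemplateIdParser.Parse(
            "{2A993FF2-5F17-4EEA-AD53-5343794F86BB}{066DEA00-31D7-4838-94A6-8D05A7FC690E}");

        // Guid comparison is by value, so the casing of the configured text no longer matters.
        Console.WriteLine(ids.Count);
        Console.WriteLine(ids.Contains(new Guid("2a993ff2-5f17-4eea-ad53-5343794f86bb")));
    }
}
```

In the strategy, the set could be built once (for example lazily on first use) and the check in GetAncestor would then become a look-up on item.TemplateID.Guid. Whether this makes a measurable difference next to the item and ancestor look-ups is something we have not measured.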

Conclusion

As usual, we managed to tweak Sitecore in a fairly easy manner. This example can hopefully lead you towards more optimizations and other implementations of custom index features. Suggestions and/or improvements are welcome...


Tuesday, March 1, 2016

Query.MaxItems in Sitecore 8.1

Small tip for people using Query.MaxItems and upgrading to 8.1


Setting the maximum number of results from a Query

As you might know Sitecore has a setting called "Query.MaxItems". The default value of this setting is 100 (well, in the base config file). Setting this value to 0 will make queries return all results without limit - which might (will) have a negative impact on performance in case of a large result set.

Be aware that this setting influences not only your own queries, but also some API calls (Axes, SelectItems, ...) and Sitecore fields that use queries underneath. It does not affect fast queries.


The Query.MaxItems value in Sitecore 8.1 : 260

Using queries is in many cases not a good idea, and as a fan of indexes I (almost) never use them myself. Still, some questions came up when people were upgrading, so I decided to blog this little tip:
in Sitecore 8.1 the Query.MaxItems value is patched in Sitecore.ExperienceExplorer.config and set to 260.
If you patched the value yourself but did not use a separate include file (as you should!), or did not make sure that your include file is loaded last (prefixing it with z is a common trick; using a subfolder starting with z- is a better one), this new value of 260 will overwrite yours.
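
A separate include file for this setting could look like the sketch below. The file location (something like App_Config/Include/z-Project/Query.MaxItems.config) and the value 500 are assumptions made up for this illustration; pick a folder name that sorts last in your own solution and a value that fits your content.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Overrides the value set by earlier config files; 500 is only an example value -->
      <setting name="Query.MaxItems">
        <patch:attribute name="value">500</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

After deploying it, the resulting value can be checked on the /sitecore/admin/showconfig.aspx page.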