ComputedIndexField with dependencies to other items
Ever had a ComputedIndexField that gathered data from items other than the one currently being indexed? For example, from its children, or from items it refers to. I just had a situation where we needed (a property of) the children to be included in our ComputedIndexField. But what happens if you update a child item? The child is re-indexed, but the parent is not, because it was not changed or published itself. We were using the onPublishEndAsync update strategy and didn't want a solution that needed a periodic rebuild just to keep the index up to date.
There is a GetDependencies pipeline that can be used to add extra items to the indexable list, but that was not an option: the pipeline runs for all indexes, and we wanted this just for our custom index, preferably configurable per index (with performance in mind as well).
Extend the onPublishEnd update strategy
So we started thinking about extending the update strategy. We found some examples on the internet, but not for our Sitecore version (we are using Sitecore 8.1), and the examples didn't go far enough. What we wanted was:
- the base onPublishEnd strategy (that would still check for rebuilds and so on)
- an extension that adds the nearest ancestor of the item that is based on a defined template to the list of indexable items
I had a look at the code of the OnPublishEndAsynchronousStrategy with dotPeek and noticed that it was indeed extendable.
Let's start by creating our class by doing what a developer is good at: copy/paste :)
[DataContract]
public class OnPublishEndWithAncestorAsynchronousStrategy : OnPublishEndAsynchronousStrategy
{
    public string ParentTemplateId { get; set; }
    public string ChildTemplateId { get; set; }

    public OnPublishEndWithAncestorAsynchronousStrategy(string database) : base(database)
    {
    }
}
We created a class that extends OnPublishEndAsynchronousStrategy and gave it a constructor that takes the database name (which will be passed in the config). We also defined two properties to identify the templates involved: the parent (the ancestor item to look for) and the child (the item to start from).
Performance
The child item template(s) are requested because our strategy code is executed before the crawler's root path is checked and before the 'DocumentOptions' (like 'IncludeTemplates') are checked. As this extended strategy is already heavier than the original one, we wanted to avoid taking even more performance hits for items we don't need to check. This will become clear later on...
Configuration
<strategies hint="list:AddStrategy">
  <onPublishEndWithAncestorAsync type="Xx.OnPublishEndWithAncestorAsynchronousStrategy, Xx">
    <param desc="database">web</param>
    <ParentTemplateId>{0F5141D6-F264-4D03-B5D2-3505E6F308E7}</ParentTemplateId>
    <ChildTemplateId>{2A993FF2-5F17-4EEA-AD53-5343794F86BB}{066DEA00-31D7-4838-94A6-8D05A7FC690E}</ChildTemplateId>
  </onPublishEndWithAncestorAsync>
</strategies>
In the strategies section, where you normally add strategies by referencing the one(s) defined in the default Sitecore index configurations, we define our custom strategy by providing its type. We pass the database (web) as a parameter and define the GUIDs for the templates. In this example we can pass multiple child templates by listing their GUIDs one after another.
The index run
After some investigation, it turned out we only had to override one method: "Run".
We started by copy/pasting the original code and checked the extension points:
- if the item queue is empty: we leave the original code
- if the item queue is so big a rebuild is suggested: we keep the original code as a rebuild will also update the ancestor we might add
- else..
We kept the original code that fetches the list of items to refresh. We don't get the items as "Item" objects but as "IndexableInfo" objects. For each entry in this list we call our GetAncestor function. The result is checked for null and added to the original list only if it wasn't already in there.
protected override void Run(List<QueuedEvent> queue, ISearchIndex index)
{
    CrawlingLog.Log.Debug($"[Index={index.Name}] {GetType().Name} executing.");
    if (Database == null)
    {
        CrawlingLog.Log.Fatal($"[Index={index.Name}] OperationMonitor has invalid parameters. Index Update cancelled.");
    }
    else
    {
        // Only keep events newer than the last index update.
        queue = queue.Where(q => q.Timestamp > (index.Summary.LastUpdatedTimestamp ?? 0L)).ToList();
        if (queue.Count <= 0)
        {
            CrawlingLog.Log.Debug($"[Index={index.Name}] Event Queue is empty. Incremental update returns");
        }
        else if (CheckForThreshold && queue.Count > ContentSearchSettings.FullRebuildItemCountThreshold())
        {
            CrawlingLog.Log.Warn($"[Index={index.Name}] The number of changes exceeded maximum threshold of '{ContentSearchSettings.FullRebuildItemCountThreshold()}'.");
            if (RaiseRemoteEvents)
            {
                IndexCustodian.FullRebuild(index).Wait();
            }
            else
            {
                IndexCustodian.FullRebuildRemote(index).Wait();
            }
        }
        else
        {
            var list = ExtractIndexableInfoFromQueue(queue).ToList();
            // custom code start here...
            CrawlingLog.Log.Info($"[Index={index.Name}] Found '{list.Count}' items from Event Queue.");
            var result = new List<IndexableInfo>();
            CrawlingLog.Log.Info($"[Index={index.Name}] OnPublishEndWithAncestorAsynchronousStrategy executing.");
            foreach (var itemInfo in list)
            {
                var ancestor = GetAncestor(itemInfo);
                if (ancestor != null)
                {
                    // Skip ancestors that are already queued, and also ancestors that
                    // were already collected for an earlier child in this run.
                    if (list.Any(i => i.IndexableId.Equals(ancestor.IndexableId, StringComparison.OrdinalIgnoreCase))
                        || result.Any(i => i.IndexableId.Equals(ancestor.IndexableId, StringComparison.OrdinalIgnoreCase)))
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Ancestor already in list '{ancestor.IndexableId}'.");
                    }
                    else
                    {
                        CrawlingLog.Log.Info($"[Index={index.Name}] Adding ancestor '{ancestor.IndexableId}'.");
                        result.Add(ancestor);
                    }
                }
            }
            list.AddRange(result);
            CrawlingLog.Log.Info($"[Index={index.Name}] Updating '{list.Count}' items.");
            IndexCustodian.IncrementalUpdate(index, list).Wait();
        }
    }
}
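To make the merge step easier to reason about, here is a minimal, Sitecore-free sketch of the "add only if not already there" logic. It uses plain strings as ids, and the AncestorMerge class and its Merge method are hypothetical names for illustration only, not part of our strategy or of Sitecore. A HashSet replaces the repeated Any() scans, which may help when the queue is large.

```csharp
using System;
using System.Collections.Generic;

public static class AncestorMerge
{
    // Hypothetical helper (not a Sitecore API): appends each ancestor id to the
    // queued ids unless an equal id, ignoring case, is already present.
    public static List<string> Merge(IEnumerable<string> queuedIds, IEnumerable<string> ancestorIds)
    {
        var merged = new List<string>(queuedIds);

        // The set gives constant-time membership checks and also catches the case
        // where two queued children resolve to the same ancestor.
        var seen = new HashSet<string>(merged, StringComparer.OrdinalIgnoreCase);

        foreach (var id in ancestorIds)
        {
            // null stands for "no matching ancestor found" and is skipped.
            if (id != null && seen.Add(id))
            {
                merged.Add(id);
            }
        }

        return merged;
    }
}
```

Calling AncestorMerge.Merge with the queued ids "{a}" and "{b}" and the ancestor ids "{P}", "{p}", "{A}" and null adds only "{P}": the second "{p}" and the already queued "{A}" are ignored, and so is the null.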
Job(s)
One noticeable thing here is that we add the extra indexable items to the existing list that is passed to the incremental update. We could also call the Refresh method on the IndexCustodian, but that would create extra (background) jobs, so this way seems more efficient.
The ancestor check
The last thing to do is the ancestor check itself. For our requirements we needed to find an ancestor of a defined template, but this function could actually do anything. Just keep performance in mind, as this function will be called a lot (any ideas on how to improve this further are welcome).
private IndexableInfo GetAncestor(IndexableInfo info)
{
    try
    {
        // Guid.ToString("B") yields lower-case, brace-wrapped GUIDs, so compare in lower case.
        var childTemplateId = ChildTemplateId.ToLowerInvariant();
        var item = Database.GetItem(((ItemUri)info.IndexableUniqueId.Value).ItemID);
        // Only look for an ancestor when the item is based on one of the configured child templates.
        if (item != null && childTemplateId.Contains(item.TemplateID.Guid.ToString("B")))
        {
            var ancestor = item.Axes.GetAncestors().ToList().FindLast(i => i.TemplateID.Guid.ToString("B").Equals(ParentTemplateId, StringComparison.OrdinalIgnoreCase));
            if (ancestor != null)
            {
                return new IndexableInfo(
                    new SitecoreItemUniqueId(
                        new ItemUri(ancestor.ID, ancestor.Language, ancestor.Version, Database)),
                    info.Timestamp)
                {
                    IsSharedFieldChanged = info.IsSharedFieldChanged
                };
            }
        }
    }
    catch (Exception e)
    {
        CrawlingLog.Log.Error($"[Index] Error getting ancestor for '{info.IndexableId}'.", e);
    }
    return null;
}
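One idea for the lookup itself, as an untested sketch: instead of collecting all ancestors and then searching the list, walk up the parent chain and stop at the first match. The Node class below is a hypothetical stand-in for a content item so that the example is self-contained, and AncestorLookup.FindNearest is an illustrative name. In the real strategy the equivalent would be following Item.Parent and comparing TemplateID, and whether that is measurably faster is an assumption we have not verified.

```csharp
using System;

// Hypothetical stand-in for a content item: only what the sketch needs.
public sealed class Node
{
    public Guid TemplateId { get; }
    public Node Parent { get; }

    public Node(Guid templateId, Node parent)
    {
        TemplateId = templateId;
        Parent = parent;
    }
}

public static class AncestorLookup
{
    // Walks up from the item's parent and returns the nearest ancestor based on
    // the given template, or null if there is none. Stops at the first match,
    // so items above the match are never visited.
    public static Node FindNearest(Node item, Guid parentTemplateId)
    {
        for (var current = item?.Parent; current != null; current = current.Parent)
        {
            if (current.TemplateId == parentTemplateId)
            {
                return current;
            }
        }

        return null;
    }
}
```

The item itself is never returned, only its ancestors, which matches the behavior of the GetAncestor function above.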
Using the child template in the config as well might seem like a limitation, but here it gives us a good performance gain because it greatly reduces the number of (slow) ancestor lookups. We still need that first lookup of the actual item to detect its template, though.
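The child template check could probably be made a bit cheaper and stricter as well. The sketch below parses the configured string once into a set of GUIDs instead of lower-casing it and doing a substring search on every call. TemplateIdList and Parse are hypothetical names, and the sketch assumes the ids are written in the brace-wrapped format shown in the configuration above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateIdList
{
    // Matches brace-wrapped GUIDs such as {2A993FF2-5F17-4EEA-AD53-5343794F86BB}.
    private static readonly Regex GuidPattern =
        new Regex(@"\{[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}");

    // Hypothetical helper: turns the configured string (one or more GUIDs,
    // with or without separators between them) into a set of GUIDs.
    public static HashSet<Guid> Parse(string configured)
    {
        var ids = new HashSet<Guid>();
        foreach (Match match in GuidPattern.Matches(configured ?? string.Empty))
        {
            ids.Add(Guid.Parse(match.Value));
        }

        return ids;
    }
}
```

In the strategy, the set could be built once (for example lazily on the first call) and the check would then become something like ids.Contains(item.TemplateID.Guid), which no longer depends on casing.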
We catch all exceptions - ok, might be bad practice - just to make sure in our test that one failure doesn't break it all.
Conclusion
As usual, we managed to tweak Sitecore in a fairly easy manner. Hopefully this example can lead you towards more optimizations and other implementations of custom index features. Suggestions and/or improvements are welcome...