Thursday, June 16, 2016

Sitecore Index dependencies

I recently stumbled upon a question on how to trigger re-indexing of related content in a Sitecore (Lucene) index. Different answers were given and I got the feeling that not everyone already knows about the getDependencies pipeline. So we write a blog post...

Re-index related content

As I mentioned, there are other solutions that could do the trick. 
  • Custom update strategy

    You could write your own update strategy and include your dependency logic in there. This approach has the benefit that you can use it in one index only without affecting others.
  • Custom save handler

    With a custom save handler you could detect save actions, get the dependent items and register them as well for index updating. I'm not convinced that this will work in all update strategy scenario's but if you have working code, feel free to share ;)
These are probably also valid solutions, but I'll leave those to others as I want to show the Sitecore pipeline that looks like the ideal candidate for the job.

getDependencies pipeline

There is a pipeline.. there always is. One drawback I'll mention already is that the pipeline is for all indexes and so far I have not found a way to trigger it for one index only (see update below on disabling). I also tried to get the index (name or anything) in the code but that didn't work out either. We could get the name of the job, but that was only relevant for the first batch of items - after that, multiple jobs were started and the name became meaningless. 

Anyway, the pipeline. In the Sitecore.ContentSearch.config you'll find this:
<!-- INDEXING GET DEPENDENCIES
  This pipeline fetches dependant items when one item is being index. Useful for fetching related or connected items that also
  need to be updated in the indexes.
  Arguments: (IQueryable) Open session to the search index, (Item) The item being indexed.
  Examples: Update clone references.
  Update the data sources that are used in the presentation components for the item being indexed.
-->

<indexing.getDependencies help="Processors should derive from Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor">
  <!-- When indexing an item, make sure its clones get re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetCloningDependencies, Sitecore.ContentSearch"/>-->
  <!-- When indexing an item, make sure its datasources that are used in the presentation details gets re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetDatasourceDependencies, Sitecore.ContentSearch"/>-->
</indexing.getDependencies>

As you can see, some processors are in the box, but in comments. You can simply enable them if you want your clones and/or datasources to be indexed with the main items.

And you can write your own processor of course. An example:
public class GetPageDependencies : Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor
{
    public override void Process(GetDependenciesArgs context)
    {
        Assert.IsNotNull(context.IndexedItem, "indexed item");
        Assert.IsNotNull(context.Dependencies, "dependencies");
            
        var scIndexable = context.IndexedItem as SitecoreIndexableItem;
        if (scIndexable == null) return;
            
        var item = scIndexable.Item;
        if (item == null) return;
            
        // optimization to reduce indexing time by skipping this logic for items not in the Web database
        if (!string.Equals(item.Database.Name, "web", StringComparison.OrdinalIgnoreCase)) return;
            
        if (!item.Paths.IsContentItem) return;
            
        if (item.Name.Equals("__Standard Values", StringComparison.OrdinalIgnoreCase)) return;
            
        if (Sitecore.Context.Job == null) return;
            
        // logic here - example = get first child
        if (!item.HasChildren) return;
            
        var dependency = item.Children[0];
        var id = (SitecoreItemUniqueId)dependency.Uri;
        if (!context.Dependencies.Contains(id))
        {
            context.Dependencies.Add(id);
        }
    }
}

In the example here we keep it simple and just add the first child (if any). That logic can contain anything though.

As you can see we try to get out of the processor as fast as possible. You can add even more checks based on template and so on. Getting out fast if you don't want the dependencies is important!

The benefit of the solution is that the pipeline is executed when the indexing starts but before the list of items to index is finalized - which is the best moment for this task. All "extra" items are added to the original list so they are executed (indexed) by the same job and we let the Sitecore handle them they way it was meant.

Performance might not seem an issue, but when having quite some items and dependencies, and these get updated frequently it will be. You might be triggering way too much items towards the index, so be careful (no matter what solution you go for). The indexing is be a background job but if it goes berserk you will notice.
Note that it is a good thing that your dependencies don't have to go through all kind of processes before being added, they are just "added to the list".

I found this pipeline solution very useful in scenario's where the amount of dependent items that actually got added was not too big. Don't forget you can also disable the pipeline processor temporarily (and perform a rebuild) if needed.

How to Enable/Disable 

(from the Sitecore Search and Indexing on SDN) - thx jammykam for the info

The pipeline is executed from within each crawler if the crawler’s ProcessDependencies property is set to true, which is the default. To disable this feature, add the following parameter to the appropriate index under the <Configuration /> section.
<index id="content" ...>
 ...
 <Configuration type="...">
...
 <ProcessDependencies>false</ProcessDependencies>
Alternatively, if the indexes don’t override default configuration with a local one, you can also globally change this setting in the DefaultIndexConfiguration.

Known issues with the indexing.getDependencies pipeline

https://kb.sitecore.net/articles/116076

2 comments:

  1. > One drawback I'll mention already is that the pipeline is for all indexes and so far I have not found a way to trigger it for one index only

    You can add a `false` parameter to your index configuration, or reverse the logic and set it to true on the individual index and false at a global level in the ``.

    This will allow you to control which indexes the pipeline is run for.

    See section 5.3.1 in the "Sitecore Search and Indexing" on SDN: https://sdn.sitecore.net/Reference/Sitecore%207/Sitecore%20Search%20and%20Indexing%20Guide.aspx

    ReplyDelete
    Replies
    1. Thanks for the info! Hadn't read that part yet - will add it to the post.

      Delete