Wednesday, December 23, 2015

Sitecore Lucene index and DateTime fields

[Sitecore 8.1]

DateTime field in Lucene index

I was trying to create an index search for an event calendar that would give me items (of a given template, etc.) that have a date field value:
  • from today onwards (today included) 
  • up until today
The field in Sitecore is a date field (so no time indication), but our query seemed to have issues with the time part. The code to create the predicate looks like this:

private Expression<Func<EventItem, bool>> GetDatePredicate(OverviewMode mode)
{
  var predicate = PredicateBuilder.True<EventItem>();
  switch (mode)
  {
    case OverviewMode.Future:
    {
      var minDate = DateTime.Today.ToUniversalTime();
      predicate = predicate.And(n => n.StartDate > minDate);
      break;
    }
    case OverviewMode.Past:
    {
      var maxDate = DateTime.Today.ToUniversalTime();
      var minDate = DateTime.MinValue.ToUniversalTime();
      predicate = predicate.And(n => n.StartDate < maxDate).And(n => n.StartDate > minDate);
      break;
    }
    default:
    {
      return null;
    }
  }
  return predicate;
}


This did not work correctly for events happening "today". We had to call "AddDays(-1)" on DateTime.Today before converting it to UTC. So why?

The first reason is that Sitecore stores its DateTimes in UTC, which differed by one hour from our local time. So our dates shifted a day back: "12/12/2015" becomes "12/11/2015 23:00". This is known and should be no issue, as we also shift to UTC in our predicate.
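
To make the shift concrete, here is a minimal sketch in plain C#. It does not use any Sitecore API; the class name and the fixed UTC+1 zone are assumptions made only for this illustration (a real server would use its own local time zone, including daylight saving).

```csharp
using System;

public static class UtcShiftDemo
{
    // Hypothetical fixed-offset zone (UTC+1, no daylight saving) standing in for the server's local time.
    private static readonly TimeZoneInfo LocalZone =
        TimeZoneInfo.CreateCustomTimeZone("Demo UTC+1", TimeSpan.FromHours(1), "Demo UTC+1", "Demo UTC+1");

    // Converts a date-only value (midnight local time) to the corresponding UTC instant.
    public static DateTime ToUtc(DateTime localDate)
    {
        var localMidnight = DateTime.SpecifyKind(localDate.Date, DateTimeKind.Unspecified);
        return TimeZoneInfo.ConvertTimeToUtc(localMidnight, LocalZone);
    }
}
```

Calling UtcShiftDemo.ToUtc(new DateTime(2015, 12, 12)) gives 11 December 2015, 23:00 UTC, which is the one-day shift described above.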

But still.. we did not get the correct results.

The logs

So we look at the logs. Sitecore logs all requests in the Search log file. We saw that our predicate was translated into something like this:
"+(+date_from:[* TO 20151111t230000000z} +date_from:{00010101t000000000z TO *])"

Looks fine, but note that the "t" in the dates is lowercase. In my index, however, they are all uppercase. If I try the query in Luke it does indeed give me the wrong results.. When I alter the query in Luke to use an uppercase "T", it works correctly..
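
Why the case of one letter matters can be sketched in plain C#. This is not Lucene code: it only assumes that a range query on a string field compares terms character by character in ordinal order, where "T" sorts before "t". The class and method names are made up for the illustration.

```csharp
using System;

public static class TermCaseDemo
{
    // True when an indexed term sorts strictly below the (exclusive) upper bound of a range,
    // using ordinal comparison, where 'T' (0x54) sorts before 't' (0x74).
    public static bool IsBelowUpperBound(string indexedTerm, string upperBound)
    {
        return string.CompareOrdinal(indexedTerm, upperBound) < 0;
    }
}
```

Under that assumption, an event stored as "20151211T230000000Z" counts as lying before the bound "20151211t230000000z" even though both represent the same instant: the comparison is already decided at the "T", so the time part is never looked at. With both values in the same case the event is correctly excluded.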

Support, here we come!


Solution(s)

Support gave us 2 possible solutions, next to the one we already had (skipping a day).

1. Format

We could alter our index to use a format attribute:
<field fieldName="datefrom" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" 
format="yyyyMMdd" type="System.DateTime" 
settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>

After rebuilding our index, the "DateFrom" field values, stored in the index, will contain only dates (like "20151209"), so search by dates should return results as expected (since there are no "T" and "Z" symbols). 
This works if you really don't need the times..

2. Custom Converter

Another solution is to override the "Sitecore.ContentSearch.Converters.IndexFieldUtcDateTimeValueConverter" class so that dates are stored in the index in lower case.

Add your converter to the index config:
<converters hint="raw:AddConverter">
  ...
  <converter handlesType="System.DateTime" 
         typeConverter="YourNamespace.LowerCaseIndexFieldUtcDateTimeValueConverter, YourAssembly" />
  ...
</converters>

As a result, all dates should be stored in the index in lower case. As the search query is in lower case as well, all expected results should be found.


Future solution

Search queries are currently always generated in lower case, and this behavior is not configurable: the "LowercaseExpandedTerms" property of the "Lucene.Net.QueryParsers.QueryParser" class is always set to true, which lowercases the parameters in a search query string. A feature request for the product was made so that this can be considered for future implementations. That should make these tweaks unnecessary..

Wednesday, December 16, 2015

Sitecore WFFM 8.1 and multilingual save actions

WFFM Save actions

Every Sitecore developer who has had the 'pleasure' of working with WFFM knows about save actions, and probably also knows that save actions are shared by default, and so not multilingual. In some cases this is no issue, but if you want to send an email to your visitor you might want to do that in their own language.

KB : Solution 2

A solution for this issue is given in https://kb.sitecore.net/articles/040124. Most people use "Solution 2":

Apply the following customization:
  • Navigate to /sitecore/templates/Web Forms for Marketers/Form
  • Uncheck "Shared" checkbox for the Save Action field.
After this change, you must add Save Actions to each language version of the form item. This means that each language of the form item will keep its own list of Save Actions.

The error

This works up until Sitecore 8.0, but when we tried this in Sitecore 8.1 with WFFM 8.1 we got this:

When we go to a form and switch to another language this nice error appears. Apparently the reason is simple: by making the save action field un-shared we caused empty values in some languages for that field. Sounds very logical indeed.

But Sitecore does not expect an empty value in that field, it expects some xml.

The fix (workaround)

  1. Go to an affected form (in a working language)
  2. Switch to "Raw values"
  3. Open a needed language version of a form item.
  4. Insert the following value into the Save Actions field:
<?xml version="1.0" encoding="utf-16"?>
 <li xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <g id="{E5EABB1F-40BC-45BB-8D87-3B6C239B521B}" displayName="Actions" 
     onclick="javascript:return scForm.postEvent(this,event,'forms:addaction')" />
 </li>
Switch back to normal view (and save). This xml will insert an empty save actions block and you are good to go.

Monday, December 7, 2015

Sitecore Lucene index with integers

The situation

We recently discovered an issue when using a facet on an integer field in a Sitecore (8.1) Lucene index. We had a number of articles (items) with a date field. We had to query these items, order them by date and determine the number of items in each year.

The code

We created a ComputedField "year" and filled it with the year part of the date:
var dateTime = ((DateField)publicationDateField).DateTime;
return dateTime.Year;
We added the field to a custom index, and created an entry in the fieldmap to mark it as System.Int32. We rebuilt the index, checked the contents with Luke and all was fine. So we created a class based on SearchResultItem to use for the query:

class NewsItem : SearchResultItem
{
    [IndexField("title")]
    public string Title { get; set; }

    [IndexField("publication date")]
    public DateTime Date { get; set; }

    [IndexField("category")]
    public Guid Category { get; set; }

    [IndexField("year")]
    public int PublicationYear { get; set; }
}

The query

When we use this class for querying, we get no results when filtering on the year.. Apparently integer fields need to be tokenized to be used in searches (indexType="TOKENIZED"). That sounds weird, as this is surely not true for text fields, but the NumericField constructor makes it clear:

Lucene.Net.Documents.NumericField.NumericField(string name, int precisionStep, Field.Store store, bool index) : base(name, store, index ? Field.Index.ANALYZED_NO_NORMS : Field.Index.NO, Field.TermVector.NO)

So, we changed the field in the fieldmap and set it to tokenized. We added an analyzer to prevent the integer from being cut into parts (Lucene.Net.Analysis.KeywordAnalyzer or Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer).

Success?

Yeah! We have results! We got the news items for 2015! And 2014..  But... there is always a but or this post would be too easy. We still needed a facet. And there it went wrong. The facet resulted in this:


Not what we expected actually...

So back to our query and index..  Sitecore Support found out that this happens because of the specific way numeric fields are indexed by Lucene: they are indexed not as simple tokens but as a tree structure (http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/NumericField.html).

Unfortunately, Sitecore cannot do faceting on such fields at this moment - this is now logged as a bug.

The Solution

The solution was actually very simple. We removed the field from the fieldmap and changed the int in our NewsItem to a string. If we want to use the values as integers we need to convert them afterwards, but for now we don't even need that.
Luckily for us, even the sorting doesn't care, as our ints are years. So we were set.. queries are working and facets are fine.
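
Why the sorting "doesn't care" for years, but would for other integers, can be shown with a small sketch in plain C#. It is independent of Sitecore and Lucene; the class name is made up, and it simply sorts numbers the way a string field would be sorted.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class StringSortDemo
{
    // Sorts values the way a string field is sorted: character by character.
    public static string[] SortAsStrings(params int[] values)
    {
        return values
            .Select(v => v.ToString(CultureInfo.InvariantCulture))
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToArray();
    }
}
```

Four-digit years such as 2015, 2013, 2014 come out in the same order as a numeric sort would give. Values of different lengths do not: 9, 10, 100 come out as "10", "100", "9". So the string approach is only safe while all values have the same number of digits.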

Sunday, November 29, 2015

SOLID Sitecore templates

Every programmer in an object-oriented environment should know the S.O.L.I.D. principles:
  • Single Responsibility
  • Open/closed
  • Liskov substitution
  • Interface segregation
  • Dependency inversion


More information about this topic can be found all over the internet, for example on Wikipedia.

What I want to talk about here is using these principles when working with Sitecore, and Sitecore templates in particular. It will come down to a basic conclusion.
I hope that for most of you I will not tell anything new, but I've seen quite a lot of code already that does not adhere to these principles, and the reason is almost always: we had to do it "fast".

There are lots of blog posts already about SOLID principles in C#, which of course also apply to a Sitecore-based solution. I will not repeat them but focus on the architecture of the Sitecore templates - not quite the same as OO coding, but as you will see the SOLID principles have their meaning even here.

Single Responsibility

"A template should have only a single responsibility"

Do not use a template called "News" for something other than a news article, even if your other item has the same fields. You will create confusion and the maintenance of the templates will become harder. You might consider using base templates if your items have the same fields, but the single responsibility principle applies there as well.

Base templates
Do not create base templates that have more than a single responsibility. If you want to re-use fields for SEO purposes, create an SEO base template. Do not mix them with e.g. fields for your navigation in a single global "page" base template. If you really need such a 'base', create it from several other base templates.
A base template that has more than a single section should get you thinking...

Open/Closed

"An entity should be open for extension, but closed for modification"

For this one my example does not apply to the template itself, but to functions that switch on templates (or template IDs). Quite often I get to see code that has a set of if-statements (or a switch) to identify the template of an item and do something related but still different per template. After a while you might even see these if/switch statements more than once in the same class. If you create a new template you might break those functions, because your new template will not be covered by the selections.

How can we make this more maintainable? Create an interface (if you don't already have one, as you should) and expose that interface to the outside world. Implement the interface for each (set of) template(s) that you need - use a base class with virtual functions if needed. In front of those classes you create a façade (class) that does the template detection and uses the matching implementation of the interface. The consumers of your code will always see the interface and the implementing classes do not need to check templates anymore.
If you create a new template you need to check your façades and maybe create a new implementation of an interface. But you don't need to scan your solution for template detecting statements anymore.
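
A minimal sketch of that idea in plain C#. Every name here (ITeaserProvider, the two providers, TeaserFacade) is hypothetical, and the template is represented by a plain Guid instead of a Sitecore ID so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;

// What consumers see; they never check template IDs themselves.
public interface ITeaserProvider
{
    string GetTeaser(string itemName);
}

public sealed class NewsTeaserProvider : ITeaserProvider
{
    public string GetTeaser(string itemName) { return "News: " + itemName; }
}

public sealed class EventTeaserProvider : ITeaserProvider
{
    public string GetTeaser(string itemName) { return "Event: " + itemName; }
}

// The façade is the single place that maps template IDs to implementations.
public sealed class TeaserFacade
{
    private readonly Dictionary<Guid, ITeaserProvider> providers = new Dictionary<Guid, ITeaserProvider>();

    public void Register(Guid templateId, ITeaserProvider provider)
    {
        providers[templateId] = provider;
    }

    public string GetTeaser(Guid templateId, string itemName)
    {
        ITeaserProvider provider;
        if (!providers.TryGetValue(templateId, out provider))
        {
            // A new, unregistered template fails loudly here instead of silently falling through a switch.
            throw new NotSupportedException("No teaser provider registered for template " + templateId);
        }
        return provider.GetTeaser(itemName);
    }
}
```

Supporting a new template then means writing one new implementation and adding one Register call; the existing providers and the consumers stay untouched.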


Liskov substitution

"Objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program"

Applied to templates, this comes down to a few simple rules that everyone finds very obvious and still.. the "fast" way, you know.

  • Do not "overwrite" fields from a base template: if a base template already has a field "Title" do not create another "Title" field. Also make sure you do not have multiple base templates that include the same fieldname. 
  • Do not re-use a base template field for a purpose it was not designed for. You will not only confuse your fellow programmers, but also your content editors.

Interface segregation

"Specific interfaces are better than one general-purpose interface"
"Clients should not be forced to implement interfaces they don't use"

Applying this to Sitecore templates comes down to using base templates. Use base templates, and use them wisely. Create a base template for every need and make sure that your templates do not end up with fields they do not use.
Unused fields will confuse your content editors: they might give them the impression that functionality is missing.


Dependency inversion

"One should depend upon abstractions. Do not depend upon concretions. High-level modules or classes should not depend upon low-level modules or classes"

This principle is mostly implemented with Dependency Injection. As there are already lots of blog posts on that subject (also in combination with Sitecore) I won't go into details here. And I couldn't find an example related to templates..

One thing worth mentioning however is not to think that you are done when using inversion of control. If you want your classes/functions to be independent and testable you must also make sure they do not use Sitecore.Context. A simple (but often forgotten) one is using the "current language". Your business functions should ask for the language if they need it (a parameter) and not depend on the Sitecore.Context!
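
As a small illustration in plain C#, here is a hypothetical GreetingService that receives the language name as a parameter. The class, its texts and the fallback to "en" are made up for the example; the point is only that nothing in it touches Sitecore.Context.

```csharp
using System;
using System.Collections.Generic;

public sealed class GreetingService
{
    private static readonly Dictionary<string, string> Greetings =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "en", "Welcome" },
            { "nl", "Welkom" }
        };

    // The language is an explicit argument; nothing here reads Sitecore.Context.
    public string GetGreeting(string languageName)
    {
        string greeting;
        return Greetings.TryGetValue(languageName, out greeting) ? greeting : Greetings["en"];
    }
}
```

The caller in the presentation layer can pass Sitecore.Context.Language.Name, while a unit test can simply pass "nl" without any Sitecore context being available.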


Conclusion

Wrapping this up for using templates one could say:

  • use base templates
  • use base templates wisely
  • use small base templates wisely
and try to keep your business logic code independent of Sitecore.Context and templateId's.


Friday, November 27, 2015

Preview.ResolveSite and the disabled Shared Layout button in Sitecore Experience Editor

After installing some Sitecore 8.1 sites, we noticed something weird in one of the new features in the experience editor: the final/shared layout button, which quite some editors were waiting for.

The symptoms:

The Final/Shared button is disabled.




Some more testing came up with the following results:

  • It happened on some Sitecore 8.1 installations, but not on all of them
  • It happened on all browsers, on different client machines
  • It happened only if we went from the LaunchPad to the Content Editor and then via the Publish ribbon to the Experience Editor
  • It did not happen if we went from the LaunchPad immediately to the Experience Editor - and strangely enough once we had visited the Experience Editor this way the button was always enabled throughout the entire session

It took us quite a while and help from Sitecore Support (thanks Yuriy) to figure it out, but then they pointed us to the Preview.ResolveSite setting. A small explanation is also found at Kirkegaard's blog.

The preview resolving settings

On a multisite solution, when one opens a page in the Experience Editor, by default, Sitecore resolves the corresponding site using the value of the Preview.DefaultSite setting from the \App_Config\Sitecore.config file. The default value there is "website".

However if one sets Preview.ResolveSite setting value to "true" (it is located in the same file and by default it is "false"), Sitecore tries to resolve the root item and the context site based on the current content language and the path to the item. If Sitecore cannot resolve the context site, it uses the site that is specified in the Preview.DefaultSite setting.

So, if Sitecore renders the page on the solution in context of the "website", the Final Layout button is disabled. Why that happens is still a mystery as the rest of the Experience Editor is working fine.

But anyway: setting the Preview.ResolveSite to true fixed the issue.

Friday, November 13, 2015

Sitecore indexes

There are already a lot of blog posts describing the use of Sitecore indexes, especially since Sitecore 7 and the introduction of the ContentSearchManager and the ease to use them.
And still.. I see lots of people writing queries going through lots of items. So: yet another post to promote the use of indexes.

Sitecore indexes, indexes, indexes!

Sitecore has some built-in indexes (since Sitecore 8 even more). The best known are probably the sitecore_master and sitecore_web indexes. Personally I never use those. I always create a custom index. Why? 
  • I don't want to mess with the indexes that Sitecore uses
  • I want my indexes small and lean
    • faster (re)build
    • easier to check

When to use?

It's hard to say exactly when to use an index, but I'll try to give some common real-life examples of requests where I almost always think "index":
  • fetch all news items from year x
  • fetch all products from category x
  • fetch all events happening in the future
  • fetch the latest news items
  • get the last 3 blog posts written by x
  • ...
Too often developers write queries which are fast with the test data.. but after a while the real data has outgrown the solution and it gets slow..  So it's better to think ahead and make more use of those indexes. In lots of cases the result will be faster than a (fast) query.

Create a custom index

As there already are a lot of examples out there, just a fast introduction on how you can create (and use) a custom index. For more information on all the possibilities, check the Sitecore docs (or the default index config files which include examples and comments).

Configuration

I usually create a separate config file where I put the index definition and configuration together. 

Example index definition:

<configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
  <indexes hint="list:AddIndex">
    <index id="MyCustom_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
      <param desc="name">$(id)</param>
      <param desc="folder">$(id)</param>
      <!-- This initializes index property store. Id has to be set to the index id -->
      <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
      <configuration ref="contentSearch/indexConfigurations/myCustomIndexConfiguration" />
      <strategies hint="list:AddStrategy">
        <!-- NOTE: the order of these controls the execution order -->
        <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
      </strategies>
      <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
        <policies hint="list:AddCommitPolicy">
          <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
        </policies>
      </commitPolicyExecutor>
      <locations hint="list:AddCrawler">
        <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
          <Database>web</Database>
          <Root>/sitecore/content/Corporate</Root>
        </crawler>
      </locations>
    </index>
  </indexes>
</configuration>

In this example I used the LuceneProvider, the onPublishEndAsync update strategy (info on update strategies by John West here) and refer to my custom configuration.
Note that I added a crawler for the web database and gave it a root path (can also be an ID).


Example index configuration:

<indexConfigurations>
  <myCustomIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
    <indexAllFields>true</indexAllFields>
    <initializeOnAdd>true</initializeOnAdd>
    <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
    <documentBuilderType>Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder, Sitecore.ContentSearch.LuceneProvider</documentBuilderType>
    <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
      <fieldNames hint="raw:AddFieldByFieldName">
        <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="__sortorder" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Int32" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="title" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
        </field>
        <field fieldName="date" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="sequence" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Int32" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="topics" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.Guid" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
        <field fieldName="applications" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider"/>
      </fieldNames>
    </fieldMap>
    <include hint="list:IncludeTemplate">
      <NewsTemplate>{5CD362B8-C129-437A-A0D4-4EE58E71FEB1}</NewsTemplate>
      <ProductTemplate>{18D5467C-79F9-405B-AA87-2BA4B7CDB443}</ProductTemplate>
      <EventTemplate>{6CA9AC2A-1A9D-429B-870C-FC9417D3A1C7}</EventTemplate>
    </include>
    <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
    <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
    <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
  </myCustomIndexConfiguration>
</indexConfigurations>


Notice here:

  • the "indexAllFields" : if not true, you need to define the fields in an include (just like the IncludeTemplate, but then with includeField)
  • the "fieldMap": here we define the field options: storageType, type (can be string, int, guid, date, ...) and if needed an analyzer (all information on analyzers by Adam Conn here)
  • the IncludeTemplate section where we define the templates of the items to include in the index (the guid is important, the name is useful for understanding the config)

Standard Sitecore fields
Most of the standard Sitecore fields are included automatically. 
Some are not, but you can include them (in the example above the SortOrder field is included).


After creating the configuration you should see your index in the Index Manager in Sitecore and you can rebuild it. Check your data in the index with a tool like Luke. This way you are sure your config is good before you start using it. Luke is also handy later on to check your queries.

Computed fields

Computed fields are fields that are added to the index through custom code (the value is computed instead of just fetched from a field). Of course, computed fields can also be added to a custom index.

Querying your index

Sitecore has a class SearchResultItem that can be used to fetch results from the index, but in most cases you will want to extend this class.

Example SearchResultItem:


public class EventItem : SearchResultItem
{
  [IndexField("title")]
  public string Title { get; set; }
  
  [IndexField("startdate")]
  public DateTime StartDate { get; set; }
  
  [IndexField("profile")]
  public ID Profile { get; set; }
}
We use the SearchContext to do the actual query. Example code:

private IEnumerable<EventItem> GetEventItems()
{
  var templateRestrictions = new List<ID>
  {
    new ID(applicationSettings.EventsTemplateId)
  };

  using (var context = ContentSearchManager.GetIndex("MyCustom_index").CreateSearchContext())
  {
    var templatePredicate = PredicateBuilder.False<EventItem>();
    templatePredicate = templateRestrictions.Aggregate(templatePredicate, (current, template) => current.Or(p => p.TemplateId == template));
    var datePredicate = PredicateBuilder.True<EventItem>();
    datePredicate = datePredicate.And(p => p.StartDate >= DateTime.Today);
    var predicate = PredicateBuilder.True<EventItem>();
    predicate = predicate.And(templatePredicate);
    predicate = predicate.And(datePredicate);
    predicate = predicate.And(p => p.Language == Sitecore.Context.Language.Name);
    var query = context.GetQueryable<EventItem>(new CultureExecutionContext(Sitecore.Context.Language.CultureInfo)).Where(predicate).OrderBy(p => p.StartDate);
    var queryResults = query.GetResults();
    foreach (var hit in queryResults.Hits)
    {
      if (string.IsNullOrEmpty(hit.Document.Title))
      {
        continue;
      }

      yield return hit.Document;
    }
  }
}
We use "predicates" to define our query. I find them useful to create reusable code (not shown here), especially combined with generics. Predicates are created with the PredicateBuilder (use true for "and" and false for "or" queries).
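
Why the seed is true for "and" and false for "or" can be shown with a stand-in written in plain C#. This is not Sitecore's PredicateBuilder (which builds expression trees); it is a simplified sketch with plain delegates and made-up names, just to show the role of the starting value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateSeedDemo
{
    // AND chain: start from "always true" so the seed never rejects anything.
    public static Func<T, bool> All<T>(IEnumerable<Func<T, bool>> conditions)
    {
        Func<T, bool> seed = _ => true;
        return conditions.Aggregate(seed, (current, next) => x => current(x) && next(x));
    }

    // OR chain: start from "always false" so the seed never accepts anything.
    public static Func<T, bool> Any<T>(IEnumerable<Func<T, bool>> conditions)
    {
        Func<T, bool> seed = _ => false;
        return conditions.Aggregate(seed, (current, next) => x => current(x) || next(x));
    }
}
```

Swapping the seeds breaks both: an "or" chain that starts from true matches everything, and an "and" chain that starts from false matches nothing.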

First we defined a predicate to check the templateID (from a list of possibilities). We also check a date field and in the end we have a predicate for the language.

In the example we sort (OrderBy), but the queryable also has options to use paging, faceting, ... The result set includes a list of results, but also the facets, the total number of results (important when paging), ..   

Make sure that if you sort you are using the correct types. Sorting numbers as string will give you unexpected results..

Fetching Sitecore items
It is also important to know that the results are not yet Sitecore items - we get the items we define (our SearchResultItem's). It is however quite easy to fetch the actual Sitecore items here, also using Glass if you want. Be aware though that the part after the index is sometimes the performance bottleneck: you wouldn't be the first to lose all performance benefits from the index by fetching too many Sitecore items or writing a slow Linq query after the search.

Before fetching the real Sitecore items (or Glass-mapped-classes), consider if you really need them. In lots of cases you will, but sometimes the information from the index can be sufficient and you can save even more time not retrieving actual items.


Logs

If your query returns unexpected results, a good place to start looking is the search log file. All queries that are performed are logged there and if you are using Luke you can copy/paste the query in Luke and test it. 


Issues

There are some known issues.. some unknown as well. I have a few open tickets with Sitecore support regarding indexes at the moment, so maybe more posts will follow...

Thursday, November 12, 2015

Delivering instant data to a high traffic Sitecore site

The challenge


We had to deliver data to 12,000 to 15,000 concurrent users on 2 Azure servers running Sitecore (7).
Most of the data was coming from an external source (restful services) and could (should) not be cached for longer than 1 second because it was real time (sports) scoring information. That data was merged with extra information from Sitecore.

In this post I will describe what we did to achieve this, knowing that we could not "just add some servers". We had an architecture with 2 content delivery servers, 1 content management server and a database server. That could not be changed.

All things described here had some impact, some more than others or on different levels of the application, but I hope it might give you some ideas when facing a similar challenge. The solution was built on Sitecore 7 with webforms and webapi.


Coding

Using context classes

Our most visited pages had up to 10 controls showing data (not counting controls for creating grids and the like). All of these controls needed the same set of data fetched from the URL, page, ...  The worst solution would be to have all that logic in every control. A better (and actually faster) way could have been to put that logic in a class and use that as a base class or call it from all controls.
But in order to prevent all these controls from going through that process and possibly doing requests to back-end services, we took another approach and created a "context" class, injected per request (using Autofac in our case). The "context" class was called before any control was initialized and prepared all necessary context data for the page - without having to know what controls are on the page, because that would break the whole idea of Sitecore renderings.

Example
We had "team pages". Each team had several pages with different controls on them (player list, statistics, coach info, pictures ... ) and, as it should be in a Sitecore environment, the editors decide what controls they want on each page. But on each team page we need to know the team, and we could already fetch the global team information as this was one object in the back-end systems. Depending on the control this could cover up to 50% of the data needed, but for each control it was one less request. If you have 10 controls on a page, this matters.. (even if the data would be cached).


Caching

Some of the data from Sitecore could be cached. At least, until it was changed in Sitecore. There are lots of posts already out there on how to clear your cache when a Sitecore publish is done, but then you might clear your cache too often. So we created a more granular caching system. Maybe I should write a blog post on this one alone, but in a nutshell it comes down to this: each entry in the cache has a delegate that determines whether the entry should be cleared (and maybe even immediately refilled), and after a Sitecore publish the delegate is called for each cache entry. 

This way each cache entry is responsible for clearing itself. In the delegate we can check language, published item, ..  If a lot of Sitecore publishes are happening this mechanism can prevent quite some unneeded cache clearances. Of course, one badly written delegate method could kill your publish performance, but if you keep them simple (try to get out of the method as soon as possible) it's worth it.
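
A rough sketch of such a cache in plain C#, not the implementation from the project: PublishInfo, SelfClearingCache and their members are hypothetical names, and the refill step is left out. OnPublished is the method that would be called from a publish-end handler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of what was published; the real event data will differ.
public sealed class PublishInfo
{
    public Guid ItemId { get; set; }
    public string Language { get; set; }
}

public sealed class SelfClearingCache
{
    private sealed class Entry
    {
        public object Value;
        public Func<PublishInfo, bool> ShouldClear;
    }

    private readonly object sync = new object();
    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    // Each entry brings its own rule that decides whether a publish makes it stale.
    public void Set(string key, object value, Func<PublishInfo, bool> shouldClear)
    {
        lock (sync) { entries[key] = new Entry { Value = value, ShouldClear = shouldClear }; }
    }

    public bool TryGet(string key, out object value)
    {
        lock (sync)
        {
            Entry entry;
            var found = entries.TryGetValue(key, out entry);
            value = found ? entry.Value : null;
            return found;
        }
    }

    // Asks every entry whether it wants to be cleared; returns the number of removed entries.
    public int OnPublished(PublishInfo publish)
    {
        lock (sync)
        {
            var stale = entries.Where(e => e.Value.ShouldClear(publish)).Select(e => e.Key).ToList();
            foreach (var key in stale) { entries.Remove(key); }
            return stale.Count;
        }
    }
}
```

An entry registered with a rule like "p.ItemId == newsId and p.Language is en" survives a publish of the same item in another language. Because the rules run while the lock is held, they should return as quickly as possible, which matches the warning about badly written delegates.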


Threads

As mentioned before we could not cache the data coming from the external back-end system for any longer than 1 second. The API to the system was built with restful services. Calling the services when we needed the data was not an option if we wanted to serve that many users. Our solution was threading. We created a first long-running thread that called the back-end every 5 minutes to see whether there was data to be fetched (this timing was fine as data started to show at least a day before the actual live games started). When we detected live data coming in we would start a new thread that fetched the actual data constantly and kept it in memory, available for the whole application (until we detected the end of the data stream and let the thread stop). With the constant loop that fetched the data, mapped it to our own business objects (Automapper to the rescue) and sometimes even performed some business logic on it, we were able to keep the "freshness" of the data always under the required 1 second.
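
The fetch loop can be sketched as follows in plain C#. This is an illustration under assumptions, not the original code: it uses a Task with a CancellationToken instead of a dedicated thread, LiveDataHolder is a made-up name, and the fetch delegate stands in for the call to the back-end plus the mapping to business objects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class LiveDataHolder<T> where T : class
{
    private readonly Func<T> fetch;   // hypothetical call to the back-end service
    private T latest;                 // most recent snapshot, shared with all request threads

    public LiveDataHolder(Func<T> fetch)
    {
        this.fetch = fetch;
    }

    // Request threads only read the in-memory snapshot; they never call the back-end.
    public T Latest
    {
        get { return Volatile.Read(ref latest); }
    }

    // One fetch cycle: a failed or empty call keeps the previous snapshot.
    public bool RefreshOnce()
    {
        try
        {
            var fresh = fetch();
            if (fresh == null) { return false; }
            Volatile.Write(ref latest, fresh);
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }

    // Repeats fetch cycles, pausing between them, until cancellation is requested.
    public Task RunAsync(TimeSpan pause, CancellationToken token)
    {
        return Task.Run(async () =>
        {
            while (!token.IsCancellationRequested)
            {
                RefreshOnce();
                try { await Task.Delay(pause, token); }
                catch (OperationCanceledException) { break; }
            }
        });
    }
}
```

The web requests read Latest and never wait for the back-end; cancelling the token corresponds to letting the thread stop once the end of the data stream is detected.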

So the data was available on our site -in memory- at all time and we the threads for the web application were not harmed as the retrieval itself had it's own threads. A monitoring system was put in place to detect the status of the running threads and we included quite some logging to know what was going on, but in the end it all went well and was stunningly fast.
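
The fetch loop described above can be sketched roughly as follows. This is a simplified illustration, not the production code: LiveDataHolder is a made-up name, the fetch delegate stands in for the REST call plus mapping, and the detection of the start and end of the data stream is left out.

```csharp
using System;
using System.Threading;

public sealed class LiveDataHolder<T> where T : class
{
    private readonly Func<T> _fetch;      // hypothetical: calls the back-end and maps the result
    private readonly TimeSpan _pause;     // short pause between two fetches
    private readonly Thread _worker;
    private volatile bool _stopRequested;
    private T _latest;

    public LiveDataHolder(Func<T> fetch, TimeSpan pause)
    {
        _fetch = fetch;
        _pause = pause;
        // A background thread does not keep the process alive on shutdown.
        _worker = new Thread(Loop) { IsBackground = true, Name = "live-data-fetcher" };
    }

    // Request threads only read the last snapshot; they never wait for the back-end.
    public T Latest
    {
        get { return Volatile.Read(ref _latest); }
    }

    public void Start()
    {
        _worker.Start();
    }

    // Only call this after Start(); it waits for the loop to finish.
    public void Stop()
    {
        _stopRequested = true;
        _worker.Join();
    }

    private void Loop()
    {
        while (!_stopRequested)
        {
            try
            {
                T fresh = _fetch();
                if (fresh != null)
                {
                    Volatile.Write(ref _latest, fresh);
                }
            }
            catch (Exception)
            {
                // Keep serving the last good snapshot; real code would log here.
            }
            Thread.Sleep(_pause);
        }
    }
}
```

Swapping the whole snapshot in one reference write means readers always see a complete object, which is why no lock is needed on the read side in this sketch.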


WebAPI / AngularJS

To deliver the live data on the pages we used AngularJS and WebApi to change the data on the pages without extra page requests. For some of the controls this also enabled us to provide at least some content immediately to the users while fetching the rest.
Other tricks like progressive loading of images brought the amount of data that is initially loaded by the page down to a minimum.


Tuning

Pagespeed optimizations

This is something you should consider on every site, no matter what the traffic will be like. I am talking about bundling and minifying CSS and JavaScript, optimizing images (for Sitecore images, use the parameters to define width and/or height), enabling gzip compression and so on. These are mostly quite easy tasks that can give your site that extra boost.

IIS tuning

Probably a bit less known, and for many sites not necessary, but your IIS web server can also be tuned.
There is a good article by Stuart Brierly on the topic here.
What we did in particular is change the configs to adapt to the number of processors, allowing more threads and connections. When doing this you need to make sure, of course, that your application can handle those extra simultaneous requests.
We also raised the number of connections that were allowed to the external webservice (by allowing more parallel connections to its IP address).

These changes made sure that IIS did not put connections on hold while we still had some resources left on the server.

This looks like a small change, but it has quite some impact, so you will need to perform some load tests to verify your changes.
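
We did this in the config files, but to make the outbound connection part concrete: in .NET the same limit can also be set from code through ServicePointManager. The sketch below is an assumed equivalent, not what ran on the project; the class name, the example address and the "per processor" factor are made up, and the right number is something only a load test can tell you.

```csharp
using System;
using System.Net;

public static class OutboundConnectionTuning
{
    // Raises the number of parallel connections allowed to one external endpoint.
    // Returns the limit that was applied so it can be logged.
    public static int Apply(Uri backendAddress, int connectionsPerProcessor)
    {
        if (connectionsPerProcessor < 1)
        {
            throw new ArgumentOutOfRangeException("connectionsPerProcessor");
        }

        int limit = connectionsPerProcessor * Environment.ProcessorCount;

        // The ServicePoint for this scheme/host/port holds the per-endpoint limit.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(backendAddress);
        servicePoint.ConnectionLimit = limit;
        return limit;
    }
}
```

Calling OutboundConnectionTuning.Apply(new Uri("http://backend.example/"), 12) once at application start would be enough, as the setting applies to all later requests to that endpoint.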


Infrastructure

Servers

As said in the beginning we could not add more servers, but we did upscale the servers as far as we could.

Caching

The last step to the solution was adding caching servers (Varnish) in front of the solution. This freed the webservers from a lot of requests that could easily be cached: resources like JavaScript files or images, but also complete pages. Here the use of WebApi to load some of the data also helped to get quite some requests cached. If you can serve these requests without them going all the way to your webserver, your server has a smaller request queue and more resources to handle the remaining requests.

This last step does not come for free, but it had a huge impact once configured properly.

Wednesday, August 5, 2015

Sitecore wildcard items to the max

Wildcard items

There are enough blogs already describing what a wildcard item in Sitecore is, so a very short intro should do here: 
A Sitecore wildcard item is a regular item in Sitecore with "*" as name. The item has all the fields of a normal item.
The item matches any url at its level in the tree that is not matched by one of its siblings. Wildcard items can have children.
Some Sitecore versions have a bug when using a display name in the path up to a wildcard item (hotfix available).


The case

We had a case for a sports organisation with a few requirements:
  • create a website for a competition that can host a few overall pages and 'subsites' for all 20 events in the competition
  • editors do not want to create 20 trees in Sitecore
  • all data about the competitions comes live from a backend system and is not copied into Sitecore
  • the competition is yearly so the site needs to be ready to be copied next year
  • each event has around 120 matches 
  • each event can have a men's and/or a women's tournament, each with around 60 competitors

The solution

We used some wildcards.
The first wildcard is right below the general home page, and defines a sub-homepage for the event. Underneath we have the complete set of items for each event. 


More wildcards

Even more wildcards appear. In the (wo)men's section we have a wildcard to display a team, and underneath another one to display a player. This way we can have SEO-friendly pages for all teams and players in each event.

We also have wildcards for each game (under that wildcard we have subitems for the subpages of a match).

Navigation

The navigation is in 2 parts. First of all we have navigation that displays all events. That is not Sitecore related (the data comes from the backend system) but it will point to the wildcard item under the home item, of course indicating the right event in the url.

The navigation inside an event can be set once. The links are automatically transformed into the correct url for that event (see the part about code). Some pages will not have content, but that is handled on the page itself.

Content 

Sitecore content is shown based on datasources - as it should. But here the datasource will not go to a specific item but to an item container. In that container we look for content that can be displayed for the current event. If that is not found we display a message that the content is not yet available.

The code

Events and tournaments have a unique code. That code is used in the urls in the position of the main wildcard. We use the event code as long as possible; as soon as a selection is made for the men's or women's tournament, we use the tournament code. That is because we need the correct code to go to the backend system.

To accomplish this we wrote a context class (one instance per request) that detects the "context" - the current event and/or tournament - from the url and exposes this to everyone who needs it.
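
The url part of such a context class can be sketched like this. It is an illustration under assumptions: CompetitionContext and FromPath are made-up names, and the position of the code in the path (here passed in as an index, for example 1 when a language segment comes first) depends on your own url setup. Real code would also verify that the resolved item really is the wildcard item and keep one instance per request, for example in HttpContext.Items.

```csharp
using System;

public sealed class CompetitionContext
{
    // The event or tournament code found in the url, or null when there is none.
    public string Code { get; private set; }

    public bool HasCode
    {
        get { return !string.IsNullOrEmpty(Code); }
    }

    private CompetitionContext(string code)
    {
        Code = code;
    }

    // path: the url path, e.g. "/en/wc2015-m/teams".
    // codeSegmentIndex: zero-based position of the wildcard segment in that path.
    public static CompetitionContext FromPath(string path, int codeSegmentIndex)
    {
        if (string.IsNullOrEmpty(path) || codeSegmentIndex < 0)
        {
            return new CompetitionContext(null);
        }

        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (codeSegmentIndex >= segments.Length)
        {
            return new CompetitionContext(null);
        }

        return new CompetitionContext(Uri.UnescapeDataString(segments[codeSegmentIndex]));
    }
}
```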

LinkProvider


The nifty stuff is in our custom LinkProvider. We wrote some code that determines the requested tournamentcode, applying some business rules and based on the current context. In the end, we replace the first wildcard in the url with the determined code (only the first one, as we might have multiple) with string manipulation.
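
The string manipulation itself is small. A sketch of the "replace only the first wildcard" step is shown below; WildcardUrl is a made-up helper, and the token that represents a wildcard item in the generated url is passed in as a parameter. As far as I know a stock Sitecore link provider renders it as ",-w-,", but treat that value as an assumption and check what your own urls contain.

```csharp
using System;

public static class WildcardUrl
{
    // Replaces only the first occurrence of the wildcard token with the given code.
    // Later wildcards (team, player, match) are left for other code to fill in.
    public static string ReplaceFirst(string url, string wildcardToken, string code)
    {
        if (url == null) throw new ArgumentNullException("url");
        if (string.IsNullOrEmpty(wildcardToken)) throw new ArgumentException("Token is required.", "wildcardToken");
        if (code == null) throw new ArgumentNullException("code");

        int index = url.IndexOf(wildcardToken, StringComparison.Ordinal);
        if (index < 0)
        {
            return url; // no wildcard in this url, nothing to do
        }

        return url.Substring(0, index)
            + Uri.EscapeDataString(code)
            + url.Substring(index + wildcardToken.Length);
    }
}
```

In a custom LinkProvider this would be applied to the url returned by the base implementation, with the code coming from the business rules and the current context.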

Other wildcards

The other wildcards (team, player, match) are handled in their business objects. Each business class has its own logic to create the urls and grab the information needed back from the url. As all our code passes through the business objects and nothing is done straight on the base Sitecore items, we are good here. Of course, it's not possible to put a url to a team or so from within the content, but that was not a requirement anyway (actually it is possible as an 'external' url).

Components

The event homepage contains several components that need to be activated based on the status of the event (not started, ongoing, finished). This is also done automatically based on the event in our context. All components are placed on the page and the ones that are not compliant will be hidden.
We could have used a custom condition here and used the standard hide functionality to create conditional rendering rules, but we kept it simple and put the logic in a base function for all homepage components.
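
Such a base function can be as small as the sketch below. The names (EventStatus, HomepageComponentBase, the two example components) are invented for the illustration, and the event status is assumed to be known already from the context.

```csharp
using System;

[Flags]
public enum EventStatus
{
    NotStarted = 1,
    Ongoing = 2,
    Finished = 4
}

public abstract class HomepageComponentBase
{
    // Each component declares in which event states it should be visible.
    protected abstract EventStatus VisibleDuring { get; }

    // The shared check: render only when the current status is one of those states.
    public bool ShouldRender(EventStatus current)
    {
        return (VisibleDuring & current) != 0;
    }
}

// Hypothetical component that only makes sense while games are being played.
public sealed class LiveScoreboard : HomepageComponentBase
{
    protected override EventStatus VisibleDuring
    {
        get { return EventStatus.Ongoing; }
    }
}

// Hypothetical component shown before and during the event.
public sealed class EventTeaser : HomepageComponentBase
{
    protected override EventStatus VisibleDuring
    {
        get { return EventStatus.NotStarted | EventStatus.Ongoing; }
    }
}
```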

Conclusion

We had a first version (poc) quite fast but it took us a few days and a lot of testing to get everything working as it should - especially the small and tricky parts like the multilingual aspect, canonical urls, links between tournaments, links from within event components showing data about more than one tournament...

I did not put any of the code in here as it is quite specific based on our business logic and the backend system used, but if there are questions or you want to see some code or config - just ask ;)