Thursday, November 12, 2015

Delivering instant data to a high traffic Sitecore site

The challenge


We had to deliver data to 12,000 to 15,000 concurrent users on 2 Azure servers running Sitecore 7.
Most of the data came from an external source (RESTful services) and could not (and should not) be cached for longer than 1 second, because it was real-time (sports) scoring information. That data was merged with extra information from Sitecore.

In this post I will describe what we did to achieve this, knowing that we could not "just add some servers". We had an architecture with 2 content delivery servers, 1 content management server and a database server. That could not be changed.

Everything described here had some impact, some more than others or on different levels of the application, but I hope it gives you some ideas when facing a similar challenge. The solution was built on Sitecore 7 with Web Forms and Web API.


Coding

Using context classes

Our most visited pages had up to 10 controls showing data (so not counting controls for creating grids and the like). All of these controls needed the same set of data, derived from the URL, the page, and so on. The worst solution would be to have all that logic in every control. A better (and actually faster) way could have been to put that logic in a class and use that as a base class or call it from all controls.
But in order to prevent all these controls from going through that process and possibly making requests to back-end services, we took another approach and created a "context" class, injected per request (using Autofac in our case). The "context" class was called before any control was initialized and prepared all necessary context data for the page, without having to know which controls are on the page, because that would break the whole idea of Sitecore renderings.
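A minimal sketch of the idea (the names IPageContext, PageContext, ITeamService and Team are made up for illustration, and the exact Autofac lifetime method depends on the web integration you use):

using System;
using System.Web;

// Illustrative sketch only: these are not our actual classes.
public class Team { public string Name { get; set; } /* global team info from the back-end */ }

public interface ITeamService { Team GetTeam(string teamId); }

public interface IPageContext
{
    Team CurrentTeam { get; }
}

public class PageContext : IPageContext
{
    private readonly Lazy<Team> _team;

    public PageContext(ITeamService teamService)
    {
        // Fetch the team once per request, based on the current URL (or the
        // current Sitecore item), so no individual control has to repeat it.
        _team = new Lazy<Team>(() =>
            teamService.GetTeam(HttpContext.Current.Request.QueryString["team"]));
    }

    public Team CurrentTeam
    {
        get { return _team.Value; }
    }
}

// Registration, e.g. in an Autofac module:
// builder.RegisterType<PageContext>().As<IPageContext>().InstancePerRequest();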

Example
We had "team pages". Each team had several pages with different controls on them (player list, statistics, coach info, pictures, ...) and, as it should be in a Sitecore environment, the editors decide which controls they want on each page. But: on each team page we need to know the team, and we could already fetch the global team information as this was one object in the back-end systems. Depending on the control this could cover up to 50% of the data needed, but for each control it was one request less. If you have 10 controls on a page, this matters (even if the data would be cached).
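In a control this then looks roughly like the sketch below (hypothetical names again, reusing the IPageContext from the sketch above; how the property gets injected depends on your Autofac Web Forms setup):

using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// Hypothetical Web Forms control: it never calls the back-end for the team
// itself, it only reads what the per-request context has already prepared.
public class TeamHeaderControl : UserControl
{
    // Normally declared in the .ascx markup; shown here to keep the sketch whole.
    protected Literal TeamName = new Literal();

    // Filled in by the container (we used property injection for controls).
    public IPageContext PageContext { get; set; }

    protected void Page_Load(object sender, EventArgs e)
    {
        // The shared team object was already fetched once for this request;
        // only data specific to this control still needs its own (cached) call.
        TeamName.Text = PageContext.CurrentTeam.Name;
    }
}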


Caching

Some of the data from Sitecore could be cached, at least until it was changed in Sitecore. There are lots of posts already out there on how to clear your cache when a Sitecore publish is done, but then you might clear your cache too often. So we created a more granular caching system. Maybe I should write a blog post on this one alone, but in a nutshell it comes down to this: each entry in the cache has a delegate that determines whether that entry should be cleared (and maybe even immediately refilled), and after a Sitecore publish the delegate is called for each cache entry.

This way each cache entry is responsible for clearing itself. In the delegate we can check the language, the published item, and so on. If a lot of Sitecore publishes are happening, this mechanism can prevent quite some unneeded cache clears. Of course, one badly written delegate method could kill your publish performance, but if you keep them simple (try to get out of the method as soon as possible) it's worth it.
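A stripped-down sketch of the mechanism (class and property names are made up; the real implementation hooked into Sitecore's publish events, for example publish:end, and carried more information per publish):

using System;
using System.Collections.Concurrent;

// Illustrative sketch of the granular cache: every entry carries a delegate
// that decides, for a given publish, whether that entry should be cleared.
public class PublishInfo
{
    public string ItemId { get; set; }
    public string Language { get; set; }
}

public class CacheEntry
{
    public object Value { get; set; }
    // Returns true when this entry must be cleared for the given publish.
    public Func<PublishInfo, bool> ShouldClear { get; set; }
}

public class GranularCache
{
    private readonly ConcurrentDictionary<string, CacheEntry> _entries =
        new ConcurrentDictionary<string, CacheEntry>();

    public void Add(string key, object value, Func<PublishInfo, bool> shouldClear)
    {
        _entries[key] = new CacheEntry { Value = value, ShouldClear = shouldClear };
    }

    public object Get(string key)
    {
        CacheEntry entry;
        return _entries.TryGetValue(key, out entry) ? entry.Value : null;
    }

    // Called from a handler on the Sitecore publish event.
    public void OnPublish(PublishInfo info)
    {
        foreach (var pair in _entries)
        {
            // Keep the delegates cheap: one slow delegate slows down every publish.
            if (pair.Value.ShouldClear(info))
            {
                CacheEntry removed;
                _entries.TryRemove(pair.Key, out removed);
            }
        }
    }
}

An entry for a team page could then register a delegate that only returns true when that team's item (or one of its descendants) was published in the relevant language.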


Threads

As mentioned before, we could not cache the data coming from the external back-end system for longer than 1 second. The API to that system was built with RESTful services, and calling those services at the moment we needed the data was not an option if we wanted to serve that many users. Our solution was threading. We created a first long-running thread that called the back-end every 5 minutes to see whether there was data to be fetched (this timing was fine, as data started to appear at least a day before the actual live games). When we detected live data coming in, we started a new thread that fetched the actual data constantly and kept it in memory, available to the whole application, until we detected the end of the data stream and let the thread stop. With that constant loop fetching the data, mapping it to our own business objects (AutoMapper to the rescue) and sometimes even performing some business logic on it, we were able to keep the "freshness" of the data always under the required 1 second.
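In broad strokes the pattern looked like the sketch below. This is simplified and the names (LiveDataFeed, BackEnd, LiveScoreboard) are illustrative; error handling, logging and the actual REST calls and AutoMapper configuration are left out.

using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified sketch of the two-level polling: a slow "watcher" loop that checks
// every 5 minutes whether live data is available, and a fast loop that keeps an
// in-memory snapshot fresh (well under 1 second) while a game is live.
public static class LiveDataFeed
{
    // The latest mapped snapshot, swapped atomically so readers never block.
    private static volatile LiveScoreboard _current = new LiveScoreboard();
    private static Task _fetchTask;

    public static LiveScoreboard Current { get { return _current; } }

    public static void Start(CancellationToken token)
    {
        Task.Factory.StartNew(() => WatchLoop(token), token,
            TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    private static void WatchLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (BackEnd.HasLiveData() && (_fetchTask == null || _fetchTask.IsCompleted))
            {
                // Live data detected: start a dedicated thread that keeps fetching
                // until the stream ends, then let it stop.
                _fetchTask = Task.Factory.StartNew(() => FetchLoop(token), token,
                    TaskCreationOptions.LongRunning, TaskScheduler.Default);
            }
            Thread.Sleep(TimeSpan.FromMinutes(5));
        }
    }

    private static void FetchLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested && BackEnd.HasLiveData())
        {
            var dto = BackEnd.GetScores();   // RESTful call to the external API
            // Map the external DTOs to our own business objects (AutoMapper) and
            // apply business logic, then publish the new snapshot in one assignment.
            _current = MapToBusinessObjects(dto);
        }
    }

    private static LiveScoreboard MapToBusinessObjects(object dto)
    {
        return new LiveScoreboard(); // Mapper.Map<...> plus business logic in the real code
    }
}

public class LiveScoreboard { /* mapped business objects */ }

public static class BackEnd
{
    public static bool HasLiveData() { return false; }  // status call to the back-end
    public static object GetScores() { return null; }   // scores call to the back-end
}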

So the data was available on our site, in memory, at all times, and the threads of the web application were not affected because the retrieval itself had its own threads. A monitoring system was put in place to check the status of the running threads and we included quite some logging to know what was going on, but in the end it all went well and was stunningly fast.


WebAPI / AngularJS

To deliver the live data on the pages we used AngularJS and Web API to update the data on the pages without extra page requests. For some of the controls this also enabled us to show at least some content to the users immediately while fetching the rest.
Other tricks, like progressive loading of images, got the amount of data that is initially loaded by the page down to a minimum.
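On the server side such a Web API endpoint can stay extremely thin, because it only hands out the snapshot that the background threads already keep in memory (sketch below, reusing the hypothetical LiveDataFeed from the threading example; the route and controller name are illustrative):

using System.Web.Http;

// Illustrative Web API controller: it only returns the in-memory snapshot kept
// fresh by the background threads, so a request never waits on the back-end.
public class LiveScoresController : ApiController
{
    // GET api/livescores
    public LiveScoreboard Get()
    {
        return LiveDataFeed.Current;
    }
}

The AngularJS controllers could then poll such an endpoint at the required interval and update the bound model, without any full page request.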


Tuning

Pagespeed optimizations

This is something you should consider on every site, no matter what the traffic will be like. I am talking about bundling and minifying CSS and JavaScript, optimizing images (for Sitecore images, use the parameters to define width and/or height), enabling gzip compression and so on. Mostly quite easy tasks that can give your site that extra boost.
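For an ASP.NET site the bundling and minifying part can be as simple as the standard System.Web.Optimization setup; a sketch (the bundle and file names are just examples):

using System.Web.Optimization;

// Sketch of standard ASP.NET bundling/minification (file names are examples).
public static class BundleConfig
{
    public static void RegisterBundles(BundleCollection bundles)
    {
        bundles.Add(new ScriptBundle("~/bundles/site").Include(
            "~/scripts/angular.js",
            "~/scripts/app.js"));

        bundles.Add(new StyleBundle("~/bundles/css").Include(
            "~/content/site.css"));

        // Force bundling/minification even when compilation debug="true".
        BundleTable.EnableOptimizations = true;
    }
}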

IIS tuning

Probably a bit less known, and for many sites not necessary, but your IIS web server can be tuned as well.
There is a good article by Stuart Brierly on the topic here.
What we did in particular is change the configs to adapt to the number of processors, allowing more threads and connections. When doing this you need to make sure, of course, that your application can handle those extra simultaneous requests.
We also raised the number of connections that were allowed to the external web service (by allowing more parallel connections to its IP address).
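That outgoing connection limit can be raised in configuration (system.net/connectionManagement) or in code; a minimal sketch of the code variant (the factor of 12 per processor is just an example, your own value should come out of load testing):

using System;
using System.Net;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Allow more concurrent outgoing HTTP connections per endpoint than the
        // (low) default, so calls to the external service are not queued client-side.
        ServicePointManager.DefaultConnectionLimit = 12 * Environment.ProcessorCount;
    }
}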

These changes made sure that IIS did not put connections on hold while we still had some resources left on the server.

This looks like a small change, but it can have quite some impact, and you will need to perform load tests to validate your changes.


Infrastructure

Servers

As said in the beginning, we could not add more servers, but we did scale up the servers as far as we could.

Caching

The last step in the solution was adding caching servers (Varnish) in front of it. This freed the web servers from a lot of requests that could easily be cached. Here the use of Web API to load some data also helped to get quite some requests cached: this can be resources like JavaScript files or images, but also complete pages. If you can serve these requests without them going all the way to your web server, your server has a smaller request queue and more resources to handle the remaining requests.
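For the caching layer to take over a request, the response of course has to be marked as cacheable. One way to do that for the live data endpoint is sketched below with a public max-age of one second (illustrative, not necessarily the exact headers we used):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Web.Http;

// Sketch: return the live data with a short public max-age so a reverse proxy
// such as Varnish can absorb the bulk of the identical requests within that second.
public class CachedLiveScoresController : ApiController
{
    public HttpResponseMessage Get()
    {
        var response = Request.CreateResponse(LiveDataFeed.Current);
        response.Headers.CacheControl = new CacheControlHeaderValue
        {
            Public = true,
            MaxAge = TimeSpan.FromSeconds(1)
        };
        return response;
    }
}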

This last step does not come for free, but it had a huge impact once configured properly.
