Thursday, June 12, 2025

Solr query with n-gram

A Search story with Solr N-Gram part 2


Querying our index

In part 1 of this search story I described the setup we did to create a custom Solr index in Sitecore that had a few fields with the n-gram tokenizer. 

A small recap: we are trying to create a search across a bunch of similar Sitecore items that uses tagging but also free-text search in the title and description. We want to make sure users get results whenever possible, with the most relevant ones on top.

In this second part I will describe how we queried that index to get what we need. We use only Solr - not retrieving any data from Sitecore - as we want to be ready to move this solution out of the Sitecore environment someday. This is why we are not using the Sitecore search layer but the SolrNet library instead.


Query options and basics

Let's start easy with setting some query options.
var options = new QueryOptions
{
  Rows = parameters.Rows,
  StartOrCursor = new StartOrCursor.Start(parameters.Start)
};
We are just setting the parameters for paging here - number of rows and the start row.
var query = new List<ISolrQuery>()
  {
    new SolrQueryByField("_template", "bdd6ede443e889619bc01314c027b3da"),
    new SolrQueryByField("_language", language),
    new SolrQueryByField("_path", "5bbbd9fa6d764b01813f0cafd6f5de31")
  };
We start the query by setting the desired template, language and path.
We use SolrQueryInList with an IEnumerable to add the tagging parts to the query, but as that is not the most relevant part here I will not go into detail - see the sketch below. You can find all the information on querying with SolrNet in their docs on GitHub.
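As an illustration, such a tagging part boils down to something like this (the field name and the Tags parameter are hypothetical):
query.Add(new SolrQueryInList("customtagname_sm", parameters.Tags));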


Search query

The next - and most interesting - step is adding the search part to the query.
if (!string.IsNullOrEmpty(parameters.SearchTerm))
{
  var searchQuery = new List<ISolrQuery>()
  {
    new SolrQueryByField("titlestring_s", parameters.SearchTerm),
    new SolrQueryByField("descriptionstring_s", parameters.SearchTerm),
    new SolrQueryByField("titlesearch_txts", parameters.SearchTerm),
    new SolrQueryByField("descriptionsearch_txts", parameters.SearchTerm)
  };
  var search = new SolrMultipleCriteriaQuery(searchQuery, SolrMultipleCriteriaQuery.Operator.OR);
  query.Add(search);
  options.AddOrder(new SortOrder("score", Order.DESC));
  options.ExtraParams = new Dictionary<string, string>
  {
      { "defType", "edismax" },
      { "qf", "titlestring_s^9 descriptionstring_s^5 titlesearch_txts^2 descriptionsearch_txts" }
  };
}
else
{
  options.AddOrder(new SortOrder("__smallupdateddate_tdt", Order.DESC));
}

What are we doing here? First of all, we check if we actually have a search parameter. If we do not, we do not add any search query and keep the sorting as default - being the last update date in our case. 

But what if we do have a search string? We make a new Solr query that combines four field queries: we search in the string and n-gram versions of both the title and the description. We combine the field queries with an OR operator and add the result to the global Solr query. 

We then sort on the score field - the score calculated by Solr that indicates the relevancy of each result. 

Lastly, we add extra parameters to set up the edismax boosting we want to use. We boost full string matches the most, and title more than description. 

This delivers us the requirements we wanted:
  • search in title and description
  • get results as often as possible
  • show exact matches first
  • get the most relevant results on top


Wrap up

To wrap things up we combine everything and execute the query:
var q = new SolrMultipleCriteriaQuery(query, SolrMultipleCriteriaQuery.Operator.AND);
logger.LogDebug($"[Portal] Information center search: {solrQuerySerializer.Serialize(q)}");
var results = await solrDocuments.QueryAsync(q, options);
Besides gathering the results, note that we can use the provided serializer to log our queries for debugging.
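For completeness: solrDocuments is an injected ISolrOperations<T> and solrQuerySerializer an injected ISolrQuerySerializer. A minimal sketch of the wiring - assuming the SolrNet dependency injection package, with an example document type and core URL:
services.AddSolrNet<SearchResultDocument>("https://solr.example.com:8983/solr/ourcustomindex");
Both interfaces can then be constructor-injected into the class that executes the queries.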

As a final remark I do need to add that a search like this needs fine-tuning: tuning the size of the n-grams and also the boost factors. Change the parameters (one at a time) and test until you get the results you want.

And that's it for this second and final part of this n-gram search series. As mentioned in the first post, this information is not new and most of it can be found in several docs and posts, but I thought it would be a good idea to bring it all together. Enjoy your search ;)

Wednesday, June 4, 2025

Search with Solr n-gram in Sitecore

A Search story with Solr N-Gram 

For a customer on Sitecore XM 10.2 we have a headless site running JSS with NextJS and a very specific search request. 
One section of their content is an unstructured bunch of help-related articles - like a frequently asked questions section. This content is heavily tagged and contains quite a few items (in a bucket). We already had an application showing this data with the option to use the tags to filter and get to the required content. But now we also had to add free-text search. 

There is nothing more frustrating than finding no results, especially when looking for help - so we want to return as many relevant results as possible, with of course the most relevant on top. 

Also note that we do not have a solution like Sitecore Search or Algolia at our disposal here. So we need to create something with basic Solr. 

As I gathered information from several resources and also found quite a bit of outdated information, this post seemed like a good idea. I will split it in two - a first part here on the Solr setup and a second post on the search code itself.

Solr N-Gram

To be able to (almost) always get results, we decided to use the N-Gram tokenizer.  An n-gram tokenizer splits text into overlapping sequences of characters of a specified length. This tokenizer is useful when you want to perform partial word matching because it generates substrings (character n-grams) of the original input text.
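To make this concrete, here is an illustrative sketch (not Solr's actual implementation) of what a character n-gram tokenizer produces:
using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<string> NGrams(string text, int minGram, int maxGram) =>
  Enumerable.Range(minGram, maxGram - minGram + 1)
    .SelectMany(size => Enumerable.Range(0, Math.Max(0, text.Length - size + 1))
      .Select(i => text.Substring(i, size)));

// NGrams("search", 3, 5) yields: sea, ear, arc, rch, sear, earc, arch, searc, earch
A query for "sear" can now match documents containing "search" or "research" because they share n-grams.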

Step 1 in the process is to create a field type in the Solr schema that will use this tokenizer. We will be using it on indexing and on querying, meaning the indexed value and the search string will be split into n-grams.

We could update the schema in Solr manually - but every time someone populated the index schema, our change would be gone. 

Customize index schema population 

An article in the Sitecore documentation helped us customize the index schema population - which is exactly what we need. We took the code from https://doc.sitecore.com/xp/en/developers/latest/platform-administration-and-architecture/add-custom-fields-to-a-solr-schema.html and changed the relevant methods as follows:
private IEnumerable<XElement> GetAddCustomFields()
{
  yield return CreateField("*_txts",
    "text_searchable",
    isDynamic: true,
    required: false,
    indexed: true,
    stored: true,
    multiValued: false,
    omitNorms: false,
    termOffsets: false,
    termPositions: false,
    termVectors: false);
}
So we are creating a new dynamic field with the suffix _txts, of the type "text_searchable", that will get indexed and stored.
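After populating the schema, the managed schema should then contain a dynamic field entry roughly like this:
<dynamicField name="*_txts" type="text_searchable" indexed="true" stored="true" multiValued="false"/>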

private IEnumerable<XElement> GetAddCustomFieldTypes()
{
  var fieldType = CreateFieldType("text_searchable", "solr.TextField",
    new Dictionary<string, string>
    {
      { "positionIncrementGap", "100" },
      { "multiValued", "false" },
    });
  var indexAnalyzer = new XElement("indexAnalyzer");
  indexAnalyzer.Add(new XElement("tokenizer", new XElement("class", "solr.NGramTokenizerFactory"), new XElement("minGramSize", "3"), new XElement("maxGramSize", "5")));
  indexAnalyzer.Add(new XElement("filters", new XElement("class", "solr.StopFilterFactory"), new XElement("ignoreCase", "true"), new XElement("words", "stopwords.txt")));
  indexAnalyzer.Add(new XElement("filters", new XElement("class", "solr.LowerCaseFilterFactory")));
  fieldType.Add(indexAnalyzer);
  
  var queryAnalyzer = new XElement("queryAnalyzer");
  queryAnalyzer.Add(new XElement("tokenizer", new XElement("class", "solr.NGramTokenizerFactory"), new XElement("minGramSize", "3"), new XElement("maxGramSize", "5")));
  queryAnalyzer.Add(new XElement("filters", new XElement("class", "solr.StopFilterFactory"), new XElement("ignoreCase", "true"), new XElement("words", "stopwords.txt")));
  queryAnalyzer.Add(new XElement("filters", new XElement("class", "solr.SynonymFilterFactory"), new XElement("synonyms", "synonyms.txt"), new XElement("ignoreCase", "true"), new XElement("expand", "true")));
  queryAnalyzer.Add(new XElement("filters", new XElement("class", "solr.LowerCaseFilterFactory")));
  fieldType.Add(queryAnalyzer);
  yield return fieldType;
}
Here we are adding the type for text_searchable as a text field that uses the NGramTokenizerFactory. We are also setting the min and max gram size, which determine the minimum and maximum number of characters used to create the fragments of your text (check the Solr docs for more details). 
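For reference, in the managed schema this should roughly translate to the following field type (the exact rendering may differ per Solr version):
<fieldType name="text_searchable" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="5"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="5"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>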

Don't forget to also add the factory class and the configuration patch - both are covered in the Sitecore article linked above - and that's it. 

We created a custom index for this purpose, to be able to have a custom configuration - with computed fields and such - specific to this index, which holds a limited number of items. If we now populate the schema for that index, our n-gram field type is added.

Sitecore index configuration

As mentioned earlier we have a custom index configured. This was done for two reasons:
  • setting the crawlers: plural, as we have two - one for each location holding items that should be included in the application
  • custom index configuration: we wanted our own index configuration to be completely free to customize it just for this index, without consequences for all the others. The default Solr configuration is referenced, so we don't need to copy all the basics:
    <ourcustomSolrIndexConfiguration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
In order to get what we need in the index, we configure:
  • AddIncludedTemplate: list the templates to be added in the index
  • AddComputedIndexField: all computed fields to be added in the index

Computed Fields

Next to a number of computed fields for the extra tagging and such, we also used computed fields to add the title and description fields two more times in the index. Why? Well, it's an easy way to copy a field (and apply some extra logic if needed). And we do need a copy. Well, copies actually. 

The first copy will be set as a text_searchable field as we just created, the second copy will be a string field. Again, why?

As you will see in the next part of this blog, where we talk about querying the data, we will use only data from the index and not go to Sitecore to fetch anything. This means everything we want to return has to be in the index, and that is why we are creating a string field copy of our text fields. It's all about tokenizers ☺. The text_searchable copy is there to have an n-gram version as well.

I am not going to dig deep into computed field code here - that has been documented enough already, and a simple copy of a field is really very basic. 
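Still, for reference, a hypothetical minimal version of the CopyField used below could look like this - assuming Sitecore passes the configuration node to the constructor so we can read the referenceField attribute (a common pattern for computed index fields, but verify against your Sitecore version):
using System.Xml;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Xml;

public class CopyField : IComputedIndexField
{
  private readonly string referenceField;

  // Sitecore hands us the config node, so we can read the referenceField attribute.
  public CopyField(XmlNode configNode)
  {
    referenceField = XmlUtil.GetAttribute("referenceField", configNode);
  }

  public string FieldName { get; set; }
  public string ReturnType { get; set; }

  public object ComputeFieldValue(IIndexable indexable)
  {
    // Simply return the value of the referenced field.
    return indexable.GetFieldByName(referenceField)?.Value;
  }
}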

Configuration

I will share the configuration parts to add the computed fields.
<fields hint="raw:AddComputedIndexField">
  <field fieldName="customtagname" type="Sitecore.XA.Foundation.Search.ComputedFields.ResolvedLinks, Sitecore.XA.Foundation.Search" returnType="stringCollection" referenceField="contenttype" contentField="title"/>
  ...
  <field fieldName="titlesearch" type="X.Index.CopyField, X" returnType="string" referenceField="title" />
  <field fieldName="descriptionsearch" type="X.Index.CopyField, X" returnType="string" referenceField="description" />
  <field fieldName="titlestring" type="X.Index.CopyField, X" returnType="string" referenceField="title" />
  <field fieldName="descriptionstring" type="X.Index.CopyField, X" returnType="string" referenceField="description" />
</fields>
This config will create all the computed index fields. Note that we are also using the ResolvedLinks from SXA to handle reference fields.
Adding the fields with the correct type to the field map:
<fieldMap ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration/fieldMap">
  <typeMatches hint="raw:AddTypeMatch">
    <typeMatch type="System.String" typeName="text_searchable" fieldNameFormat="{0}_txts" settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
  </typeMatches>
  <fieldNames hint="raw:AddFieldByFieldName">
    <field fieldName="titlesearch" returnType="text_searchable"/>
    <field fieldName="descriptionsearch" returnType="text_searchable"/>
  </fieldNames>
</fieldMap>  


Our index is ready now. The computed fields titlesearch and descriptionsearch end up in Solr as titlesearch_txts and descriptionsearch_txts - exactly the fields we will query in part 2 to get the required results.


Thursday, May 8, 2025

SUGCON 2025 - the part with XM Cloud Content

SUGCON 2025 - XM Cloud Content

Save the best for last...  that does not only count for this blog post series, but also for Sugcon 2025 itself. Before we continue, as this is part three I also have a part one and a part two 🙂


XM Cloud Content


Alistair Deneys had the honor of presenting XM Cloud Content to us. 

Let's start with "what is XM Cloud Content?". That seems a very simple question, but to be honest, since the conference I have seen a few different answers to it. After the presentation one would have said it is a new CMS. But apparently some people who had heard about something like this before seem to think it's not a new product but the base for a brand-new version of XM Cloud. 

The session description on the Sugcon website tells me this:
XM Cloud Content is going to be an evolution of Content Hub ONE, Sitecore's fully managed headless CMS. Come and have a peek at this new product which is currently under development at Sitecore.
So for me, at the moment, we are looking at a new CMS. Which would be great, as it has become very clear that XM Cloud as it is now is not suited for everyone, and Content Hub ONE is dead. A new product to fill that gap would be tremendous news, so let's just assume that is really what we saw here. 


XM Cloud Content should become the result of all the knowledge Sitecore captured over the past years about creating a CMS - the good things but certainly also the bad ones. Learn from your mistakes is a good credo, and we did see some of that already in the first basics of the newborn. 


Foundation


For SaaS products we shouldn't care what is behind the wall; if it works well, it's OK. And although the architecture diagram shown probably doesn't mean much, it did help make a point about something that has been bothering Sitecore users for a long time. I'm talking about publishing - which will still exist here, but with no more database transfers, making it a lot easier to finally get this fast. 

Probably a bit more interesting already is the domain model. 
  • Items will be defined by content types
  • Taxonomies will be used to classify items
  • Fragments can be used for re-usable content type parts, but as they can also be used in searches (and maybe security?) they become a vital part in composing a good content type structure


It all seems pretty simple and very similar to other equivalent products.


Queries

Queries are done with GraphQL, and it seems we will get many options. An interesting one is querying on fragments, as that might avoid having to list content types in your query. 


Note that the GraphQL schema is definitely not final yet (as is all the rest) and Alistair is looking for feedback on this part. 

There would also be a way to save GraphQL queries - a bit like stored procedures in a SQL database. For complex queries this could save quite a bit when sending the requests.

Demo 

The main part of the presentation was actually a demo - which is nice as this means something already exists and this is not just a theoretical exercise.


We did get a glimpse of a UI - which was completely in line with what all the Sitecore products should look like these days: clean, white, simple. But the demo was actually done entirely with the CLI. 



If you can prepare scripts and JSON files, this all goes a bit smoother of course. We saw Alistair creating the content types and taxonomy, then creating some actual content to test with, and finally querying that content in several ways. 

The demo went pretty well to be honest - one would wonder what he sacrificed to the demo gods 😈 


We were also introduced to the security aspects. Those looked pretty nice - and you might think this is pretty common, but there are some CMS systems out there where this is not so trivial. 

Anyway, it will be possible to restrict access via tokens based on several aspects, ranging from publish state and environment to types, fragments, and apparently even saved queries.


Conclusion 


I can only say I am really looking forward to this XM Cloud Content. It looks very promising. Hopefully Sitecore can really deliver this time and put a price tag on it that also suits smaller markets.

To be continued...  maybe on Symposium?


Friday, May 2, 2025

SUGCON 2025 - part two with XM Cloud

SUGCON 2025 - the story continues with XM Cloud

Make sure to read part 1 of the Sugcon 2025 saga...

Vercel

Let's start this second part with something maybe not completely Sitecore-related but very relevant to most current projects: I was glad to see Vercel present at the conference. Not only to have a nice chat with them at the booth, trying to get answers about their version and support strategy (which, to be honest, is still not clear to me), but also because they gave a session about optimizing Next.js - which was pretty interesting, even for someone like me who is not (yet) into that Next stuff at all. 


Alex Hawley presented, in a clear and very comprehensible way, a few pitfalls and how to solve them. Very interesting for headless implementations of Sitecore (or even other platforms).


JSS - XM Cloud

This brings us to the next topic - proudly presented by Christian Hahn and Liz Nelson. The JSS SDK and starter kits have had a major cleanup. Note that we are talking about the XM Cloud version here. By decoupling this from the XP version, a lot became possible.



In general: they removed a lot of code, making the packages a lot smaller and faster to load. All developers will know that this is a very pleasant step to take. It brings a fresh start, and there seem to be indeed more things on the roadmap to keep on improving. 

 


It's nice to see some of the recommendations from the Vercel session coming back here in the new and improved JSS - or should we say Content SDK now... 




As we are talking about XM Cloud, we cannot not mention Andy Cohen. His session was not really what I expected, but it was an eye-opener - as was his first implementation experience apparently 😐



XM Cloud - Marketplace

We are staying in the XM Cloud sphere with the session by Krassi Eneva and Justin Vogt about the marketplace Sitecore is creating for modules to extend XM Cloud. 

Well, actually they are talking about a hub to extend and customize Sitecore products. So it's not limited to XM Cloud, but the examples went in that direction, and that does make sense. As you probably know, Sitecore heavily discourages custom code on your XM Cloud - something that, in the eyes of some, made the early product not really "SaaS". Even though I am a developer and I always liked extending everything, I do believe not putting custom code next to a product that should be SaaS is a good idea. We already had interaction points with several event webhooks. With this marketplace, we can also build real extensions into the editing UI.


 
There are a few levels in the marketplace - each will have a (slightly) different path to register your module. A single-tenant module can be used to extend the product for a use case of a single customer; this can cover a very specific business need. The next step is the multi-tenant module, which is probably targeted at partners who want to build their own extensions and use them for multiple customers.
The public modules are available to everyone. They can be either free or paid but they must be approved by Sitecore to make sure the quality is good and the look and feel is similar to the original product. 

In order to achieve all this, there is an SDK.

Next

There is one more story to tell... but that will be for part three in this year's Sugcon series.


SUGCON 2025 - part one and AI

SUGCON 2025 - Antwerp, Belgium

Yeah, Sugcon in Belgium - which meant I could attend this edition. As it is always a pleasure to meet up with people from the Sitecore community, I was really looking forward to it, and it did not disappoint.

With the venue approximately 60 km from my doorstep, there was no hotel and flight for me. Instead I got a pleasant drive... up until you get a glimpse of the Antwerp skyline, which also means: traffic jams 🙂


But we arrived safe and sound in a very sunny Antwerp, at a tremendous venue with a view of the zoo. It should be no surprise that there was another traffic jam when entering the main room for the keynote, as everyone was taking pictures.

So the venue is fine, the weather is good, and the crowd very nice. Now we just need some good content as well. Ready for take-off... 


Keynotes

I must admit that the keynotes were not really mind-blowing or unforgettable. That is to be expected at an event like this - big announcements are for Symposium. Sugcon got simple, to-the-point keynotes with the usual AI buzzwords and mentions of the community, as one would expect at a community event. 

It was nice to see all the familiar faces behind Dave O'Flanagan - with Jason in plain sight, but I'm also in there somewhere 🙂. 


AI

Of course AI was present in many presentations - whether it was used to generate the images in the slides, as Jason Wilkerson did, or as the real main focus, as in the CCC demo from Morten Ljungberg. 
 
Morten showed us a demo of the possibilities with Sitecore Stream - the free version and the paid version with brand awareness. It looked pretty good (although his CCC/K joke was unaware of Belgian history). But the demo was nice and gave us a pretty good idea of how these things work.

Let's continue a bit more on the AI path as Vignesh Vishwanath woke us up on Friday with his talk on Stream in XP. 


He made it very clear that there are opportunities here. It also became clear that Stream is not only for the SaaS customers, but also available to those who didn't take that step (yet). There is a free tier that includes the basics - probably the biggest difference is the brand awareness; if you want that, you will need the subscription version. 

At the moment in the CMS you can mostly generate content, but there is a roadmap to increase the possibilities and add more functionality within the product. 


One of the things that is coming is help with translating - but with a translation company as a sponsor, he also made it clear that these are AI translations and that those are not yet perfect 🙂



There were (lots) more interesting sessions of course. And also very interesting talks during the breaks. More of that in part two... 




Thursday, October 10, 2024

JWT validation in Azure Web App

 JWT validation in Azure Web App

Why

First of all some background of why we are doing this. 
In a Sitecore JSS headless project we have an API hosted in a Web App. So no Azure Function (we have those as well); for a number of reasons we chose a Web API for this part. Some of these APIs return sensitive data and need to be secured. And not just secured in a standard way (subscription keys, ...) but also on user level. For some requests you need to be an admin, others can only be requested for yourself, and so on.

Also important to mention is that this is a B2B project where the customer decided that login is done with an Entra app registration, in order to provide an SSO experience without them having to deal with granting or revoking access - except of course in the custom application itself. As a result, we have no authorization claims in the tokens, as at that moment the user is only authenticated but not yet authorized.

As we cannot fetch the role information about a user from the JWT, our Azure API Management (APIM) can only verify that the token is valid - not its content. 

And so we want to validate the content in the Web App itself. I found a lot of information on the subject, but nothing was quite what I needed. After a lot of trial and error and putting various docs together, I finally have something that works. 

As Sitecore is becoming more and more headless and distributed, and custom APIs are probably a part of that, it seems like a good idea to share my findings.

Goal

To summarize, our goal is to verify that the request came from a specific email address - either the same email as in the request data or the email of an admin user. Verification is based on a JWT created by MS Entra.


Code

Startup

Let's dive straight into it and start with a function that we add in our Program class:
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.AspNetCore.Authorization;
using Microsoft.IdentityModel.Tokens;

void AddPolicies(WebApplicationBuilder builder)
{
  var tenant = builder.Configuration.GetValue<string>("TenantId");
  var client = builder.Configuration.GetValue<string>("ClientId");
  builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(opt =>
    {
      opt.Authority = $"https://login.microsoftonline.com/{tenant}/v2.0"; // metadata: {Authority}/.well-known/openid-configuration
      opt.TokenValidationParameters = new TokenValidationParameters
      {
        ValidateIssuerSigningKey = true,
        ValidIssuer = $"https://login.microsoftonline.com/{tenant}/v2.0",
        ValidateIssuer = true,
        ValidAudience = client,
        ValidateAudience = true,
        ValidateLifetime = true,
      };
    });

  builder.Services.AddAuthorization(options =>
  {
    options.AddPolicy("EmailPolicy", policy =>
    {
      policy.AuthenticationSchemes.Add(JwtBearerDefaults.AuthenticationScheme);
      policy.Requirements.Add(new EmailRequirement());
    });
    options.AddPolicy("AdminPolicy", policy =>
    {
      policy.AuthenticationSchemes.Add(JwtBearerDefaults.AuthenticationScheme);
      policy.Requirements.Add(new AdminRequirement());
    });
  });
}
We need a tenant and a client id. With those values we can define authentication based on the JwtBearer scheme and set the issuing authority, along with the parameters that define what we want to validate.

Once we have the authentication in place, we can add the Authorization as that is what we are actually looking for.

For the authorization we can add a Policy. Or in our case two policies as we want to be able to verify the email and the admin status. We will add the code for those policies later.

In the main function we add a call to AddPolicies and also indicate that we want to use authentication and authorization.
AddPolicies(builder);

app.UseAuthentication();
app.UseAuthorization();

Policies

Creating the requirements is no more than creating an empty marker class - as an example, the EmailRequirement:
using Microsoft.AspNetCore.Authorization;

public class EmailRequirement : IAuthorizationRequirement
{
}

Authorization Handler

The next step - and this is where the magic happens - is the authorization handler. You have a few options here, as described in the MS docs, and we settled on a single handler that handles all requirements.
using Microsoft.AspNetCore.Authorization;

public class PermissionHandler : IAuthorizationHandler
{
  ...

  public Task HandleAsync(AuthorizationHandlerContext context)
  {
    if (context.Resource is HttpContext httpContext)
    {
      var jwtMail = securityService.GetJwtMail(httpContext.Request);
      if (string.IsNullOrEmpty(jwtMail))
      {
        context.Fail();
        return Task.CompletedTask;
      }

      var pendingRequirements = context.PendingRequirements.ToList();
      foreach (var requirement in pendingRequirements)
      {
        if (requirement is EmailRequirement)
        {
          var email = GetEmail(httpContext.Request);
          if (!string.IsNullOrEmpty(email) && ...)
          {
            context.Succeed(requirement);
          }
          else
          {
            context.Fail();
          }
        }
        else if (requirement is AdminRequirement)
        {
          ...
        }
      }
    }

    return Task.CompletedTask;
  }

  private static string? GetEmail(HttpRequest request)
  {
    ...
  }
}

The PermissionHandler is a straightforward class that checks the requirements, sets the context to success or failure, and returns the completed task. In our case we first get the email address from the JWT (details follow), and if we have an email, we loop over the requirements. 

Those checks are pure business logic. As an example, for the EmailRequirement we fetch the email from the request (how to do this depends on how that information is sent to your APIs). If the email is valid (equal to the one in the JWT), we mark the requirement as succeeded - otherwise the context fails.
We repeat this for every requirement needed.

Getting the email from the JWT

using System.IdentityModel.Tokens.Jwt;

public class SecurityService : ISecurityService
{
  ...

  internal static string? GetJwt(HttpRequest request)
  {
    try
    {
      var jwt = request.Headers.Authorization;
      if (jwt.Count == 1)
      {
        var jwtSplit = jwt.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return jwtSplit.Last();
      }
    }
    catch
    {
      return null;
    }

    return null;
  }

  public string? GetJwtMail(HttpRequest request)
  {
    try
    {
      var jwt = GetJwt(request);
      var handler = new JwtSecurityTokenHandler();
      var token = handler.ReadJwtToken(jwt);
      var claim = token.Claims.FirstOrDefault(c => c.Type.Equals("email", StringComparison.OrdinalIgnoreCase));
      if (claim != null)
      {
        return claim.Value;
      }
    }
    catch
    {
      return null;
    }

    return null;
  }
}

The JWT is in the Authorization header. We split that string, as we don't want the "Bearer" part that precedes the token. We then parse the token and get the email from the claims.

Don't forget to register the PermissionHandler in your main class:
builder.Services.AddScoped<IAuthorizationHandler, PermissionHandler>();

That wraps up all the code we need to write. Now we can start using the Authorize attribute on our APIs. On every action in your controller you can add the attribute with the name of the policy that is needed:
[Authorize(Policy = "EmailPolicy")]
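As an illustration, a protected controller action could look like this (the route, parameter and service are hypothetical):
[Authorize(Policy = "EmailPolicy")]
[HttpGet("orders")]
public async Task<IActionResult> GetOrders([FromQuery] string email)
{
  // The PermissionHandler has already compared this email to the JWT email claim.
  var orders = await orderService.GetOrdersAsync(email);
  return Ok(orders);
}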

And that's all folks. 

I had to piece together quite a few blog posts and documentation pages to get this working, and I learned that any change in the architecture or setup changes the way you should do this. So I hope this explanation will be useful to someone - and if it's not completely what you need, it might at least get you in the right direction.

Friday, October 4, 2024

Introducing headless in a Sitecore MVC environment

 An old Sitecore site

A while ago we inherited a customer that had been with Sitecore for many years. They had two sites - a main corporate one and a smaller dedicated one - running on an old Sitecore version. Maintenance was horrible, deploying a nightmare, and therefore costs were going through the roof. 

Sounds familiar? As a Sitecore developer I bet it does, as this will definitely not be the only solution like that out there.

After a cleanup and some upgrading we at least had a stable environment again, and together with the customer we decided to redo the smaller site - a bit as a test to see the real, current capabilities of the platform. A headless architecture was designed for this site, so the result would be one (old) MVC site and one headless site in the same Sitecore instance. Sounds like a great way to introduce customers to headless architectures and what it all means for them, without the need for an immediate big bang. 

I know there are already options to "upgrade" or transfer your site to a headless setup, but even that approach can cost quite a bit of money, and my idea has been that if you make this kind of investment and architectural change, you might as well re-evaluate what your site is all about. 

First steps to headless

We introduced the customer to headless. While talking to them about functionality and design - as this is indeed a moment to really think about those - we also informed them about the changes to hosting and deployment and what this means for them. The fact that we can deploy parts of the solution completely independently is a big bonus, and maintenance in general is much more flexible now. 

This all sounds great, but how to start practically - and will this all work?

Practical steps

Our first step was to install SXA on our test environment to see if the other (MVC) site still worked with it installed. In the early days of SXA that used to be a problem (or at least was very likely to be), but now we didn't detect any issues. So we could move forward and installed JSS. This does not interfere with the other sites, as they are not using the headless services. The main challenge was getting the correct version of every component needed (OK, that could be our own fault...).

We use NextJS for the front-end application, and I remember we had to redo the setup of this part a few times because some things didn't work - in the end it seemed that somewhere we had taken a wrong version of something... so be cautious here. But once the starter template was there, the team was ready to start.

How it is going

Very well, thank you. The headless site is live, and the customer is starting to see the benefits of the new abilities. We can deliver faster, deploy more easily, and the editors are still working in the Sitecore environment they are used to. Although I'm not sure that last part is a benefit, as those editing tools might be a bit outdated compared to what is on the market today. 

Conclusion

We took a first step with this customer without too much hassle. We do hope this is a first step towards more, of course - the end goal being a total move towards a completely headless, cloud-based solution. But when that (big) step is too much to take at once, there are solutions. Plural - as there are a few ways to tackle it, this just being one of them. But with an open mind and clear and honest communication with a client, anything can be done.