Thursday, October 10, 2024

JWT validation in Azure Web App


Why

First, some background on why we are doing this.
In a Sitecore JSS headless project we have an API that is hosted in an Azure Web App. So not an Azure Function (we have those as well): for a number of reasons we chose a Web API for this part. Some of these APIs return sensitive data and need to be secured. And not just secured in a standard way (subscription keys, ...) but also at user level. For some requests you need to be an admin, others can only be made for yourself, and so on.

Also important to mention is that this is a B2B project where the customer decided that login is handled by an Entra app registration, in order to provide an SSO experience without them having to deal with granting or revoking access, except of course in the custom application itself. As a result, the tokens contain no role claims: at that moment the user is only authenticated, not yet authorized.

As we cannot get the role information for a user from the JWT, in our Azure API Management (APIM) we can only verify that the token is valid, not whether the user is allowed to access the requested data.

And so, we want to validate this in the Web App itself. I found a lot of information on the subject, but nothing was exactly what I needed. After a lot of trial and error and piecing various docs together, I finally have something that seems to work.

As Sitecore is becoming more and more headless and distributed, and custom APIs are probably part of that, it seemed like a good idea to share what I found.

Goal

To summarize, our goal is to verify that a request came from a specific user: either the user with the same email as in the request data, or an admin user. Verification is based on a JWT issued by Microsoft Entra ID.


Code

Startup

Let's dive straight into it and start with a function that we add in our Program class:
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.AspNetCore.Authorization;
using Microsoft.IdentityModel.Tokens;

void AddPolicies(WebApplicationBuilder builder)
{
  var tenant = builder.Configuration.GetValue<string>("TenantId");
  var client = builder.Configuration.GetValue<string>("ClientId");
  builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(opt =>
    {
      // The JwtBearer handler reads the signing keys from {Authority}/.well-known/openid-configuration
      opt.Authority = $"https://login.microsoftonline.com/{tenant}/v2.0";
      opt.TokenValidationParameters = new TokenValidationParameters
      {
        ValidateIssuerSigningKey = true,
        ValidIssuer = $"https://login.microsoftonline.com/{tenant}/v2.0",
        ValidateIssuer = true,
        ValidAudience = client,
        ValidateAudience = true,
        ValidateLifetime = true,
      };
    });

  builder.Services.AddAuthorization(options =>
  {
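    options.AddPolicy("EmailPolicy", policy =>
    {
      policy.AuthenticationSchemes.Add(JwtBearerDefaults.AuthenticationScheme);
      // Reject requests whose bearer token failed validation. Without this, a policy that
      // only has a custom requirement can succeed for an unauthenticated user, because
      // PermissionHandler reads the raw token from the header.
      policy.RequireAuthenticatedUser();
      policy.Requirements.Add(new EmailRequirement());
    });
    options.AddPolicy("AdminPolicy", policy =>
    {
      policy.AuthenticationSchemes.Add(JwtBearerDefaults.AuthenticationScheme);
      policy.RequireAuthenticatedUser();
      policy.Requirements.Add(new AdminRequirement());
    });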
  });
}
We need a tenant ID and a client ID, read from configuration. With those values we can set up authentication based on the JwtBearer scheme, with the issuing authority and the parameters that define what we want to validate.
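To make those validation parameters a bit more concrete, here is a TypeScript sketch of roughly what ValidateIssuer, ValidateAudience and ValidateLifetime check on the decoded token. It is an illustration under assumptions, not the .NET implementation: `validateClaims` and `TokenClaims` are hypothetical names, the 300-second clock skew mirrors the TokenValidationParameters default, and signature validation (ValidateIssuerSigningKey) is left out entirely because it needs the keys published by the authority.

```typescript
// Subset of the claims in an Entra ID v2.0 token that the checks below use.
interface TokenClaims {
  iss?: string;            // issuer
  aud?: string | string[]; // audience: the client id of the app registration
  exp?: number;            // expiry, seconds since the Unix epoch
  nbf?: number;            // not before, seconds since the Unix epoch
}

// Returns a list of problems; an empty list means the claims pass these checks.
// The signature is NOT checked here.
function validateClaims(
  claims: TokenClaims,
  tenantId: string,
  clientId: string,
  nowSeconds: number,
  clockSkewSeconds: number = 300, // TokenValidationParameters.DefaultClockSkew is 5 minutes
): string[] {
  const errors: string[] = [];
  // Same issuer format as ValidIssuer in the C# configuration.
  const expectedIssuer = `https://login.microsoftonline.com/${tenantId}/v2.0`;
  if (claims.iss !== expectedIssuer) {
    errors.push("invalid issuer");
  }
  // The aud claim may be a single string or an array of strings.
  const audiences = Array.isArray(claims.aud) ? claims.aud : claims.aud !== undefined ? [claims.aud] : [];
  if (audiences.indexOf(clientId) < 0) {
    errors.push("invalid audience");
  }
  if (claims.exp === undefined || claims.exp + clockSkewSeconds < nowSeconds) {
    errors.push("token expired or without expiry");
  }
  if (claims.nbf !== undefined && claims.nbf - clockSkewSeconds > nowSeconds) {
    errors.push("token not yet valid");
  }
  return errors;
}
```

In the real setup the JwtBearer handler does all of this, plus the signature check, for you; the sketch only shows which claims those parameters look at.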

Once we have the authentication in place, we can add the Authorization as that is what we are actually looking for.

For the authorization we can add a policy, or in our case two policies, as we want to be able to verify both the email and the admin status. We will add the code for those policies later.

In the main program we call AddPolicies (before the app is built) and then enable authentication and authorization, in that order:
AddPolicies(builder);

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();

Policies

The requirements behind the policies are no more than empty classes implementing IAuthorizationRequirement, for example the EmailRequirement:
using Microsoft.AspNetCore.Authorization;

public class EmailRequirement : IAuthorizationRequirement
{
}

Authorization Handler

The next step - and this is where the magic happens - is the authorization handler. You have a few options here, as mentioned in the Microsoft docs; we decided on one handler that handles all requirements.
using Microsoft.AspNetCore.Authorization;

public class PermissionHandler : IAuthorizationHandler
{
  ...

  public Task HandleAsync(AuthorizationHandlerContext context)
  {
    if (context.Resource is HttpContext httpContext)
    {
      var jwtMail = securityService.GetJwtMail(httpContext.Request);
      if (string.IsNullOrEmpty(jwtMail))
      {
        context.Fail();
        return Task.CompletedTask;
      }

      var pendingRequirements = context.PendingRequirements.ToList();
      foreach (var requirement in pendingRequirements)
      {
        if (requirement is EmailRequirement)
        {
          var email = GetEmail(httpContext.Request);
          if (!string.IsNullOrEmpty(email) && ...)
          {
            context.Succeed(requirement);
          }
          else
          {
            context.Fail();
          }
        }
        else if (requirement is AdminRequirement)
        {
          ...
        }
      }
    }

    return Task.CompletedTask;
  }

  private static string? GetEmail(HttpRequest request)
  {
    ...
  }
}

The PermissionHandler is a straightforward class: it checks the requirements, marks each one as succeeded or failed on the context, and returns a completed task. In our case we first get the email address from the JWT (details follow) and, if there is one, we loop over the pending requirements.

Those checks are pure business logic. For example, for the EmailRequirement we fetch the email from the request (how to do this depends on how that information is sent to your APIs). If the email is valid (equal to the one in the JWT), we mark the requirement as succeeded; otherwise the context fails.
We repeat this for every requirement needed.
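The decision logic of the handler can be summarized in a small, framework-free TypeScript sketch. It assumes, following the goal stated at the top, that the EmailRequirement also passes for admins; the `authorize` function, the `isAdmin` lookup and the requirement names as strings are hypothetical stand-ins for the real business logic, which is elided in the C# code above.

```typescript
type Requirement = "EmailRequirement" | "AdminRequirement";

// Hypothetical: in the real project admin status comes from the custom application's data.
type AdminLookup = (email: string) => boolean;

function authorize(
  jwtEmail: string | undefined,
  requestEmail: string | undefined,
  requirements: Requirement[],
  isAdmin: AdminLookup,
): boolean {
  // No email claim in the token: fail, like the early context.Fail() in the handler.
  if (!jwtEmail) {
    return false;
  }
  const caller = jwtEmail.toLowerCase();
  // Every requirement must succeed for the policy to succeed.
  return requirements.every((requirement) => {
    if (requirement === "AdminRequirement") {
      return isAdmin(caller);
    }
    // EmailRequirement: the request is about the caller, or the caller is an admin.
    return !!requestEmail && (requestEmail.toLowerCase() === caller || isAdmin(caller));
  });
}
```

Using every() reflects how ASP.NET Core combines requirements: a policy only succeeds when all of its requirements succeed and no handler has called Fail().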

Getting the email from the JWT

using System.IdentityModel.Tokens.Jwt;

public class SecurityService : ISecurityService
{
  ...

  internal static string? GetJwt(HttpRequest request)
  {
    try
    {
      // Expected header format: "Bearer <token>"
      var jwt = request.Headers.Authorization;
      if (jwt.Count == 1)
      {
        var jwtSplit = jwt.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return jwtSplit.Last();
      }
    }
    catch
    {
      return null;
    }

    return null;
  }

  public string? GetJwtMail(HttpRequest request)
  {
    try
    {
      var jwt = GetJwt(request);
      var handler = new JwtSecurityTokenHandler();
      var token = handler.ReadJwtToken(jwt);
      var claim = token.Claims.FirstOrDefault(c => c.Type.Equals("email", StringComparison.OrdinalIgnoreCase));
      if (claim != null)
      {
        return claim.Value;
      }
    }
    catch
    {
      return null;
    }

    return null;
  }
}

The JWT is in the Authorization header. We split that string because we don't want the "Bearer" part that precedes the token. We then parse the token and get the email from its claims. Note that ReadJwtToken only parses the token; it does not validate it.
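For readers who want to see what GetJwt and GetJwtMail do step by step, here is a dependency-free TypeScript sketch of the same two steps: strip the "Bearer" prefix, then base64url-decode the middle (payload) segment of the token and read the email claim. The function names mirror the C# methods but are hypothetical. Like ReadJwtToken, it does not verify the signature, so it should only be used on tokens that the authentication middleware has already validated.

```typescript
// Mirrors GetJwt: keep the last space-separated part of "Bearer <token>".
function getJwt(authorizationHeader: string | undefined): string | undefined {
  if (!authorizationHeader) {
    return undefined;
  }
  const parts = authorizationHeader.split(" ").filter((part) => part.length > 0);
  return parts.length > 0 ? parts[parts.length - 1] : undefined;
}

// Decodes a base64url segment without Buffer or atob, so the sketch runs anywhere.
// Each byte is percent-encoded and decodeURIComponent turns the bytes into UTF-8 text.
function base64UrlDecode(segment: string): string {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/").replace(/=+$/, "");
  let buffer = 0;
  let bitCount = 0;
  let percentEncoded = "";
  for (const char of base64) {
    const index = alphabet.indexOf(char);
    if (index < 0) {
      throw new Error("invalid base64url character");
    }
    buffer = ((buffer << 6) | index) & 0xffff; // only the unread low bits matter
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      const byte = (buffer >> bitCount) & 0xff;
      percentEncoded += "%" + ("0" + byte.toString(16)).slice(-2);
    }
  }
  return decodeURIComponent(percentEncoded);
}

// Mirrors GetJwtMail: read the "email" claim (case-insensitive) WITHOUT validating
// the token; any parsing problem results in undefined.
function getJwtEmail(authorizationHeader: string | undefined): string | undefined {
  try {
    const jwt = getJwt(authorizationHeader);
    const segments = jwt ? jwt.split(".") : [];
    if (segments.length !== 3) {
      return undefined;
    }
    const payload = JSON.parse(base64UrlDecode(segments[1])) as Record<string, unknown>;
    const key = Object.keys(payload).find((name) => name.toLowerCase() === "email");
    const value = key !== undefined ? payload[key] : undefined;
    return typeof value === "string" ? value : undefined;
  } catch {
    return undefined;
  }
}
```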

Don't forget to register the PermissionHandler in your Program class:
builder.Services.AddScoped<IAuthorizationHandler, PermissionHandler>();

That wraps up all the code that we need to write. Now we can start using the Authorize attribute on our APIs. On every action in your controllers you can add the attribute with the name of the policy that is needed:
[Authorize(Policy = "EmailPolicy")]

And that's all folks. 

I had to piece together quite a few blog posts and documentation pages to get this working, and I learned that any change in the architecture or setup changes the way you should do this. So I hope this explanation will be useful to someone, and if it is not completely what you need, it might point you in the right direction.

Friday, October 4, 2024

Introducing headless in a Sitecore MVC environment

 An old Sitecore site

A while ago we inherited a customer that had been with Sitecore for many years. They had 2 sites - a main corporate one and a smaller dedicated one - running on an old Sitecore version. Maintenance was horrible, deploying was a nightmare and therefore costs were going through the roof.

Sounds familiar? As a Sitecore developer I bet it does, as this will definitely not be the only solution like that out there.

After a cleanup and some upgrading we at least had a stable environment again, and together with the customer we decided to redo the smaller site, partly as a test to see what the real and current capabilities of the platform are. A headless architecture was designed for this site, so the result would be one (old) MVC site and one headless site in the same Sitecore instance. That sounds like a great way to introduce customers to headless architectures and what they mean for them, without the need for an immediate big bang.

I know there are already options to "upgrade" or transfer your site to headless setups, but even that approach can cost quite a bit of money and my idea has been that if you do this kind of investment and architectural change you might as well re-evaluate what your site is all about. 

First steps to headless

We introduced the customer to headless. While talking to them about the functionality and design - as this is indeed a moment to really think about those - we also informed them about the changes to hosting and deployment and what this means for them. The fact that we can deploy parts of the solution completely independently is a big bonus, and maintenance in general is much more flexible now.

This all sounds great, but how to start practically - and will this all work?

Practical steps

Our first step was to install SXA on our test environment to see whether the other (MVC) site still worked with it installed. In the early days of SXA that used to be a problem (or at least was very likely to be), but this time we didn't detect any issues. So we could move forward and install JSS. This does not interfere with the other sites, as they are not using the headless services. The main challenge here was getting the correct version of every component needed (ok, that could be our own fault...).

We use NextJS for the front-end application, and I remember we had to redo the setup of this part a few times because some things didn't work. In the end it seemed that somewhere along the way we had picked a wrong version of something, so be cautious here. But once the starter template was in place, the team was ready to start.

How it is going

Very well, thank you. The headless site is live, and the customer is starting to see the benefits of the new abilities. We can deliver faster, deploy more easily, and the editors are still working in the Sitecore environment they are used to. Although I am not sure that last part is a benefit, as those editing tools might be a bit outdated compared to what is on the market today.

Conclusion

We took a first step with this customer without too much hassle. Of course we hope this is the first of many, with the end goal being a complete move to a fully headless, cloud-based solution. But when that (big) step is too much to take at once, there are solutions. Plural, as there are a few ways to tackle it, this being just one of them. But with an open mind and clear and honest communication with a client, anything can be done.
 

Friday, September 13, 2024

Sitecore JSS hiding data in props

 Sitecore JSS props

A small extension to my post on Sitecore JSS and extranet security which I wrote in July. If you haven't read that one, please do so first as I will try not to repeat too much here.

What is important to mention is that we send the roles that are allowed to see a page along with the page data. These roles are selected in a multiselect field. This way the front-end application is in charge of the security, as it should be.

We noticed a problem, however, when we looked at the source code of the generated pages: they were listing our role IDs and names. This is of course information that we would rather not show to everyone.

Hiding page data

So our problem is data from the page that we would like to use, but not show to users.  In our case it was security related, but that could be any data of course.

We could try to change the output and adapt the way a MultilistField is serialized. I would assume the GetMultilistFieldSerializer or MultilistFieldTypeFactory might be a good start, but I don't want to change that behavior just for one field. It seems like a messy solution, especially if more data like this comes along. As it is the front-end app that puts this data in its output, it is that app's responsibility to keep it hidden.

By default, a MultilistField outputs the id, the url, the name, the display name and the fields of the selected items. But even if we only had the raw value (the minimum we need for our functionality), we should hide that from public eyes.

I want to mention again that I am nowhere near a NextJs expert, so to me this is gibberish, but my front-end companion came up with a function that is now used to filter out the "roles" field:
import { SitecorePageProps } from 'lib/page-props';

export const whiteListProps = async (props: SitecorePageProps) => {
  delete props.layoutData.sitecore?.route?.fields?.roles;
  return props;
};
In [[...path]].tsx, this function is applied to the page props.
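Why does this matter at all? With the Next.js pages router, the props returned from getStaticProps or getServerSideProps are serialized as JSON into a `__NEXT_DATA__` script tag in the generated HTML, so anything left in the props ends up in the page source. A possible variation on the function above is to keep only the fields you know you need (an allow list) instead of removing the ones you know are sensitive. The sketch assumes a simplified `PageProps` type and a hypothetical `allowListRouteFields` function; neither is part of JSS.

```typescript
// Minimal stand-in for the part of SitecorePageProps used here (assumption,
// not the JSS type itself).
type RouteFields = Record<string, unknown>;
interface PageProps {
  layoutData: { sitecore?: { route?: { fields?: RouteFields } | null } };
}

// Keeps only the route fields listed in `allowed` and drops everything else
// (the roles field included). Works in place, like whiteListProps does.
function allowListRouteFields(props: PageProps, allowed: readonly string[]): PageProps {
  const route = props.layoutData.sitecore?.route;
  const fields = route?.fields;
  if (!route || !fields) {
    return props;
  }
  const kept: RouteFields = {};
  for (const name of Object.keys(fields)) {
    if (allowed.indexOf(name) >= 0) {
      kept[name] = fields[name];
    }
  }
  route.fields = kept;
  return props;
}
```

The trade-off: a newly added sensitive field stays hidden by default, but every route field a component reads has to be added to the list.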


As a non-expert on the matter, I was surprised to see that all data coming from Sitecore was visible in a browser. Assuming there are more non-experts out there, I hope this post makes some sense and might keep some sensitive data hidden.

Thursday, September 12, 2024

Upgrade Azure App Service Deployment task to v4

Task 'Azure App Service deploy' version 3 (AzureRmWebAppDeployment@3) is deprecated

Azure release pipeline

Yes, this is not a post about the Sitecore platform as such, but as it is related to one of my Sitecore projects it might be relevant to (many) more. For a Sitecore website deployed in Azure PaaS we use Azure DevOps to deploy, and we recently noticed in the release pipeline that some stages gave warnings, although they still succeeded.



The warnings said the Azure App Service deploy task is deprecated and we should use the newer version:

Task 'Azure App Service deploy' version 3 (AzureRmWebAppDeployment@3) is deprecated
The AzureRmWebAppDeployment@3 task is deprecated, please use a newer version of the AzureRmWebAppDeploy...


So I checked the release pipeline and decided to "upgrade" the related tasks called AzureRmWebAppDeployment from v3 to v4. It looked pretty simple - just a few minor changes in the configuration and we were set to go. 

As a test I created a new release. The deployment steps seemed to work, but our Unicorn sync failed with a disturbing message: the path /unicorn.aspx was not found. When we checked the files on the server with the App Service Editor we noticed the file wasn't there. Actually, almost all files were gone.

Not knowing what had happened, I redeployed the stage to see if it was a temporary issue. The second attempt, however, failed rather quickly: An error was encountered when processing operation 'Create File' on 'C:\home\site\wwwroot\ApplicationInsights.config'. Was something really wrong with this v4? I reverted the task to v3 and tried again, but the same error appeared. So it was not the deploy task but probably the Azure instance itself.

The next attempt was to create a config file manually with the App Service Editor. Strangely, that failed as well (error 409). Clearly the Web App was not behaving as expected, but it didn't show any errors, nor any issues in the resource health. It was when I checked the Advanced Tools that I finally started to get an idea of the issue.

When opening Kudu I got this warning:

So I started looking into what this WEBSITE_RUN_FROM_PACKAGE means. I noticed that this environment variable had indeed been created. When I deleted it, I could create files again. At that point it was time to read the documentation 🙂

I found the readme files for both version 3 and version 4 of the Azure App Service Deployment task and especially that last one was interesting as it did explain about this RunFromPackage.

RunFromPackage

Creates the same deployment package as Zip Deploy. However, instead of deploying files to the wwwroot folder, the entire package is mounted by the Functions runtime. With this option, files in the wwwroot folder become read-only.

Apparently, by default (when 'Select deployment method' is not checked) the task tries to select the appropriate deployment technology given the input package, app service type and agent OS. Maybe it was because we deploy zip files (especially for serialization files this is much faster), but the task clearly decided to use RunFromPackage. This created the environment variable that made the wwwroot of our Web App read-only. Problem found!

To fix it, we had to set the deployment method specifically for all the deploy steps in all stages.  



 Once we removed the WEBSITE_RUN_FROM_PACKAGE environment variable again and deployed with the deployment method set everywhere all was working fine and we have no more warnings.



Friday, July 5, 2024

Sitecore JSS extranet security

 Extranet with Sitecore JSS and NextJS


Creating an extranet in a headless setup isn't always that straightforward. It is no longer up to your content management system (in this case Sitecore) to deliver pages, and as such also not to check who is actually requesting which content. The Sitecore security settings as we used them in the XP days are not sufficient here.

In this case we also had a few particular requirements and architectural choices:
  • users are not stored through Sitecore, but in a custom database
  • roles are maintained in Sitecore (as content items)
  • pages can be visible for people from one or more roles, or even for people without a role (not yet registered guests)
  • components can have security settings that define which role(s) can view the component
  • API calls are also using the roles as security guidelines to determine who can get which data

Let's focus on the parts that get us in touch with Sitecore. So we need a solution that can set security on pages and components for roles defined and known in Sitecore. 

Pages

We created a base template with 2 fields:
  • a multiselect field to select one or multiple roles
  • a checkbox field to set the page open for everyone, including guests without a role
As we never want to close a page for everyone (that would be silly: you could simply unpublish the page instead), we also count no selected roles as all roles.

This base template is added to all page templates. As this data is part of the page data, it is included when a page is requested by the NextJs app and can be used by that app to handle the security appropriately - send a 403 page if the page is not available for the current user.

I'm afraid I cannot share code in this article, so you'll have to make do with the ideas. But if you are familiar with NextJs development (which I am not, btw), I would assume this is not rocket science.

A small tip, however: do not forget to check editing mode, as you do not want the security rules to be applied when editing.
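As I cannot share the project code, here is a hypothetical TypeScript sketch of how the page rules described above could be expressed. The field names (`roles`, `openForEveryone`), the `canViewPage` function and the reading that a guest only gets in when the checkbox is set are my assumptions; the editing flag would come from the Sitecore context (for example the pageEditing value that JSS exposes).

```typescript
// Hypothetical shape of the security data sent along with a page.
interface PageSecurity {
  roles: string[];          // ids of the roles selected in the multiselect field
  openForEveryone: boolean; // checkbox: also visible for guests without a role
}

// userRoles is empty for guests (not yet registered users).
function canViewPage(page: PageSecurity, userRoles: string[], isEditing: boolean): boolean {
  if (isEditing) {
    return true; // never apply the security rules while editing
  }
  if (page.openForEveryone) {
    return true; // everyone, guests included
  }
  if (userRoles.length === 0) {
    return false; // guest, and the page is not open for everyone
  }
  if (page.roles.length === 0) {
    return true; // no selected roles counts as all roles
  }
  return page.roles.some((role) => userRoles.indexOf(role) >= 0);
}
```

When this returns false, the front-end app would render the 403 page instead of the requested one.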

Components

For the components, we used a similar setup. We created another base template that can be included in a component; it will not be included in all components, just in those where we want the security functionality. This also results in a similar setup in the front-end app, which receives the security data and displays the components based on what it knows about the current user.

Again, don't forget to skip the security check when editing. Although it can look weird when you have multiple components targeted at different roles, you still want to be able to edit everything.
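The component check can follow the same idea. In this hypothetical sketch, components without the security base template (no `roles` value) are always shown and, as an assumption mirroring the page rule, an empty role selection is treated as no restriction. `SecuredComponent` and `visibleComponents` are illustrative names only.

```typescript
interface SecuredComponent {
  uid: string;
  roles?: string[]; // only set when the component includes the security base template
}

function visibleComponents<T extends SecuredComponent>(components: T[], userRoles: string[], isEditing: boolean): T[] {
  if (isEditing) {
    return components; // editors must be able to see and edit everything
  }
  return components.filter(
    (component) =>
      !component.roles ||
      component.roles.length === 0 ||
      component.roles.some((role) => userRoles.indexOf(role) >= 0),
  );
}
```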


Options

I assume there would be other options to tackle our request. It feels a lot like personalization so we also thought in that direction, but in the end our solution seemed like a simple one that would do as expected. And it is - the implementation was fairly easy and fast and the editors find their way when editing the pages and components. 

As I could only share the conceptual ideas here I hope you found it useful. And if you have any other thoughts or ideas on how to handle such requirements, I would be happy to discuss those. Always eager to learn ;) 


Thursday, May 23, 2024

Sitecore JSS Dictionary performance

 Sitecore JSS Dictionary performance tuning


We implemented a headless site with NextJS and Sitecore 10.2 with SXA (headless SXA). The code generated by Sitecore when creating the project included everything we needed to work with the SXA dictionary in dictionary-service-factory.ts. Combined with the i18n setup, this lets us easily create and use translatable labels in our application. It works fine.

But.. would someone please take a look at the logs...

Always watch the search logs

After a while we checked the logs to see if anything fishy was found - always a good practice, even during development cycles and when you think everything is smooth. And we noticed a lot of identical queries:

INFO  Solr Query - ?q=(((_path:(6d119245ea284770a401ee68458abfc5) AND _language:("en")) AND _templates:(6d1cd89719364a3aa511289a94c2a7b1)) AND _path:(0de95ae441ab4d019eb067441b7c2450)) AND _val:_boost&start=30&rows=10&fl=*,score&fq=_indexname:(sitecore_web_index)&wt=xml&sort=_smallcreateddate_tdt desc,_group asc

It didn't ring a bell immediately, but when we executed this query in the Solr admin it was very clear that this was related to the dictionary. Apparently our NextJS app was fetching the dictionary data quite often, and in small pieces. The combination of both leads to a lot of queries. Even though they are probably very fast, I still thought it could be improved, as the labels in our dictionary will not change frequently.


Tweaking the DictionaryService

In the dictionary-service-factory.ts file, check for the GraphQLDictionaryService instance creation (we are using the GraphQL version):
new GraphQLDictionaryService({
      endpoint: config.graphQLEndpoint,
      apiKey: config.sitecoreApiKey,
      siteName: config.jssAppName,
      ...

Looking at the options available for this service, we found we could set a pageSize and a cacheTimeout. There is also a parameter cacheEnabled but that is true by default.  Note that the cacheTimeout has a default of 60 (seconds) and the pageSize has a default of 10 (as seen in the rows parameter in the solr query). 
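A quick back-of-the-envelope calculation shows how these defaults add up. The sketch assumes one Solr query per page of results and one full reload each time the cache expires while the dictionary keeps being requested, per language and per running app instance; `dictionaryQueriesPerHour` is a hypothetical helper and the item count of 150 used below is a made-up number.

```typescript
// Rough upper bound on dictionary queries per hour: one query per page of
// results, repeated every time the cached dictionary expires.
function dictionaryQueriesPerHour(itemCount: number, pageSize: number, cacheTimeoutSeconds: number): number {
  const queriesPerReload = Math.max(1, Math.ceil(itemCount / pageSize));
  const reloadsPerHour = 3600 / cacheTimeoutSeconds;
  return queriesPerReload * reloadsPerHour;
}

// 150 items with the defaults (pageSize 10, cacheTimeout 60): 15 queries x 60 reloads
const withDefaults = dictionaryQueriesPerHour(150, 10, 60);    // 900
// 150 items with pageSize 150 and a one-hour timeout: 1 query x 1 reload
const tuned = dictionaryQueriesPerHour(150, 150, 3600);       // 1
```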

For our implementation and with the number of dictionary items we have that is inefficient. We need a larger pageSize and a larger timeout. 

  new GraphQLDictionaryService({
          endpoint: config.graphQLEndpoint,
          apiKey: config.sitecoreApiKey,
          siteName: config.jssAppName,
          rootItemId: '....',
          pageSize: 150,
          cacheEnabled: true,
          cacheTimeout: Number(process.env.DICTIONARY_CACHE_TIMEOUT),

With these adjustments our dictionary still worked fine, but we had far fewer queries to Solr. Note that we decided to get the timeout from an environment variable so we could tweak it per environment. You could do the same for the pageSize if you want.
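One caveat with `Number(process.env.DICTIONARY_CACHE_TIMEOUT)`: if the variable is missing or not numeric, Number() returns NaN, and it is safer not to depend on how the dictionary service treats that. A small defensive helper could look like this sketch; `readPositiveInt`, `dictionarySettings` and the DICTIONARY_PAGE_SIZE variable are hypothetical, and the fallbacks are the default timeout of 60 and the pageSize of 150 mentioned above.

```typescript
// Returns the value as a positive integer, or the fallback when it is missing,
// not a number, zero/negative or fractional.
function readPositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number(value);
  return isFinite(parsed) && Math.floor(parsed) === parsed && parsed > 0 ? parsed : fallback;
}

// Hypothetical variable names; DICTIONARY_CACHE_TIMEOUT is the one used in this post.
function dictionarySettings(env: Record<string, string | undefined>): { pageSize: number; cacheTimeout: number } {
  return {
    pageSize: readPositiveInt(env["DICTIONARY_PAGE_SIZE"], 150),
    cacheTimeout: readPositiveInt(env["DICTIONARY_CACHE_TIMEOUT"], 60),
  };
}
```

In dictionary-service-factory.ts you could then spread `...dictionarySettings(process.env)` into the GraphQLDictionaryService options.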


Enjoy the tips and happy dictionaring 🙂 

Monday, February 26, 2024

How to kill your Sitecore editor with one slash

 Is your page opening really slow in the Sitecore Content Editor?

I recently had an issue where one page (the homepage) of a site opened really slowly in the Sitecore Content Editor. All other pages went very smoothly, so it was clearly something particular to that page. And by slow I mean you click on the item and it takes one coffee before the editor reacts. On the published site, there was no issue.

At first this was just annoying, but after a while you get curious as well as annoyed, and you want this fixed.

Debugging

The editor is pure Sitecore code, so how do you start? One way is to enable timing for all events.
You will find this setting in the Sitecore.config file and if this is just for temporary debugging on a local instance you can alter it there. 

<!-- EVENT MAPS
      events.timingLevel =
        none   - No timing information is logged for any of the events (no matter what their local settings are)
        low    - Start/end timing is logged for events with handlers. Local settings override.
        medium - Start/end timing is logged for all events. Local settings override.
        high   - Start/end timing is logged for all events. Also, start/end for each handler is logged. Local settings override.
        custom - Only local settings apply. Events without settings are not logged.
    -->
  <events timingLevel="custom">
So we will change the timingLevel for all events to "high" instead of custom and restart the site.

Now let's keep an eye on the logs while we go back to the editor and open our homepage item. And bingo.. we get a lot of information in the logs but what was really interesting were the lines that said "Long running operation".

Long running operation

20204 11:29:46 DEBUG Long running operation: renderContentEditor pipeline[id={98E57B60-071A-44D1-A763-B6C1BCE0C630}]
20204 11:29:48 DEBUG Long running operation: getLookupSourceItems pipeline[item=/sitecore/templates/Feature/.../CtaRenderingParameters/__Standard Values, source=query:/$site/Data/.../*]
20204 11:29:50 DEBUG Long running operation: getLookupSourceItems pipeline[item=/sitecore/templates/Feature/.../CtaRenderingParameters/__Standard Values, source=query:/$site/Data/.../*]
20204 .....
20204 11:30:19 DEBUG Long running operation: getLookupSourceItems pipeline[item=/sitecore/templates/Feature/.../CtaRenderingParameters/__Standard Values, source=query:/$site/Data/.../*]
20204 11:30:19 DEBUG Long running operation: Running Validation Rules

Apparently we have a long running operation getLookupSourceItems and we are not doing this once, but a dozen times on the home page.  This is our cause - and ok, I exaggerated when I said a coffee but even a minute feels long to open an item.

So, we now know that the CtaRenderingParameters are doing something very slow, and that it is caused by the source of a field.

In that template I found (even two) droplist fields that use the same source: 
query:/$site/Data/Link Types/*

I am not writing Sitecore queries every day, so I admit it took me a few minutes to figure out what was wrong with this one. It looks like a normal query that fetches all items in a certain path. It's just a small detail, but a significant one: the first slash should not be there.

query:$site/Data/Link Types/*

This works much smoother and the issue was fixed.


$site

We are using a resolveToken in the source. In the Sitecore docs you will find the list of available tokens and we are using $site here to get the path to the current site. 

But that path already starts with a slash. So although you (probably) want your query to start with a slash, you do not want it to start with two. As described in the documentation on query axes, one slash gives you the children (and forms one path), but two give all descendants. And that is in most cases a very costly operation.
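The effect of that one slash is easy to reproduce with a bit of string handling. The TypeScript sketch below only imitates the token replacement (it is not Sitecore's resolver), and both function names and the site path are made-up examples.

```typescript
// Imitates the token replacement: $site becomes the site's item path,
// which already starts with a slash.
function resolveSiteToken(source: string, sitePath: string): string {
  return source.split("$site").join(sitePath);
}

// In Sitecore query, "//" is the descendant axis: every item below that point
// is visited instead of just following one path.
function usesDescendantAxis(query: string): boolean {
  return query.replace(/^query:/, "").indexOf("//") >= 0;
}

const sitePath = "/sitecore/content/Tenant/Site"; // hypothetical site path
const slow = resolveSiteToken("query:/$site/Data/Link Types/*", sitePath);
// "query://sitecore/content/Tenant/Site/Data/Link Types/*"
const fast = resolveSiteToken("query:$site/Data/Link Types/*", sitePath);
// "query:/sitecore/content/Tenant/Site/Data/Link Types/*"
```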


Conclusion

Be careful when setting field sources, especially when using tokens. You might not notice anything bad or slow when just one component is added, but it can add up to a really slow item. Use the events timing level to get debugging information.