A Search story with Solr N-Gram part 2
Querying our index
In part 1 of this search story I described how we set up a custom Solr index in Sitecore with a few fields that use the n-gram tokenizer.
A small recap: we are trying to create a search on a bunch of similar Sitecore items that uses tagging but also free text search in the title and description. We want users to get results whenever possible, with the most relevant ones on top.
In this second part I will describe how we queried that index to get what we need. We are trying to use only Solr, without retrieving any data from Sitecore, as we want to be ready to move this solution out of the Sitecore environment some day. This is the reason we are not using the Sitecore search layer, but the SolrNet library instead.
Query options and basics
Let's start easy with setting some query options.
var options = new QueryOptions
{
    Rows = parameters.Rows,
    StartOrCursor = new StartOrCursor.Start(parameters.Start)
};
We are just setting the parameters for paging here: the number of rows and the start row.
var query = new List<ISolrQuery>()
{
    new SolrQueryByField("_template", "bdd6ede443e889619bc01314c027b3da"),
    new SolrQueryByField("_language", language),
    new SolrQueryByField("_path", "5bbbd9fa6d764b01813f0cafd6f5de31")
};
We start the query by setting the desired template, language and path. We use SolrQueryInList with an IEnumerable to add the tagging parts to the query, but as that is not the most relevant part here I will not go into more detail. You can find all the information on querying with SolrNet in their docs on GitHub.
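Still, for readers curious about the tagging part, a minimal sketch could look like the one below. It assumes the tags are indexed in a multi-valued string field, here called tags_sm, which is a made-up name and not the one from our index. The TagQueries class and its AddTags method are illustration names as well, and the snippet assumes the SolrNet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using SolrNet;

public static class TagQueries
{
    // Hypothetical field name; replace it with the tag field of your own index.
    private const string TagField = "tags_sm";

    // Adds a "field matches any of these values" clause when tags were selected.
    public static void AddTags(ICollection<ISolrQuery> query, IEnumerable<string> tags)
    {
        // Drop null, empty and whitespace-only entries before building the clause.
        var selected = (tags ?? Enumerable.Empty<string>())
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .ToList();

        if (selected.Count == 0)
        {
            return; // no tags selected: leave the query untouched
        }

        // SolrQueryInList matches documents that have at least one of the given values.
        query.Add(new SolrQueryInList(TagField, selected));
    }
}
```

Calling something like TagQueries.AddTags(query, selectedTags) on the list built above would add the clause next to the template, language and path parts, where selectedTags stands for whatever collection of tag values your own parameters carry.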
Search query
The next, and most interesting, step is adding the search part to the query.
if (!string.IsNullOrEmpty(parameters.SearchTerm))
{
    var searchQuery = new List<ISolrQuery>()
    {
        new SolrQueryByField("titlestring_s", parameters.SearchTerm),
        new SolrQueryByField("descriptionstring_s", parameters.SearchTerm),
        new SolrQueryByField("titlesearch_txts", parameters.SearchTerm),
        new SolrQueryByField("descriptionsearch_txts", parameters.SearchTerm)
    };
    var search = new SolrMultipleCriteriaQuery(searchQuery, SolrMultipleCriteriaQuery.Operator.OR);
    query.Add(search);
    options.AddOrder(new SortOrder("score", Order.DESC));
    options.ExtraParams = new Dictionary<string, string>
    {
        { "defType", "edismax" },
        { "qf", "titlestring_s^9 descriptionstring_s^5 titlesearch_txts^2 descriptionsearch_txts" }
    };
}
else
{
    options.AddOrder(new SortOrder("__smallupdateddate_tdt", Order.DESC));
}
What are we doing here? First of all, we check whether we actually have a search term. If we do not, we add no search query and keep the default sorting, which in our case is the last update date.
But what if we do have a search term? We create a new Solr query that combines 4 field queries: we search in both the string and the n-gram version of the title and the description. We combine the field queries with an OR operator and add the result to the global Solr query.
We then sort on the score field. This is the score calculated by Solr, which indicates the relevancy of a result.
Lastly, we add extra parameters to select the edismax query parser and set the boosts we want to use. We boost the full string matches most, and the title more than the description.
This gives us the requirements we wanted:
- search in title and description
- get results as often as possible
- show exact matches first
- get the most relevant results on top
Wrap up
To wrap things up we combine everything and execute the query:
var q = new SolrMultipleCriteriaQuery(query, SolrMultipleCriteriaQuery.Operator.AND);
logger.LogDebug($"[Portal] Information center search: {solrQuerySerializer.Serialize(q)}");
var results = await solrDocuments.QueryAsync(q, options);
Besides fetching the results, note that we can also use the provided serializer to log our queries for debugging.
As a final remark, I do need to add that a search like this needs fine-tuning. That means tuning the size of the n-grams as well as the boost factors. Change the parameters (one at a time) and test until you get the results you want.
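To make changing one boost at a time a bit easier, the qf value could be built from a small table instead of a hard-coded string. The snippet below is only a sketch of that idea, not what we run in production: BoostSettings, Defaults and BuildQf are names made up for this example, and the values are the same ones used in the query above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class BoostSettings
{
    // The boosts from the search query above, one line per field so each can be tuned separately.
    public static readonly IReadOnlyList<(string Field, double Boost)> Defaults =
        new List<(string Field, double Boost)>
        {
            ("titlestring_s", 9),
            ("descriptionstring_s", 5),
            ("titlesearch_txts", 2),
            ("descriptionsearch_txts", 1)
        };

    // Builds the value for the edismax "qf" parameter: "field^boost" entries separated by spaces.
    public static string BuildQf(IEnumerable<(string Field, double Boost)> boosts)
    {
        return string.Join(" ", boosts.Select(b =>
            b.Boost == 1.0
                ? b.Field // a boost of 1 is the default, so the suffix is left out
                : b.Field + "^" + b.Boost.ToString(CultureInfo.InvariantCulture)));
    }
}
```

With the default table, BoostSettings.BuildQf(BoostSettings.Defaults) produces the same string as the hard-coded qf value, so it can be dropped into ExtraParams as is. The invariant culture keeps a boost like 2.5 from being written with a decimal comma on machines with a different locale.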
And that's it for this second and final part of this n-gram search series. As mentioned in the first post, this information is not new and most of it can be found in several docs and posts, but I thought it would be a good idea to bring it all together. Enjoy your search ;)