Migrating symfony.com Search Engine to Meilisearch
March 30, 2023 • Published by Javier Eguiluz
Searching Symfony documentation is one of the most important features of symfony.com. Search is so pervasive for our visitors that all pages include a keyboard shortcut to open the search input: Ctrl + K (or Command + K). We also support pressing the Shift key twice, similar to the search feature of some IDEs.
Since this website was launched, we've used different products and services to solve the search problem: Apache Solr, Elasticsearch, Google Custom Search Engine, and Algolia Search. This week we unveiled the new search experience based on Meilisearch.
Migration Reasoning
The first main reason to migrate is that we needed better control over the parsing and indexing of contents. Using a fully hosted service is convenient, but the configuration via JSON files was a bit cumbersome and didn't give us the full flexibility that we needed.
The second main reason is that the quality of the search results wasn't as high as our readers expected. We received complaints about confusing search results and about completely missing results. Because of the lack of control over the parsing and indexing mentioned earlier, we couldn't improve these results.
Migrating to Meilisearch
We started looking for alternative search engines. Our preference was for open source projects developed in Go (because we have experience in that language). Although there are many projects that match those requirements, many of them lacked active development or a big enough community, or missed small features that are essential for us, like support for synonyms (more on that later).
After broadening our requirements, we looked at one project that met most of our preferences: Meilisearch. It's a startup created in 2018 that has taken the search engine market by storm. Meilisearch provides an open source search engine developed in Rust and also a hosted search solution.
Using Meilisearch has been a delight so far. It all starts with its quick and simple installation process, continues with its great and thorough documentation, and ends with a powerful search engine with mind-blowing performance. Meilisearch feels fun, fresh and uncomplicated.
Technical Integration
Meilisearch provides integrations with lots of programming languages and frameworks. There is a Meilisearch Symfony bundle, but we don't use it because, as some folks on Symfony Slack advised us, it's mainly focused on indexing Doctrine entities/documents.
Symfony documentation is built with symfony-tools/docs-builder, which parses RST documents and outputs JSON files with a certain structure (title, body, TOC, pagination, etc.). That's why we used the Meilisearch PHP integration with the Symfony HttpClient component instead.
Parsing Symfony Documentation
In Meilisearch, Documents are a core concept that refers to each of the items stored in an index. Each Document contains one or more fields, each of them consisting of a key-value pair of arbitrary information. For Symfony Docs, this is what a Meilisearch Document looks like:
namespace App\Search\Dto;

class DocumentDto
{
    private int $level = -1;
    private string $title = '';
    private string $content = '';
    private string $version = '';
    private string $url = '';

    public static function createFromSymfonyDocument(...): self
    {
        // ...
    }
}
For each Symfony Docs version, we find the JSON files created by the docs-builder tool and parse their contents to create the Meilisearch Documents. First, we remove some elements from the generated HTML docs to improve results (such as "version added" directives and most code blocks).
Then, we split the entire doc page by section: each <h1>, <h2>, <h3>, etc. creates a new standalone Document. For example, consider the following simplified doc page:
<div class="section">
    <h1>First title</h1>
    <p>Some content</p>

    <div class="section">
        <h2>Second title</h2>
        <p>More content</p>
    </div>

    <div class="section">
        <h2>Third title</h2>
        <p>Final content</p>
    </div>
</div>
This Symfony page generates three Meilisearch Documents:
Document{ level: 1, title: 'First title', content: '<p>Some content</p>' }
Document{ level: 2, title: 'Second title', content: '<p>More content</p>' }
Document{ level: 2, title: 'Third title', content: '<p>Final content</p>' }
We have to do this because many Symfony Docs pages are created as "reference pages" that explain all the main things you need to know about some feature. That's why some of them are very long (e.g. configuration.rst has 1,200 lines and routing.rst has 2,700 lines).
Splitting the entire page contents into multiple small documents is what produces better search results, closely related to the query terms.
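The splitting step described above can be sketched with PHP's built-in DOM extension. This is only an illustration, not the code that runs on symfony.com: the function name splitIntoSections() and the returned array shape are assumptions, and the sketch relies on nested sections being wrapped in <div class="section"> elements, as in the simplified page shown earlier.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns one HTML page into one entry per heading.
 *
 * @return list<array{level: int, title: string, content: string}>
 */
function splitIntoSections(string $html): array
{
    $dom = new DOMDocument();
    // collect libxml warnings internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="UTF-8"><div>'.$html.'</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $headings = $xpath->query(
        '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]'
    );

    $documents = [];
    foreach ($headings as $heading) {
        $content = '';
        // the section body is every sibling element after the heading,
        // except nested sections, which become documents of their own
        for ($node = $heading->nextSibling; null !== $node; $node = $node->nextSibling) {
            if (!$node instanceof DOMElement) {
                continue; // skip whitespace-only text nodes and comments
            }
            $classes = ' '.$node->getAttribute('class').' ';
            if ('div' === $node->nodeName && str_contains($classes, ' section ')) {
                continue;
            }
            $content .= $dom->saveHTML($node);
        }

        $documents[] = [
            'level' => (int) substr($heading->nodeName, 1), // "h2" => 2
            'title' => trim($heading->textContent),
            'content' => $content,
        ];
    }

    return $documents;
}

$html = <<<'HTML'
<div class="section">
    <h1>First title</h1>
    <p>Some content</p>
    <div class="section">
        <h2>Second title</h2>
        <p>More content</p>
    </div>
    <div class="section">
        <h2>Third title</h2>
        <p>Final content</p>
    </div>
</div>
HTML;

$documents = splitIntoSections($html);
```

With the simplified page as input, $documents holds one entry per heading, matching the three Documents listed earlier. A real implementation would also need to attach the version and the URL (with its fragment) to each entry.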
Indexing Documents
Indexing the documents is the simplest part of the process. It's roughly like this:
use Meilisearch\Client;

$client = new Client($meilisearchEndpoint, $meilisearchMasterKey);
// if the index does not exist, it's created when indexing the first document
$index = $client->index('sfdocs');
$documents = [...];
$index->addDocuments($documents);
However, in order to produce better results, there are a couple of tweaks that you should make: stopwords and synonyms.
Stopwords are the words that are filtered out in the search index because they are insignificant. You might think of only configuring the usual stopwords (e.g. for English: a, an, and, the, etc.) but you probably need to add tens of stopwords according to the language of your contents:
// check https://sites.google.com/site/kevinbouge/stopwords-lists
$index->updateStopWords(['a', 'ain\'t', 'am', 'an', 'any', 'are', 'as', '...']);
Synonyms are essential to create better search results for contents like Symfony Docs. Meilisearch uses "prefix search" by default: searching for config finds any word with that prefix too (configs, configuration, configurations, configuring, etc.) but that's not enough for us.
We also need to consider that yaml and yml are the same; cli, console and terminal are the same, etc. Thanks to Symfony Slack folks, we crafted a long list of synonyms:
$index->updateSynonyms([
    'dotenv' => ['.env'],
    // ...
    'env var' => ['envvar', 'environment variable', 'environment variables'],
    'dependency injection' => ['di', 'dic', 'dependencyinjection', 'service container'],
    // ...
    'shell' => ['bash', 'sh'],
]);
In case you are wondering, yes, we also added a synonym for symphony and symfony.
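One detail worth keeping in mind: Meilisearch synonyms are directional, so a key only expands to the words listed for it, and the reverse direction has to be declared separately. For groups of equivalent terms such as yaml/yml, a small helper can generate both directions. The following is a minimal sketch; buildMutualSynonyms() is a hypothetical name, and whether symfony.com builds its list this way is not something this article states.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: expands groups of equivalent terms into a map
 * where every term points to all the other terms of its group.
 *
 * @param list<list<string>> $groups
 *
 * @return array<string, list<string>>
 */
function buildMutualSynonyms(array $groups): array
{
    $synonyms = [];
    foreach ($groups as $group) {
        $group = array_values(array_unique($group));
        foreach ($group as $term) {
            // every other term of the group is a synonym of $term
            $others = array_values(array_diff($group, [$term]));
            // merge with synonyms collected from previous groups, if any
            $merged = [...($synonyms[$term] ?? []), ...$others];
            $synonyms[$term] = array_values(array_unique($merged));
        }
    }

    return $synonyms;
}

$synonyms = buildMutualSynonyms([
    ['yaml', 'yml'],
    ['cli', 'console', 'terminal'],
]);
```

The resulting array has the same shape as the one passed to $index->updateSynonyms() above, so it could be merged with any hand-written one-way entries before sending it to Meilisearch.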
Regarding the reindexing of contents, the trick is to generate a stable id for the Meilisearch documents. The most important key-value pair in a document is called id, which is treated as the "primary key" of the document.
If two documents in the same index have the same id, then they are treated as the same document and the preceding document will be overwritten. So, when we update the Symfony Docs contents, we only have to index the documents like the first time and Meilisearch will create or update the index documents as needed.
In our case, each doc page section has a stable and unique identifier in its absolute URL including the fragment (e.g. https://symfony.com/doc/current/configuration.html#using-php-configbuilders).
class DocumentDto
{
    // ...

    // this array is what's passed to the Meilisearch PHP API
    public function getAsArray(): array
    {
        return [
            'id' => hash('xxh3', $this->url),
            'level' => $this->level,
            // ...
        ];
    }
}
xxh3 is a new kind of hasher, available since PHP 8.1, which provides excellent randomness and is an order of magnitude faster than MD5/SHA1. Consider using it when you don't need hashes for cryptographic purposes.
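To make the idea of a stable id concrete, here is a small standalone sketch. Hashing also sidesteps a practical constraint: as far as we know, Meilisearch only accepts integers or strings made of letters, digits, hyphens and underscores as document ids, so a raw URL would not be valid. The documentId() function is a hypothetical name used only for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: derives a Meilisearch-friendly id from a section URL.
function documentId(string $absoluteUrl): string
{
    // xxh3 produces a 64-bit hash, rendered as 16 hexadecimal characters
    return hash('xxh3', $absoluteUrl);
}

$url = 'https://symfony.com/doc/current/configuration.html#using-php-configbuilders';

$id = documentId($url);
// hashing the same URL again gives the same id, so reindexing the section
// updates the existing document instead of creating a duplicate
$sameId = documentId($url);
// a different fragment identifies a different section, hence a different id
$otherId = documentId('https://symfony.com/doc/current/configuration.html#another-section');
```

The #another-section fragment is made up for the example. The point is only that the id depends on nothing but the URL, so it stays the same between indexing runs as long as the section keeps its URL.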
Search Engine Interface
The last step to complete the migration was to update the web interface of the search engine. Before, we had a lot of JavaScript code to handle the search autocomplete, the Ajax queries, the rendering of results, etc.
In the new search engine interface we only left the JavaScript code that handles the search input modal and the keyboard navigation of the results. All the other JavaScript code was removed and replaced by some PHP code.
We've always wanted to use more Symfony UX components in symfony.com, so this was the best opportunity to try Symfony UX Live Components. Take a look at our search component:
namespace App\Component;

use App\Search\Dto\SearchResultDto;
use App\Search\SearchEngine;
use Symfony\UX\LiveComponent\Attribute\AsLiveComponent;
use Symfony\UX\LiveComponent\Attribute\LiveProp;
use Symfony\UX\LiveComponent\DefaultActionTrait;

#[AsLiveComponent('search')]
class SearchComponent
{
    use DefaultActionTrait;

    #[LiveProp(writable: true)]
    public ?string $query = null;

    public function __construct(private SearchEngine $searchEngine)
    {
    }

    public function getResult(): SearchResultDto
    {
        return $this->searchEngine->search($this->query);
    }
}
And the related Twig template that shows the search input and renders the results:
<div {{ attributes }}>
    <form id="form-search" data-model="*">
        <label for="form-search-input" class="sr-only visually-hidden">Search Symfony Docs</label>
        <input type="search" name="query" value="{{ this.query }}"
            placeholder="Search Symfony Docs" aria-label="Search in Symfony documentation"
            autocapitalize="off" autocomplete="off" autocorrect="off" spellcheck="false">
    </form>

    <div class="search-results-wrapper" style="{{ computed.result.isEmptyQuery ? 'display: none' }}">
        <div class="search-results">
            {% if computed.result.isError %}
                <p class="error"> ... </p>
            {% else %}
                {% if 0 == computed.result.totalHits %}
                    <p class="no-results">No results. Try making your query more generic!</p>
                {% else %}
                    {% for result in computed.result.hits %}
                        <div class="search-result">
                            <!-- ... -->
                        </div>
                    {% endfor %}
                {% endif %}
            {% endif %}
        </div>
    </div>
</div>
The PHP file and Twig template shown above are all you need to make the search autocomplete work with Ajax HTTP queries. You don't need to write a single line of JavaScript to have a beautiful and modern JavaScript interface.
Read the Symfony UX Live Components docs to learn everything about this essential Symfony feature.
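The template above reads isEmptyQuery, isError, totalHits and hits from the object returned by getResult(). The real SearchResultDto class is not shown in this article, so the following is only a guess at its general shape: every method name and named constructor here is an assumption, chosen so that Twig could resolve the attributes used in the template.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of a result DTO; not the actual symfony.com class.
final class SearchResultDto
{
    /** @param list<array<string, mixed>> $hits */
    private function __construct(
        private readonly string $query,
        private readonly array $hits,
        private readonly int $totalHits,
        private readonly bool $error,
    ) {
    }

    /** @param list<array<string, mixed>> $hits */
    public static function fromHits(string $query, array $hits, int $totalHits): self
    {
        return new self($query, $hits, $totalHits, false);
    }

    // used when the search backend fails, so the template can show an error
    public static function forError(string $query): self
    {
        return new self($query, [], 0, true);
    }

    // Twig resolves "result.isEmptyQuery" to this method
    public function isEmptyQuery(): bool
    {
        return '' === trim($this->query);
    }

    public function isError(): bool
    {
        return $this->error;
    }

    // Twig resolves "result.totalHits" to this getter
    public function getTotalHits(): int
    {
        return $this->totalHits;
    }

    /** @return list<array<string, mixed>> */
    public function getHits(): array
    {
        return $this->hits;
    }
}

$result = SearchResultDto::fromHits('routing', [['title' => 'Routing']], 1);
$emptyResult = SearchResultDto::fromHits('', [], 0);
$failedResult = SearchResultDto::forError('routing');
```

Keeping the "empty query" and "error" states inside the DTO means the template only has to branch on simple booleans, which is what the Twig code above does.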
Evaluating the Result
At this point, we had a search engine for Symfony Docs that was simple to build and operate. But how was the result in terms of search results quality and performance?
Regarding performance: in total, including all Symfony Docs versions and the bundles documentation, we parse 12,754 files and create 85,146 Meilisearch documents. The total time spent to parse, create and index all those documents is 17 seconds. The search index takes about 1.2 GB of disk space and most queries take less than 10 milliseconds.
Regarding the search result quality, we made an exhaustive comparison between the previous search engine and the new Meilisearch engine. We tested usual queries, weird queries, long queries, queries with typos, etc. Here's a selection of the results obtained (left: before, right: after).
Conclusion and Future Developments
The new Meilisearch-based search engine is already deployed to symfony.com for you to try. Thanks to Meilisearch for their help during this migration process and for kindly providing the hosting of our index data.
Meilisearch provides all the features that we need (indexing arbitrary information, sorting, result weighting, stopwords, synonyms, facets and filtering, etc.) and many features that we don't need yet (like geosearch).
However, it has some known limitations that you should check before trying to use it in your projects. If your project requirements are complex (e.g. cross-faceting between multiple indexes, etc.) you should also check if those advanced features are already supported. Check the Meilisearch public roadmap too.
Finally, we want to add more features to the new search engine so you can filter results by doc type and version. We'll do that in the coming weeks, as well as other changes and improvements suggested by the community.
Tell us what you think about this via Twitter, Slack or in the comments below.
I had to switch to a custom local tool for better results which imports all documentation files via the GitHub API and offers easier searches including tagging, personalization, version comparisons etc.
Would be good if some of that could be included here as well in the next step. E.g. a user defines the Symfony versions and components that are relevant for his situation in the Symfony account and the search results are then filtered/ranked based on that.
> In case you are wondering, yes, we also added a synonym for symphony and symfony.
synfony instead of symfony?
(really interesting article by the way :))