The New Symfony Documentation Search Engine
April 29, 2016 • Published by Javier Eguiluz
Symfony boasts one of the largest documentation pools ever written for an Open-Source project. Considering the ten different Symfony versions (from 2.0 to master) and including the code samples, Symfony Documentation has around 3.6 million words, more than three times the word count of the entire Harry Potter series.
It's hard to create a search engine
Documentation of this size needs an effective way to find what you are looking for. At first we used Apache Solr to create a custom search engine, but a few months ago we tried to improve it by switching to ElasticSearch.
Given the complexity of creating a good search engine, we started looking at alternatives before completing the ElasticSearch integration. The search-engine-as-a-service market is not very crowded, so it didn't take us long to review all the possibilities.
Following this initial analysis, we chose Algolia as the most promising candidate and started developing a proof of concept for the new Symfony Documentation search engine.
The proof of concept
Algolia provides a tight integration with Symfony applications thanks to their AlgoliaSearchBundle and its complete documentation. Adding a few annotations to your entities and running certain commands will index the entire application contents, allowing you to create a search engine in a matter of minutes.
However, our documentation contents don't use entities, so instead we used the low-level Algolia PHP API. This requires creating indexing objects yourself and updating them accordingly to keep your index up-to-date.
Indexing the documentation
The source of Symfony Documentation is written in reStructuredText, a file format similar to Markdown yet more powerful and strict. These contents must be transformed into "indexing objects" to create the search engine.
Think of an indexing object as an independent unit of information that is stored in the index and can be displayed as a search result. These objects can store as much or as little content as the application needs.
For example, these are the contents of the book/testing.rst file, which corresponds to the testing chapter in the Symfony Book:
Testing
=======
Whenever you write a new line of code, you also potentially add new bugs.
To build better and more reliable applications, you should test your code
using both functional and unit tests.
The PHPUnit Testing Framework
-----------------------------
Symfony integrates with an independent library - called PHPUnit - to give
you a rich testing framework. This chapter won't cover PHPUnit itself, but
it has its own excellent `documentation`_.
[...]
How many objects would you create to index this content? We created four: one for each section title and one for each section content (which in this case is just a paragraph). Here's the PHP array that represents the indexing object for the `Testing` section title:
array(
    'title' => 'Testing',
    'content' => '',
    'breadcrumb' => array(
        'lvl1' => 'The Book',
    ),
    'type_of_content' => 'book',
    'url' => 'https://symfony.com/doc/current/book/testing.html#main',
    'version' => '3.0',
    'importance' => 0,
)
Similarly, this is the indexing object for the content below the `Testing` title:
array(
    'title' => 'Testing',
    'content' => 'Whenever you write a new line of ...',
    'breadcrumb' => array(
        'lvl1' => 'The Book',
        'lvl2' => 'Testing',
    ),
    'type_of_content' => 'book',
    'url' => 'https://symfony.com/doc/current/book/testing.html#main',
    'version' => '3.0',
    'importance' => 4,
)
The attributes of the indexing objects can be chosen freely and you can also decide which attributes to index, how to use them to perform the search queries, etc. In addition to the basic `title`, `content`, `breadcrumb` and `url` attributes, we included the `type_of_content` (to differentiate the Symfony Book from the cookbook, the bundles, etc.), the `version` of Symfony and the `importance`, which weighs the relevance of each content (`0` for `h1` titles, `1` for `h2` titles, `4` for content below `h1` titles, etc.)
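As a rough illustration of how sections could be turned into such objects, here is a minimal JavaScript sketch. It is not the code used for symfony.com (which relies on the Algolia PHP API): the `buildIndexingObjects` helper, its input shape and the way breadcrumbs are built for nested sections are assumptions made for this example. Only the three importance values quoted above are filled in; any other combination is left as `null` because its value isn't given here.

```javascript
// Importance values quoted in the post; other combinations are not listed there.
const KNOWN_IMPORTANCE = {
  'title:1': 0,   // h1 titles
  'title:2': 1,   // h2 titles
  'content:1': 4, // content below h1 titles
};

function importanceFor(kind, level) {
  const key = `${kind}:${level}`;
  return key in KNOWN_IMPORTANCE ? KNOWN_IMPORTANCE[key] : null;
}

// Turns ['The Book', 'Testing'] into { lvl1: 'The Book', lvl2: 'Testing' }.
function toBreadcrumb(labels) {
  const breadcrumb = {};
  labels.forEach((label, i) => {
    breadcrumb[`lvl${i + 1}`] = label;
  });
  return breadcrumb;
}

// Hypothetical helper: `sections` is a flat list of
// { level, title, content, url } in document order, `rootLabel` is the first
// breadcrumb entry and `common` holds the attributes shared by every object.
function buildIndexingObjects(sections, rootLabel, common) {
  const objects = [];
  const ancestors = []; // titles of the enclosing sections, outermost first

  for (const section of sections) {
    ancestors.splice(section.level - 1); // keep only the enclosing sections
    const shared = { ...common, title: section.title, url: section.url };

    // One object for the section title...
    objects.push({
      ...shared,
      content: '',
      breadcrumb: toBreadcrumb([rootLabel, ...ancestors]),
      importance: importanceFor('title', section.level),
    });

    // ...and one for the content below it, if there is any.
    if (section.content) {
      objects.push({
        ...shared,
        content: section.content,
        breadcrumb: toBreadcrumb([rootLabel, ...ancestors, section.title]),
        importance: importanceFor('content', section.level),
      });
    }

    ancestors.push(section.title);
  }

  return objects;
}

// Usage: the first section of book/testing.rst yields the two arrays shown above.
const testingObjects = buildIndexingObjects(
  [{
    level: 1,
    title: 'Testing',
    content: 'Whenever you write a new line of ...',
    url: 'https://symfony.com/doc/current/book/testing.html#main',
  }],
  'The Book',
  { type_of_content: 'book', version: '3.0' }
);
console.log(JSON.stringify(testingObjects, null, 2));
```

Keeping the title and the content in separate objects is what lets each of them carry its own `importance` value.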
The granularity of these indexing objects may seem like a burden, but it's essential to display highly relevant search results. In total, we created 116,513 objects to index the entire documentation for all Symfony versions, which roughly translates to 30 words per object.
Once the objects are indexed, you have a ready-to-use search index. However, some minor tweaks are still needed to improve the algorithm that generates the search results. In our case, those tweaks were made by the Algolia developers, who are always willing to help us and provided high-quality support.
Building the search engine interface
At this point we had all the contents indexed and a search engine that provided fast and relevant results. The last step needed to finish the new search engine was to design the interface used to perform the queries.
We implemented the interface following the Instant Search tutorial and using the Algolia helper to interact with the search engine. That's why the core of our JavaScript application is simple:
// perform a query for every character typed in the search box
$searchInput.on('keyup', function(event) {
    var query = $(this).val();
    algoliaHelper.setQuery(query).search();
}).focus();

// render the search results of the previous query
algoliaHelper.on('result', function(content, state) {
    renderHits(content);
    renderFacets(content, state);
    renderPagination(content);
    // ...
});
Note
Soon after we implemented the interface, Algolia released InstantSearch.js, a widget-based library that lets you do the same as the helper even more quickly.
Next, we worked on the "facets" or filters that allow refining the search results based on the content type and the Symfony version. You can set any of the filters independently, so they are considered "disjunctive facets". Thanks to the JavaScript helper, you just need to call a method to set/unset these facets before performing the search query:
algoliaHelper.addDisjunctiveFacetRefinement(facetName, facetValue).search();
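To make the filtering behavior more concrete, the following sketch models it on a plain array: values selected for the same facet are combined with OR, while different facets are combined with AND. This is only an illustration of that semantics and says nothing about how Algolia implements it; the `matchesRefinements` function and the sample records are made up for the example.

```javascript
// `refinements` maps a facet name to the list of selected values.
// An empty list means that the facet is not filtered at all.
function matchesRefinements(record, refinements) {
  return Object.entries(refinements).every(
    ([facet, values]) => values.length === 0 || values.includes(record[facet])
  );
}

// Made-up sample records using the attributes described earlier.
const sampleRecords = [
  { title: 'Testing', type_of_content: 'book', version: '3.0' },
  { title: 'Testing', type_of_content: 'book', version: '2.8' },
  { title: 'File upload', type_of_content: 'cookbook', version: '3.0' },
  { title: 'File upload', type_of_content: 'cookbook', version: '2.3' },
];

// Version 3.0 OR 2.8, AND content type "book".
const selected = { version: ['3.0', '2.8'], type_of_content: ['book'] };
const filtered = sampleRecords.filter((record) => matchesRefinements(record, selected));

console.log(filtered.map((record) => `${record.title} (${record.version})`));
```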
The only issue we faced was that facet names and values couldn’t be obtained through the Algolia API. The solution was to perform an empty query when the search page is loaded and extract the facets from the empty search results. This trick was also useful to show the current Symfony version pre-selected in the filters:
// empty query to get the facet names and values
algoliaHelper
    .addDisjunctiveFacetRefinement('version', 'current')
    .setQuery('')
    .search()
;
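The extraction step could look roughly like the sketch below. It assumes a response object whose `facets` property maps each facet name to its values and their hit counts; the `extractFacets` function, the `preselected` parameter and all the counts are invented for the example, and the object returned by the JavaScript helper may be shaped differently.

```javascript
// Builds a list of filters, ready to be rendered, from a search response.
// `preselected` maps a facet name to the value that should appear selected.
function extractFacets(response, preselected = {}) {
  return Object.entries(response.facets || {}).map(([name, counts]) => ({
    name,
    values: Object.entries(counts)
      .map(([value, count]) => ({
        value,
        count,
        selected: preselected[name] === value,
      }))
      .sort((a, b) => b.count - a.count), // most frequent values first
  }));
}

// Made-up response of an empty query; the counts are not real.
const emptyQueryResponse = {
  facets: {
    version: { '2.8': 10, current: 30, '3.0': 25 },
    type_of_content: { book: 40, cookbook: 20 },
  },
};

const filters = extractFacets(emptyQueryResponse, { version: 'current' });
console.log(JSON.stringify(filters, null, 2));
```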
Finally, the search results are rendered using JavaScript templates created with mustache.js. If you are used to advanced template engines like Twig, you'll find mustache.js very limiting, but luckily the search engine templates are simple:
<!-- the template used to render the search results -->
<script type="text/template" id="hit-template">
    {{#hits}} <!-- rendered once per hit, like {% for hit in hits %} -->
    <div class="hit">
        <div class="hit-content">
            {{#title}}
            <div class="hit-title">
                <!-- {{{ foo }}} is equivalent to {{ foo|raw }} -->
                <a href="{{ url }}">{{{ title }}}</a>
            </div>
            {{/title}}
            ...
        </div>
    </div>
    {{/hits}}
</script>
Conclusions
The proof of concept for the new search engine was a success, so we decided to stop the ElasticSearch integration and stick with Algolia. The new search engine is much faster than the previous one and the search results are more accurate and relevant.
The new search engine has other perks too, such as tolerating typos (`yoml syntax` returns the same results as `yaml syntax`) and letting you exclude terms (`file upload -doctrine` looks for contents that explain file uploading without using Doctrine).
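How Algolia implements these two features is outside the scope of this post, so the following is only a toy model of the behavior: typos are measured with the classic Levenshtein edit distance and terms prefixed with `-` are treated as exclusions. The `editDistance`, `parseQuery` and `matchesQuery` functions, the `maxTypos` threshold and the sample sentences are all assumptions made for the example.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn `a` into `b`.
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,       // deletion
        current[j - 1] + 1,    // insertion
        previous[j - 1] + cost // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Splits a query into required terms and excluded ("-term") terms.
function parseQuery(query) {
  const required = [];
  const excluded = [];
  for (const word of query.toLowerCase().split(/\s+/).filter(Boolean)) {
    if (word.startsWith('-') && word.length > 1) {
      excluded.push(word.slice(1));
    } else {
      required.push(word);
    }
  }
  return { required, excluded };
}

// A text matches when every required term is within `maxTypos` edits of one
// of its words and none of the excluded terms appears in it.
function matchesQuery(text, query, maxTypos = 1) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const { required, excluded } = parseQuery(query);
  const isNear = (term) => words.some((word) => editDistance(word, term) <= maxTypos);
  return required.every(isNear) && !excluded.some((term) => words.includes(term));
}

console.log(editDistance('yoml', 'yaml'));
console.log(matchesQuery('The YAML syntax is simple', 'yoml syntax'));
console.log(matchesQuery('Upload a file with Doctrine', 'file upload -doctrine'));
```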
The contents are fully re-indexed once a day and the search index is replicated worldwide in 14 locations, so you'll get an instant response no matter where you live.
Try the new search engine at symfony.com/search and share your thoughts about it.
Comments are closed.
To ensure that comments stay relevant, they are closed for old posts.
Is there a way to use a query param for a particular search term? I search the site initially via an Alfred (OSX app) command.
@Gábor that's indeed something we're considering!
https://symfony.com/search?q=Fabien&v=2.3&b=cookbook
also a non js fallback would be nice. had to disable adblock, open up my restricted access to other domains (done with the firefox addon policeman) and allow cookies and javascript.