Symfony 6.1 will be released at the end of May 2022 and it will require PHP 8.1 or higher. This is the first article of the series that shows the most important new features introduced by Symfony 6.1.


Contributed by Titouan Galopin in #44681

Web applications often need to work with user-generated HTML content, and doing so safely is difficult. Rendering that untrusted HTML in a Twig template or injecting it via JavaScript into the innerHTML property of an element can lead to unwanted and dangerous JavaScript code execution.
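
To make the problem concrete, here is a minimal sketch in plain PHP (no Symfony involved). The comment string is a made-up example. It shows that escaping everything is safe but also destroys the harmless formatting, which is the gap a sanitizer is meant to fill:

```php
<?php

// Hypothetical user-submitted comment: harmless formatting plus an XSS payload.
$userInput = '<p>Nice <b>post</b>!</p><img src="x" onerror="alert(1)">';

// Unsafe: echoing $userInput as-is would let the browser run the onerror handler.

// Safe but lossy: escaping turns every tag into plain text, so the legitimate
// <p> and <b> formatting is lost together with the malicious <img>.
$escaped = htmlspecialchars($userInput, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $escaped, PHP_EOL;
```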

HTML sanitization is "the process of examining an HTML document and producing a new HTML document that preserves only whatever tags or attributes that are designated safe and desired".

Most of the time, this sanitization process is used to protect against attacks such as cross-site scripting (XSS). However, sanitization is also about fixing invalid HTML content in the best way possible:

<!-- an example of a wrong HTML input provided by the user -->
Original: <div><em>foo</div>
<!-- the best solution to fix this HTML code is to add the missing tag -->
Sanitized: <div><em>foo</em></div>

<!-- however, if the HTML error appears in other elements, the fix could be different -->
Original: <textarea><em>foo</textarea>
<!-- the best solution in this case is to HTML encode the wrong tag -->
Sanitized: <textarea>&lt;em&gt;foo</textarea>

In Symfony 6.1 we're adding a PHP-based HTML sanitizer so you can transform user-generated HTML content into safe HTML content. This new component is similar to the upcoming W3C HTML Sanitizer API, and we even use the same method names whenever possible to ease the learning curve.

use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// By default, any element not listed as allowed or blocked is dropped,
// together with its children
$config = (new HtmlSanitizerConfig())
    // Allow "safe" elements and attributes. All scripts will be removed
    // as well as other dangerous behaviors like CSS injection
    ->allowSafeElements()

    // Allow the "div" element, with no attributes on it
    ->allowElement('div')

    // Allow the "a" element, and the "title" attribute to be on it
    ->allowElement('a', ['title'])

    // Allow the "span" element with any attribute allowed by the Sanitizer API
    // (see https://wicg.github.io/sanitizer-api/#default-configuration)
    ->allowElement('span', '*')

    // Drop the "div" element: this element will be removed, including its children
    ->dropElement('div')
;
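
The comment at the top of this example mentions blocked elements, which the snippet itself does not show. The following sketch contrasts blocking with dropping. It assumes a standalone script with the component installed through Composer, and the input string is purely illustrative:

```php
<?php

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// Assumption: standalone script, component installed with
// "composer require symfony/html-sanitizer"
require __DIR__.'/vendor/autoload.php';

$config = (new HtmlSanitizerConfig())
    ->allowElement('p')

    // Block: the <section> tag itself is removed, but its children are kept
    ->blockElement('section')

    // Drop: the <aside> tag is removed together with everything inside it
    ->dropElement('aside')
;

// Hypothetical input, used only for illustration
$input = '<section><p>kept</p></section><aside><p>removed</p></aside>';

$output = (new HtmlSanitizer($config))->sanitize($input);

echo $output, PHP_EOL;
```

With this configuration the paragraph inside the blocked section should survive, while the dropped aside and its contents should disappear.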

In addition to allowing and dropping HTML elements and attributes, you can force the value of some attributes to improve the resulting HTML:

$config = (new HtmlSanitizerConfig())
    // ...

    // Forcefully set the value of all "rel" attributes on "a"
    // elements to "noopener noreferrer"
    ->forceAttribute('a', 'rel', 'noopener noreferrer')

    // Drop the "data-custom-attr" attribute from all elements:
    // this attribute will be removed
    ->dropAttribute('data-custom-attr', '*')

    // Transform all HTTP schemes to HTTPS
    ->forceHttpsUrls()

    // Configure which hosts are allowed in img/audio/video/iframe (by default all are allowed)
    ->allowMediaHosts(['youtube.com', 'example.com'])
;
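
The media restrictions have counterparts for links. The sketch below uses the allowLinkSchemes(), allowLinkHosts() and allowRelativeLinks() options of the config class. The hosts and the sample input are arbitrary placeholders, and the script assumes Composer autoloading:

```php
<?php

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// Assumption: standalone script, component installed with Composer
require __DIR__.'/vendor/autoload.php';

$linkConfig = (new HtmlSanitizerConfig())
    ->allowElement('a', ['href', 'title'])

    // Only keep link URLs that use one of these schemes
    ->allowLinkSchemes(['https', 'mailto'])

    // Only keep link URLs that point to one of these hosts (placeholder values)
    ->allowLinkHosts(['symfony.com', 'example.com'])

    // Also accept relative URLs such as "/blog"
    ->allowRelativeLinks()
;

// Hypothetical input: one link with a dangerous scheme, one acceptable link
$links = '<a href="javascript:alert(1)">bad</a> <a href="https://symfony.com/doc">docs</a>';

$cleanLinks = (new HtmlSanitizer($linkConfig))->sanitize($links);

echo $cleanLinks, PHP_EOL;
```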

There are many other configuration options in addition to these; check out the docs for the HtmlSanitizer component. Once configured, use the sanitizer as follows:

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;

$sanitizer = new HtmlSanitizer($config);

// this sanitizes contents in the <body> context, removing any tags that are
// only allowed inside the <head> element
$sanitizer->sanitize($userInput);

// this sanitizes contents to include them inside a <head> tag
$sanitizer->sanitizeFor('head', $userInput);

// this sanitizes contents in the best way possible for the HTML element
// provided as the first argument (sometimes it will add missing tags and
// other times it will HTML-encode the unclosed tags)
$sanitizer->sanitizeFor('textarea', $userInput); // it will encode as HTML entities
$sanitizer->sanitizeFor('div', $userInput);      // it will sanitize same as <body>
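
In application code it is usually more convenient to depend on the component's HtmlSanitizerInterface than on the concrete class. The following is only a sketch: CommentFormatter is a hypothetical class, not part of Symfony, and the wiring is done by hand instead of through a service container:

```php
<?php

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerInterface;

// Assumption: standalone script, component installed with Composer
require __DIR__.'/vendor/autoload.php';

// Hypothetical application class that cleans comments before they are stored
final class CommentFormatter
{
    public function __construct(
        private HtmlSanitizerInterface $sanitizer,
    ) {
    }

    public function format(string $rawComment): string
    {
        // Comments are rendered inside the page body, so the default
        // <body> context of sanitize() is used here
        return $this->sanitizer->sanitize($rawComment);
    }
}

$formatter = new CommentFormatter(
    new HtmlSanitizer((new HtmlSanitizerConfig())->allowSafeElements())
);

$clean = $formatter->format('<p>Hello</p><script>alert(1)</script>');

echo $clean, PHP_EOL;
```

Type-hinting the interface keeps the formatter easy to test, since a stub sanitizer can be passed in its place.
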

Published in #Living on the edge