HTML Sanitizer

The HTML Sanitizer component turns untrusted HTML code (e.g. created by a WYSIWYG editor in the browser) into HTML that can be trusted. It is based on the HTML Sanitizer W3C Standard Proposal.

The HTML sanitizer creates a new HTML structure from scratch, taking only the elements and attributes that are allowed by configuration. This means that the returned HTML is very predictable (it only contains allowed elements), but it does not work well with malformed input (e.g. invalid HTML). The sanitizer targets two use cases:

  • Preventing security attacks based on XSS or other techniques that rely on executing malicious code in the visitors' browsers;
  • Generating HTML that always respects a certain format (only certain tags, attributes, hosts, etc.) to be able to consistently style the resulting output with CSS. This also protects your application against attacks related to e.g. changing the CSS of the whole page.

Installation

You can install the HTML Sanitizer component with:

$ composer require symfony/html-sanitizer

Basic Usage

Use the HtmlSanitizer class to sanitize the HTML. In the Symfony framework, this class is available as the html_sanitizer service. This service is autowired when type-hinting for HtmlSanitizerInterface:

// src/Controller/BlogPostController.php
namespace App\Controller;

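use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerInterface;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;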

class BlogPostController extends AbstractController
{
    public function createAction(HtmlSanitizerInterface $htmlSanitizer, Request $request): Response
    {
        $unsafeContents = $request->getPayload()->get('post_contents');

        $safeContents = $htmlSanitizer->sanitize($unsafeContents);
        // ... proceed using the safe HTML
    }
}

Note

The default configuration of the HTML sanitizer allows all "safe" elements and attributes, as defined by the W3C Standard Proposal. In practice, this means that the resulting code will not contain any scripts, styles or other elements that can cause the website to behave or look different. Later in this article, you'll learn how to fully customize the HTML sanitizer.
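
Outside the full framework there is no preconfigured service, so the sanitizer has to be built by hand. The following is a minimal sketch, assuming the component's HtmlSanitizerConfig builder and its allowSafeElements() method (neither is shown elsewhere in this article, so check them against your installed version). The input string is made up for illustration.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// A new config allows nothing, so opt in to the "safe" baseline explicitly.
$config = (new HtmlSanitizerConfig())->allowSafeElements();
$sanitizer = new HtmlSanitizer($config);

// Hypothetical untrusted input containing a script and an event handler.
$untrusted = '<p onclick="alert(1)">Hello <script>alert(2)</script>world</p>';

echo $sanitizer->sanitize($untrusted);
```

The config object is immutable: each builder call returns a new instance, so the result of the chain must be assigned or passed on.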

Sanitizing HTML for a Specific Context

The default sanitize() method cleans the HTML code for usage in the <body> element. Using the sanitizeFor() method, you can instruct the HTML sanitizer to customize this for the <head> or a more specific HTML tag:

// tags not allowed in <head> will be removed
$safeInput = $htmlSanitizer->sanitizeFor('head', $userInput);

// encodes the returned HTML using HTML entities
$safeInput = $htmlSanitizer->sanitizeFor('title', $userInput);
$safeInput = $htmlSanitizer->sanitizeFor('textarea', $userInput);

// uses the <body> context, removing tags only allowed in <head>
$safeInput = $htmlSanitizer->sanitizeFor('body', $userInput);
$safeInput = $htmlSanitizer->sanitizeFor('section', $userInput);

Sanitizing HTML from Form Input

The HTML sanitizer component directly integrates with Symfony Forms, to sanitize the form input before it is processed by your application.

You can enable the sanitizer in TextType forms, or any form extending this type (such as TextareaType), using the sanitize_html option:

// src/Form/BlogPostType.php
namespace App\Form;

use Symfony\Component\Form\AbstractType;
use Symfony\Component\OptionsResolver\OptionsResolver;

// ...
class BlogPostType extends AbstractType
{
    // ...

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'sanitize_html' => true,
            // use the "sanitizer" option to use a custom sanitizer (see below)
            //'sanitizer' => 'app.post_sanitizer',
        ]);
    }
}

Sanitizing HTML in Twig Templates

Besides sanitizing user input, you can also sanitize HTML code before outputting it in a Twig template using the sanitize_html() filter:

{{ post.body|sanitize_html }}

{# you can also use a custom sanitizer (see below) #}
{{ post.body|sanitize_html('app.post_sanitizer') }}

Configuration

The behavior of the HTML sanitizer can be fully customized. This allows you to explicitly state which elements, attributes and even attribute values are allowed.

You can do this by defining a new HTML sanitizer in the configuration:

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                block_elements:
                    - h1

This configuration defines a new html_sanitizer.sanitizer.app.post_sanitizer service. This service will be autowired for services having an HtmlSanitizerInterface $appPostSanitizer parameter.
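
As a sketch of the autowiring rule described above, a service can receive the custom sanitizer purely through the parameter name. The PostRenderer class and its render() method are hypothetical names used only for illustration.

```php
// src/Service/PostRenderer.php (hypothetical service)
namespace App\Service;

use Symfony\Component\HtmlSanitizer\HtmlSanitizerInterface;

final class PostRenderer
{
    public function __construct(
        // The parameter name $appPostSanitizer selects the
        // "app.post_sanitizer" sanitizer defined in the configuration.
        private readonly HtmlSanitizerInterface $appPostSanitizer,
    ) {
    }

    public function render(string $untrustedHtml): string
    {
        return $this->appPostSanitizer->sanitize($untrustedHtml);
    }
}
```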

Allow Element Baselines

You can start the custom HTML sanitizer by using one of the two baselines:

  • Static elements: all elements and attributes on the baseline allow lists from the W3C Standard Proposal (this does not include scripts).
  • Safe elements: all elements and attributes from the "static elements" list, excluding elements and attributes that can also lead to CSS injection/click-jacking.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # enable either of these
                allow_safe_elements: true
                allow_static_elements: true

Allow Elements

This adds elements to the allow list. For each element, you can also specify the allowed attributes on that element. If not given, all allowed attributes from the W3C Standard Proposal are allowed.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_elements:
                    # allow the <article> element and 2 attributes
                    article: ['class', 'data-attr']
                    # allow the <img> element and preserve the src attribute
                    img: 'src'
                    # allow the <h1> element with all safe attributes
                    h1: '*'
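
When the component is used without the framework, the same allow list can be expressed in PHP. This is a sketch assuming the allowElement() method of HtmlSanitizerConfig, which is not shown in this article; the element and attribute names mirror the YAML example.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    // allow the <article> element and 2 attributes
    ->allowElement('article', ['class', 'data-attr'])
    // allow the <img> element and preserve the src attribute
    ->allowElement('img', 'src')
    // allow the <h1> element with all safe attributes
    ->allowElement('h1', '*');

$sanitizer = new HtmlSanitizer($config);
```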

Block and Drop Elements

You can also block (the element will be removed, but its children will be kept) or drop (the element and its children will be removed) elements.

This can also be used to remove elements from the allow list.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # remove <div>, but process the children
                block_elements: ['div']
                # remove <figure> and its children
                drop_elements: ['figure']
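
The difference between blocking and dropping is easiest to see on a small input. The sketch below assumes the blockElement() and dropElement() methods of HtmlSanitizerConfig (not shown in this article) and uses a made-up input string. Following the description above, the paragraph inside the <div> should survive without its wrapper, while the <figure> and its image should disappear entirely.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    // remove <div>, but process the children
    ->blockElement('div')
    // remove <figure> and its children
    ->dropElement('figure');

$sanitizer = new HtmlSanitizer($config);

// Hypothetical input for illustration.
$input = '<div><p>Kept text</p></div><figure><img src="/a.png"></figure>';

echo $sanitizer->sanitize($input);
```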

Allow Attributes

Using this option, you can specify which attributes will be preserved in the returned HTML. The attribute will be allowed on the given elements, or on all elements allowed before this setting.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_attributes:
                    # allow "src" on <iframe> elements
                    src: ['iframe']

                    # allow "data-attr" on all elements currently allowed
                    data-attr: '*'

Drop Attributes

This option allows you to disallow attributes that were allowed before.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_attributes:
                    # allow the "data-attr" on all safe elements...
                    data-attr: '*'

                drop_attributes:
                    # ...except for the <section> element
                    data-attr: ['section']
                    # disallows "style" on any allowed element
                    style: '*'
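
Allowing and dropping attributes can be combined in a single PHP configuration. This sketch assumes the allowAttribute() and dropAttribute() methods of HtmlSanitizerConfig, which are not shown in this article. Order matters here, because an attribute can only be dropped after it has been allowed.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    // allow "data-attr" on all elements allowed so far...
    ->allowAttribute('data-attr', '*')
    // ...except for the <section> element
    ->dropAttribute('data-attr', ['section'])
    // disallow "style" on any allowed element
    ->dropAttribute('style', '*');
```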

Force Attribute Values

Using this option, you can force an attribute with a given value on an element. For instance, use the following config to always set rel="noopener noreferrer" on each <a> element (even if the original one didn't contain a rel attribute):

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                force_attributes:
                    a:
                        rel: noopener noreferrer
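
The same rule in PHP is sketched below, assuming the forceAttribute() method of HtmlSanitizerConfig (not shown in this article). The sample link is made up for illustration.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    // always set rel="noopener noreferrer" on each <a> element
    ->forceAttribute('a', 'rel', 'noopener noreferrer');

$sanitizer = new HtmlSanitizer($config);

// Hypothetical link without a rel attribute.
echo $sanitizer->sanitize('<a href="https://symfony.com">Symfony</a>');
```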

Force/Allow Link URLs

Besides allowing/blocking elements and attributes, you can also control the URLs of <a> elements:

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # if `true`, all URLs using the `http://` scheme will be converted to
                # use the `https://` scheme instead. `http` still needs to be allowed
                # in `allowed_link_schemes`
                force_https_urls: true

                # specifies the allowed URL schemes. If the URL has a different scheme, the
                # attribute will be dropped
                allowed_link_schemes: ['http', 'https', 'mailto']

                # specifies the allowed hosts, the attribute will be dropped if the
                # URL contains a different host. Subdomains are allowed: e.g. the following
                # config would also allow 'www.symfony.com', 'live.symfony.com', etc.
                allowed_link_hosts: ['symfony.com']

                # whether to allow relative links (i.e. URLs without scheme and host)
                allow_relative_links: true
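
A PHP sketch of the same link policy follows. It assumes the forceHttpsUrls(), allowLinkSchemes(), allowLinkHosts() and allowRelativeLinks() methods of HtmlSanitizerConfig, none of which are shown in this article.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    // rewrite http:// URLs to https:// ("http" must still be an allowed scheme)
    ->forceHttpsUrls()
    // links using any other scheme lose their URL attribute
    ->allowLinkSchemes(['http', 'https', 'mailto'])
    // links to other hosts lose their URL attribute; subdomains are accepted
    ->allowLinkHosts(['symfony.com'])
    // accept URLs without scheme and host
    ->allowRelativeLinks();
```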

Force/Allow Media URLs

Like link URLs, you can also control the URLs of other media in the HTML. The following attributes are checked by the HTML sanitizer: src, href, lowsrc, background and ping.

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # if `true`, all URLs using the `http://` scheme will be converted to
                # use the `https://` scheme instead. `http` still needs to be allowed
                # in `allowed_media_schemes`
                force_https_urls: true

                # specifies the allowed URL schemes. If the URL has a different scheme, the
                # attribute will be dropped
                allowed_media_schemes: ['http', 'https', 'mailto']

                # specifies the allowed hosts, the attribute will be dropped if the URL
                # contains a different host which is not a subdomain of the allowed host
                allowed_media_hosts: ['symfony.com'] # also allows any subdomain (e.g. www.symfony.com)

                # whether to allow relative URLs (i.e. URLs without scheme and host)
                allow_relative_medias: true
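
The media options mirror the link options. This sketch assumes the allowMediaSchemes(), allowMediaHosts() and allowRelativeMedias() methods of HtmlSanitizerConfig (not shown in this article); the scheme list is an illustrative choice rather than a recommendation.

```php
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    // rewrite http:// URLs to https://
    ->forceHttpsUrls()
    // media URLs using any other scheme are dropped
    ->allowMediaSchemes(['http', 'https'])
    // media from other hosts is dropped; subdomains are accepted
    ->allowMediaHosts(['symfony.com'])
    // accept URLs without scheme and host
    ->allowRelativeMedias();
```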

Max Input Length

To prevent DoS attacks, the HTML sanitizer limits the input length to 20000 characters by default. The length is measured with strlen($input), which counts bytes, so multibyte characters count more than once toward the limit. Content exceeding that length is truncated. Use this option to increase or decrease the limit:

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # inputs longer (in characters) than this value will be truncated
                max_input_length: 30000 # default: 20000

It is possible to disable this length limit by setting the max input length to -1. Beware that it may expose your application to DoS attacks.
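
Because the limit is measured with strlen(), it helps to see how that function treats non-ASCII text. The snippet below is plain PHP and does not use the component. It assumes UTF-8 strings, and mb_strlen() requires the mbstring extension.

```php
// Ten ASCII letters: one byte each.
$ascii = str_repeat('a', 10);

// Ten accented letters: "\u{00E9}" (é) takes two bytes in UTF-8.
$accented = str_repeat("\u{00E9}", 10);

var_dump(strlen($ascii));                 // int(10)
var_dump(strlen($accented));              // int(20)
var_dump(mb_strlen($accented, 'UTF-8'));  // int(10)
```

Input made mostly of multibyte characters therefore reaches the limit with fewer visible characters than the configured number suggests.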

Custom Attribute Sanitizers

Controlling the link and media URLs is done by the UrlAttributeSanitizer. You can also implement your own attribute sanitizer, to control the value of other attributes in the HTML. Create a class implementing AttributeSanitizerInterface and register it as a service. After this, use with_attribute_sanitizers to enable it for an HTML sanitizer:

# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                with_attribute_sanitizers:
                    - App\Sanitizer\CustomAttributeSanitizer

                # you can also disable previously enabled custom attribute sanitizers
                #without_attribute_sanitizers:
                #    - App\Sanitizer\CustomAttributeSanitizer
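
What such a class might look like is sketched below. The three method signatures reflect AttributeSanitizerInterface as understood at the time of writing and should be verified against your installed version. The rule itself (keeping only class names prefixed with "post-") is a made-up example, and returning null is assumed to drop the attribute.

```php
// src/Sanitizer/CustomAttributeSanitizer.php
namespace App\Sanitizer;

use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;
use Symfony\Component\HtmlSanitizer\Visitor\AttributeSanitizer\AttributeSanitizerInterface;

final class CustomAttributeSanitizer implements AttributeSanitizerInterface
{
    public function getSupportedElements(): ?array
    {
        // null means the sanitizer applies to all elements
        return null;
    }

    public function getSupportedAttributes(): ?array
    {
        return ['class'];
    }

    public function sanitizeAttribute(string $element, string $attribute, string $value, HtmlSanitizerConfig $config): ?string
    {
        // Keep only class names starting with "post-" (hypothetical rule).
        $names = preg_split('/\s+/', trim($value)) ?: [];
        $kept = array_filter(
            $names,
            static fn (string $name): bool => str_starts_with($name, 'post-')
        );

        // Assumption: a null return value removes the attribute.
        return $kept ? implode(' ', $kept) : null;
    }
}
```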

This work, including the code samples, is licensed under a Creative Commons BY-SA 3.0 license.