HTML Sanitizer
The HTML Sanitizer component sanitizes untrusted HTML code (e.g. created by a WYSIWYG editor in the browser), turning it into HTML that can be trusted. It is based on the HTML Sanitizer W3C Standard Proposal.
The HTML sanitizer creates a new HTML structure from scratch, taking only the elements and attributes that are allowed by configuration. This means that the returned HTML is very predictable (it only contains allowed elements), but it does not work well with badly formatted input (e.g. invalid HTML). The sanitizer targets two use cases:
- Preventing security attacks based on XSS or other techniques that rely on executing malicious code in the visitors' browsers;
- Generating HTML that always respects a certain format (only certain tags, attributes, hosts, etc.) to be able to consistently style the resulting output with CSS. This also protects your application against attacks related to e.g. changing the CSS of the whole page.
Installation
You can install the HTML Sanitizer component with:
$ composer require symfony/html-sanitizer
Basic Usage
Use the HtmlSanitizer class to sanitize the HTML. In the Symfony framework, this class is available as the html_sanitizer service. This service is autowired when you type-hint an argument with HtmlSanitizerInterface:
// src/Controller/BlogPostController.php
namespace App\Controller;

// ...
use Symfony\Component\HtmlSanitizer\HtmlSanitizerInterface;

class BlogPostController extends AbstractController
{
    public function createAction(HtmlSanitizerInterface $htmlSanitizer, Request $request): Response
    {
        $unsafeContents = $request->getPayload()->get('post_contents');

        $safeContents = $htmlSanitizer->sanitize($unsafeContents);
        // ... proceed using the safe HTML
    }
}
Note
The default configuration of the HTML sanitizer allows all "safe" elements and attributes, as defined by the W3C Standard Proposal. In practice, this means that the resulting code will not contain any scripts, styles or other elements that can cause the website to behave or look different. Later in this article, you'll learn how to fully customize the HTML sanitizer.
Sanitizing HTML for a Specific Context
The default sanitize() method cleans the HTML code for usage in the <body> element. Using the sanitizeFor() method, you can instruct the HTML sanitizer to customize this for the <head> or a more specific HTML tag:
// tags not allowed in <head> will be removed
$safeInput = $htmlSanitizer->sanitizeFor('head', $userInput);

// encodes the returned HTML using HTML entities
$safeInput = $htmlSanitizer->sanitizeFor('title', $userInput);
$safeInput = $htmlSanitizer->sanitizeFor('textarea', $userInput);

// uses the <body> context, removing tags only allowed in <head>
$safeInput = $htmlSanitizer->sanitizeFor('body', $userInput);
$safeInput = $htmlSanitizer->sanitizeFor('section', $userInput);
Sanitizing HTML from Form Input
The HTML sanitizer component directly integrates with Symfony Forms, to sanitize the form input before it is processed by your application.
You can enable the sanitizer in TextType forms, or any form extending this type (such as TextareaType), using the sanitize_html option:
// src/Form/BlogPostType.php
namespace App\Form;

// ...
class BlogPostType extends AbstractType
{
    // ...

    public function configureOptions(OptionsResolver $resolver): void
    {
        $resolver->setDefaults([
            'sanitize_html' => true,
            // use the "sanitizer" option to use a custom sanitizer (see below)
            //'sanitizer' => 'app.post_sanitizer',
        ]);
    }
}
Sanitizing HTML in Twig Templates
Besides sanitizing user input, you can also sanitize HTML code before outputting it in a Twig template using the sanitize_html filter:
{{ post.body|sanitize_html }}

{# you can also use a custom sanitizer (see below) #}
{{ post.body|sanitize_html('app.post_sanitizer') }}
Configuration
The behavior of the HTML sanitizer can be fully customized. This allows you to explicitly state which elements, attributes and even attribute values are allowed.
You can do this by defining a new HTML sanitizer in the configuration:
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                block_elements:
                    - h1
This configuration defines a new html_sanitizer.sanitizer.app.post_sanitizer service. Now you have two ways of injecting it in any service or controller:
(1) Use a specific argument name
Type-hint your constructor/method argument with HtmlSanitizerInterface and name the argument using this pattern: "HTML sanitizer name in camelCase". For example, to inject the app.post_sanitizer defined earlier, use an argument named $appPostSanitizer:
// src/Controller/BlogController.php
namespace App\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerInterface;

class BlogController extends AbstractController
{
    public function __construct(
        private HtmlSanitizerInterface $appPostSanitizer,
    ) {
    }

    // ...
}
(2) Use the #[Target] attribute
When dealing with multiple implementations of the same type, the #[Target] attribute helps you select which one to inject. Symfony creates a target with the same name as the HTML sanitizer:
// ...
use Symfony\Component\DependencyInjection\Attribute\Target;

class BlogController extends AbstractController
{
    public function __construct(
        #[Target('app.post_sanitizer')]
        private HtmlSanitizerInterface $sanitizer,
    ) {
    }

    // ...
}
Allow Element Baselines
You can start the custom HTML sanitizer by using one of the two baselines:
- Static elements: all elements and attributes on the baseline allow lists from the W3C Standard Proposal (this does not include scripts).
- Safe elements: all elements and attributes from the "static elements" list, excluding elements and attributes that can also lead to CSS injection/click-jacking.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # enable either of these
                allow_safe_elements: true
                allow_static_elements: true
Allow Elements
This adds elements to the allow list. For each element, you can also specify the allowed attributes on that element. If not given, all allowed attributes from the W3C Standard Proposal are allowed.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_elements:
                    # allow the <article> element and 2 attributes
                    article: ['class', 'data-attr']
                    # allow the <img> element and preserve the src attribute
                    img: 'src'
                    # allow the <h1> element with all safe attributes
                    h1: '*'
                    # allow the <div> element with no attributes
                    div: []
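When the component is used without the framework, the same allow list can be expressed in PHP. The following is a minimal sketch, assuming the component's HtmlSanitizerConfig class and a Composer autoloader in the current directory; the $config and $sanitizer variable names are only illustrative.

```php
<?php

// assumption: the component was installed with Composer in this directory
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// each call returns a new config object, so chain the calls
// (or reassign the result) instead of discarding the return value
$config = (new HtmlSanitizerConfig())
    // allow the <article> element and 2 attributes
    ->allowElement('article', ['class', 'data-attr'])
    // allow the <img> element and preserve the src attribute
    ->allowElement('img', 'src')
    // allow the <h1> element with all safe attributes
    ->allowElement('h1', '*')
    // allow the <div> element with no attributes
    ->allowElement('div')
;

$sanitizer = new HtmlSanitizer($config);
```

The resulting $sanitizer exposes the same sanitize() and sanitizeFor() methods as the autowired service.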
Block and Drop Elements
You can also block (the element will be removed, but its children will be kept) or drop (the element and its children will be removed) elements.
This can also be used to remove elements from the allow list.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                # remove <div>, but process the children
                block_elements: ['div']

                # remove <figure> and its children
                drop_elements: ['figure']
Allow Attributes
Using this option, you can specify which attributes will be preserved in the returned HTML. The attribute will be allowed on the given elements, or on all elements allowed before this setting.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_attributes:
                    # allow "src" on <iframe> elements
                    src: ['iframe']

                    # allow "data-attr" on all elements currently allowed
                    data-attr: '*'
Drop Attributes
This option allows you to disallow attributes that were allowed before.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                allow_attributes:
                    # allow the "data-attr" on all safe elements...
                    data-attr: '*'

                drop_attributes:
                    # ...except for the <section> element
                    data-attr: ['section']
                    # disallows "style" on any allowed element
                    style: '*'
Force Attribute Values
Using this option, you can force an attribute with a given value on an element. For instance, use the following config to always set rel="noopener noreferrer" on each <a> element (even if the original one didn't contain a rel attribute):
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                force_attributes:
                    a:
                        rel: noopener noreferrer
Force/Allow Link URLs
Besides allowing/blocking elements and attributes, you can also control the URLs of <a> elements:
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # if `true`, all URLs using the `http://` scheme will be converted to
                # use the `https://` scheme instead. `http` still needs to be allowed
                # in `allowed_link_schemes`
                force_https_urls: true

                # specifies the allowed URL schemes. If the URL has a different scheme, the
                # attribute will be dropped
                allowed_link_schemes: ['http', 'https', 'mailto']

                # specifies the allowed hosts, the attribute will be dropped if the
                # URL contains a different host. Subdomains are allowed: e.g. the following
                # config would also allow 'www.symfony.com', 'live.symfony.com', etc.
                allowed_link_hosts: ['symfony.com']

                # whether to allow relative links (i.e. URLs without scheme and host)
                allow_relative_links: true
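For standalone use, the link rules above can also be set on an HtmlSanitizerConfig object. This is a hedged sketch rather than a reference: it assumes a Composer autoloader in the current directory, and the sample input and the evil.example host are made up for illustration.

```php
<?php

// assumption: the component was installed with Composer in this directory
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    // start from the "safe elements" baseline so <a> is allowed at all
    ->allowSafeElements()
    // convert http:// URLs to https://
    ->forceHttpsUrls()
    // schemes that may appear in link URLs
    ->allowLinkSchemes(['http', 'https', 'mailto'])
    // hosts that may appear in link URLs
    ->allowLinkHosts(['symfony.com'])
    // keep links without scheme and host
    ->allowRelativeLinks()
;

$sanitizer = new HtmlSanitizer($config);

// hypothetical input: following the rules described above, the href of the
// second link should not survive, because its host is not in the allow list
$html = $sanitizer->sanitize(
    '<a href="http://symfony.com/doc">docs</a> <a href="http://evil.example/">other</a>'
);
```

Inspecting $html for a few representative inputs is a quick way to check that the scheme and host lists match what you intend to allow.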
Force/Allow Media URLs
Like link URLs, you can also control the URLs of other media in the HTML. The following attributes are checked by the HTML sanitizer: src, href, lowsrc, background and ping.
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # if `true`, all URLs using the `http://` scheme will be converted to
                # use the `https://` scheme instead. `http` still needs to be allowed
                # in `allowed_media_schemes`
                force_https_urls: true

                # specifies the allowed URL schemes. If the URL has a different scheme, the
                # attribute will be dropped
                allowed_media_schemes: ['http', 'https', 'mailto']

                # specifies the allowed hosts, the attribute will be dropped if the URL
                # contains a different host which is not a subdomain of the allowed host
                allowed_media_hosts: ['symfony.com'] # Also allows any subdomain (e.g. www.symfony.com)

                # whether to allow relative URLs (i.e. URLs without scheme and host)
                allow_relative_medias: true
Max Input Length
In order to prevent DoS attacks, by default the HTML sanitizer limits the input length to 20000 characters (as measured by strlen($input), which counts bytes, so a multibyte character counts more than once). All the contents exceeding that length will be truncated. Use this option to increase or decrease this limit:
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...

                # inputs longer (in characters) than this value will be truncated
                max_input_length: 30000 # default: 20000
It is possible to disable this length limit by setting the max input length to -1. Beware that it may expose your application to DoS attacks.
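Because the limit is measured with strlen(), text containing multibyte characters reaches it sooner than its visible length suggests. The following plain PHP sketch illustrates the difference; the sample string is made up, and mb_strlen() assumes the mbstring extension is available.

```php
<?php

// "\u{e9}" is the letter "é", which takes 2 bytes in UTF-8
$input = "caf\u{e9}";

// strlen() counts bytes
$bytes = strlen($input);

// mb_strlen() counts characters (requires the mbstring extension)
$chars = mb_strlen($input, 'UTF-8');

echo $bytes, ' bytes, ', $chars, " characters\n"; // prints "5 bytes, 4 characters"
```

When choosing a value for max_input_length, it may therefore help to budget for the byte size of the expected content rather than its character count.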
Custom Attribute Sanitizers
Controlling the link and media URLs is done by the UrlAttributeSanitizer. You can also implement your own attribute sanitizer, to control the value of other attributes in the HTML. Create a class implementing AttributeSanitizerInterface and register it as a service. After this, use with_attribute_sanitizers to enable it for an HTML sanitizer:
# config/packages/html_sanitizer.yaml
framework:
    html_sanitizer:
        sanitizers:
            app.post_sanitizer:
                # ...
                with_attribute_sanitizers:
                    - App\Sanitizer\CustomAttributeSanitizer

                # you can also disable previously enabled custom attribute sanitizers
                #without_attribute_sanitizers:
                #    - App\Sanitizer\CustomAttributeSanitizer
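The App\Sanitizer\CustomAttributeSanitizer class referenced above could look like the following. This is only a sketch: the rule it applies (keeping class attributes that contain nothing but plain class names) is a made-up example, and the handling of a null return value as "drop the attribute" is an assumption based on how disallowed URLs are described earlier.

```php
<?php

// src/Sanitizer/CustomAttributeSanitizer.php
namespace App\Sanitizer;

use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;
use Symfony\Component\HtmlSanitizer\Visitor\AttributeSanitizer\AttributeSanitizerInterface;

// hypothetical example: only keeps "class" values made of plain class names
final class CustomAttributeSanitizer implements AttributeSanitizerInterface
{
    public function getSupportedElements(): ?array
    {
        // null means the sanitizer applies to all elements
        return null;
    }

    public function getSupportedAttributes(): ?array
    {
        // only the "class" attribute is passed to sanitizeAttribute()
        return ['class'];
    }

    public function sanitizeAttribute(string $element, string $attribute, string $value, HtmlSanitizerConfig $config): ?string
    {
        // accept letters, digits, underscores, hyphens and spaces only;
        // null is assumed to signal that the attribute must be dropped
        return 1 === preg_match('/\A[A-Za-z0-9_\- ]+\z/', $value) ? $value : null;
    }
}
```

Keeping the check as a strict allow list (rather than trying to strip bad characters) follows the same approach as the rest of the sanitizer configuration.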