Affected versions

The Symfony HTML Sanitizer component is affected by this security issue in the following Symfony version ranges: >=6.1, <6.4.41; >=7, <7.4.13; and >=8, <8.0.13.

The issue has been fixed in Symfony 6.4.41, 7.4.13, and 8.0.13.

Description

Symfony\Component\HtmlSanitizer\TextSanitizer\UrlSanitizer::parse() rejects URLs containing raw Unicode explicit-direction BiDi formatting characters (U+202A–U+202E, U+2066–U+2069) as a defense against visual spoofing of the rendered href. The check covers only the raw UTF-8 forms of those code points: the percent-encoded forms (%E2%80%AE for U+202E, %E2%81%A6 for U+2066, etc.) are not matched by the deny regex, survive league/uri's parse/build cycle, and are re-emitted unchanged in the sanitized URL. Any downstream consumer that decodes the link before display restores the BiDi character, and with it the visual spoof that the original defense was meant to prevent. Examples include phishing-detection filters that compare urldecode($href) against a domain allow-list, audit-log dashboards that show a decoded form for readability, hover-tooltip previews, and federated/syndicated content where the decoder lives on the consuming side.
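The gap described above can be reproduced with a minimal sketch. The deny pattern below is a hypothetical stand-in written for illustration, not the regex used in Symfony's source, and the URLs are made-up examples. It shows that a pattern matching raw BiDi code points misses the percent-encoded form until something decodes it.

```php
<?php
// Hypothetical deny pattern for illustration only; not copied from Symfony.
$bidiPattern = '/[\x{202A}-\x{202E}\x{2066}-\x{2069}]/u';

// The same made-up URL, once with a raw U+202E and once percent-encoded.
$raw     = "https://example.com/\u{202E}gpj.exe";
$encoded = 'https://example.com/%E2%80%AEgpj.exe';

// The raw form is caught by the pattern.
$rawHit = preg_match($bidiPattern, $raw) === 1;

// The percent-encoded form is plain ASCII, so the pattern does not match.
$encodedHit = preg_match($bidiPattern, $encoded) === 1;

// A downstream consumer that decodes the URL brings the character back.
$decodedHit = preg_match($bidiPattern, rawurldecode($encoded)) === 1;

var_dump($rawHit, $encodedHit, $decodedHit);
```

Only the raw and the decoded strings match, which is why a check on the raw input alone leaves the encoded variant intact for later decoders.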

The same UrlSanitizer::parse() carries a /\s/ whitespace check intended as a backstop against malformed URLs, but the pattern lacks the /u modifier. Without /u, PCRE's \s matches only ASCII whitespace, so Unicode whitespace and related characters pass through unchanged in both raw and percent-encoded forms: NBSP (U+00A0), the zero-width no-break space / BOM (U+FEFF), line/paragraph separators (U+2028, U+2029), ogham space (U+1680), the U+2000–U+200A en/em quad family, narrow / medium / ideographic spaces (U+202F, U+205F, U+3000) and NEL (U+0085). In hostname positions they enable lookalike spoofs (example<NBSP>.com); in path/query/fragment they enable allow-list drift when a downstream consumer strips whitespace before comparison.
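The effect of the missing /u modifier can be seen in a short sketch. It assumes a default PHP setup in which the script has not changed LC_CTYPE through setlocale(); the host name is a made-up example. It also shows why U+FEFF needs separate handling: it is a format character, so \s does not match it even with /u.

```php
<?php
// Made-up host with a raw NBSP (U+00A0) before the dot.
$host = "example\u{00A0}.com";

// Without /u the subject is scanned byte by byte and \s is ASCII-only.
$withoutU = preg_match('/\s/', $host);

// With /u the subject is treated as UTF-8 and \s covers Unicode whitespace.
$withU = preg_match('/\s/u', $host);

// U+FEFF is not classified as whitespace, so \s misses it even with /u.
$bomWithU = preg_match('/\s/u', "a\u{FEFF}b");

var_dump($withoutU, $withU, $bomWithU);
```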

Resolution

UrlSanitizer::parse() now denies BiDi formatting marks together with Unicode whitespace and the zero-width no-break space, in both the raw input and the percent-decoded form of each parsed URL component (user, pass, host, path, query, fragment). ASCII space remains tolerated in path/query/fragment via the existing percent-encoding step.
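The approach described above can be sketched roughly as follows. This is not Symfony's implementation: hasDeniedCodePoint() is a hypothetical helper, its patterns are assumptions, and it takes already-parsed components as an array instead of calling league/uri. Treating invalid UTF-8 as denied is also a conservative choice made for this sketch.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 *
 * @param array<string, ?string> $components keyed by user, pass, host, path, query, fragment
 */
function hasDeniedCodePoint(array $components): bool
{
    // BiDi explicit-direction marks plus the zero-width no-break space.
    $marks = '\x{202A}-\x{202E}\x{2066}-\x{2069}\x{FEFF}';
    // Strict: any whitespace is denied (assumed here for user, pass, host).
    $strict = '/[' . $marks . ']|\s/u';
    // Lenient: whitespace other than ASCII space is denied.
    $lenient = '/[' . $marks . ']|[^\S ]/u';

    foreach (['user', 'pass', 'host', 'path', 'query', 'fragment'] as $name) {
        $value = $components[$name] ?? null;
        if (null === $value) {
            continue;
        }
        $spaceTolerated = in_array($name, ['path', 'query', 'fragment'], true);
        $pattern = $spaceTolerated ? $lenient : $strict;

        // Check both the raw component and its percent-decoded form.
        foreach ([$value, rawurldecode($value)] as $candidate) {
            // preg_match() returns false on invalid UTF-8 under /u;
            // this sketch treats that as denied as well.
            if (0 !== preg_match($pattern, $candidate)) {
                return true;
            }
        }
    }

    return false;
}

$clean    = hasDeniedCodePoint(['host' => 'example.com', 'path' => '/a b']);
$bidi     = hasDeniedCodePoint(['host' => 'example.com', 'path' => '/%E2%80%AEgpj.exe']);
$nbspHost = hasDeniedCodePoint(['host' => "example\u{00A0}.com"]);
$bomQuery = hasDeniedCodePoint(['host' => 'example.com', 'query' => 'q=%EF%BB%BF']);

var_dump($clean, $bidi, $nbspHost, $bomQuery);
```

Checking the decoded form per component, instead of decoding the whole URL once, keeps an encoded delimiter from shifting content between components before the check runs.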

The patches for this issue are available here for branch 6.4 (and forward-ported to 7.4, 8.0 and 8.1).

Credits

We would like to thank Scott Arciszewski for discovering the issue and Nicolas Grekas for providing the fix.