Contributed by Mathieu in #49300

Take a look at the following two domain names: "symfony.com" and "ѕymfony.com". They look similar, but they are not the same. In the second domain, the first letter is not s (the lowercase s letter in Latin script) but ѕ (a letter called dze in the Cyrillic script).

Using different but similar-looking characters is the basis of IDN homograph attacks, a type of spoofing attack. That's why it is recommended to check user-submitted, public-facing identifiers for suspicious characters in order to prevent such attacks.

However, given that Unicode defines more than 150,000 valid characters, this is a daunting task. For example, did you know that there are invisible characters such as zero-width spaces? And what about mixing 8 (digit eight in Latin script) and ৪ (digit four in Bengali script)? Don't forget about combining characters either, such as the "combining dot", which is visually hidden when placed after the character i.
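To make the examples above concrete, here is a minimal sketch in plain PHP (no extensions needed) that compares the look-alike strings at the byte level. The variable names are illustrative only, and the characters are written with `\u{...}` escapes so the substitutions are visible in the source:

```php
<?php
// Illustrative variable names; not taken from the Symfony codebase.
$latin   = 'symfony.com';
$spoofed = "\u{0455}ymfony.com"; // U+0455 CYRILLIC SMALL LETTER DZE instead of "s"

var_dump($latin === $spoofed); // bool(false): the strings only look alike
var_dump(strlen($latin));      // int(11)
var_dump(strlen($spoofed));    // int(12): dze takes two bytes in UTF-8

// A zero-width space (U+200B) adds three bytes that a reader never sees.
$padded = "sym\u{200B}fony";
var_dump(strlen('symfony'));   // int(7)
var_dump(strlen($padded));     // int(10)

// Digit eight (U+0038) versus Bengali digit four (U+09EA).
var_dump(bin2hex('8'));          // string(2) "38"
var_dump(bin2hex("\u{09EA}"));   // string(6) "e0a7aa"
```

A strict comparison tells the strings apart, but it cannot tell you which of them a human would find misleading, which is why a dedicated check is needed.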

In Symfony 6.3, we're introducing a new NoSuspiciousCharacters constraint so you can validate that strings don't contain any of these problematic characters. It's based on the Spoofchecker class provided by the PHP intl extension and it works as follows:
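For context, the underlying Spoofchecker class can also be called directly. The following is only a rough sketch of that, not the constraint's actual implementation: the `findSuspicious()` helper is hypothetical, it relies on Spoofchecker's default checks, and the strings it flags may vary with the ICU version bundled with your intl extension, so no output is shown:

```php
<?php
/**
 * Hypothetical helper (not part of Symfony): returns the candidates that
 * the intl Spoofchecker flags as suspicious using its default checks.
 *
 * @param string[] $candidates
 * @return string[]
 */
function findSuspicious(array $candidates): array
{
    if (!class_exists(Spoofchecker::class)) {
        throw new RuntimeException('The PHP intl extension is required.');
    }

    $checker = new Spoofchecker();
    $flagged = [];

    foreach ($candidates as $candidate) {
        if ($checker->isSuspicious($candidate)) {
            $flagged[] = $candidate;
        }
    }

    return $flagged;
}

if (class_exists(Spoofchecker::class)) {
    // Print the flagged strings as hex so invisible characters stay visible.
    $flagged = findSuspicious(['symfony.com', "\u{0455}ymfony.com", "sym\u{200B}fony.com"]);
    foreach ($flagged as $value) {
        echo bin2hex($value), "\n";
    }

    // areConfusable() compares two strings for visual similarity.
    var_dump((new Spoofchecker())->areConfusable('symfony.com', "\u{0455}ymfony.com"));
}
```

The constraint saves you from wiring this up by hand and reports failures as regular validation violations.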

// src/Entity/User.php
namespace App\Entity;

use Symfony\Component\Validator\Constraints as Assert;

class User
{
    #[Assert\NoSuspiciousCharacters(
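        // checks for invisible characters (e.g. zero-width spaces) and look-alike digits from different scripts (e.g. 8 and ৪)
        // constants are referenced through the Assert alias, the only name imported above
        checks: Assert\NoSuspiciousCharacters::CHECK_INVISIBLE | Assert\NoSuspiciousCharacters::CHECK_MIXED_NUMBERS,
        restrictionLevel: Assert\NoSuspiciousCharacters::RESTRICTION_LEVEL_HIGH,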
    )]
    private string $username;

    #[Assert\NoSuspiciousCharacters]
    private string $blogUrl;

    // ...
}

Read the NoSuspiciousCharacters constraint docs to learn more about its usage and options.

Published in #Living on the edge