Take a look at the two following domain names: "symfony.com" and "ѕymfony.com".
They look similar, but they are not the same. In the second domain, the first
letter is not s (the lowercase s letter in Latin script) but ѕ (a letter called dze in the Cyrillic script).
Using different but similar-looking characters is the basis of IDN homograph attacks, a type of spoofing attack. That's why it is recommended to check user-submitted, public-facing identifiers for suspicious characters to prevent such attacks.
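To make the difference concrete, here is a minimal sketch that prints the Unicode code point of the first character of each domain. It assumes the mbstring extension is available, and `firstCodePoint()` is a hypothetical helper written for this illustration, not a Symfony or PHP function.

```php
<?php
// Two domain names that render almost identically.
$latin = 'symfony.com';
$spoof = "\u{0455}ymfony.com"; // starts with U+0455 CYRILLIC SMALL LETTER DZE

// Hypothetical helper: formats the first character's code point as U+XXXX.
function firstCodePoint(string $s): string
{
    return sprintf('U+%04X', mb_ord(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8'));
}

var_dump($latin === $spoof);        // bool(false)
echo firstCodePoint($latin), "\n";  // U+0073
echo firstCodePoint($spoof), "\n";  // U+0455
```

A plain string comparison already tells the two names apart; the problem is that a human reader usually cannot.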
However, given that Unicode defines more than 150,000 valid characters, this is
a daunting task. For example, did you know that there are invisible characters
such as zero-width spaces? And what about mixing 8 (digit eight in Latin script)
and ৪ (digit four in Bengali script)? Don't forget combining characters either,
such as the "combining dot", which is effectively invisible when placed after
the character i.
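The following sketch shows why hand-rolled checks get tedious quickly. Both helpers, `containsZeroWidth()` and `mixesDigitSystems()`, are hypothetical and written only for this illustration; they cover two narrow cases with regular expressions and are in no way a replacement for a real spoof checker.

```php
<?php
// Hypothetical helpers for illustration only; not part of Symfony or intl.

// Returns true if the string contains one of a few common zero-width characters.
function containsZeroWidth(string $s): bool
{
    // U+200B zero-width space, U+200C/U+200D zero-width (non-)joiner, U+FEFF BOM
    return 1 === preg_match('/[\x{200B}-\x{200D}\x{FEFF}]/u', $s);
}

// Returns true if the string mixes ASCII digits with decimal digits of another script.
function mixesDigitSystems(string $s): bool
{
    $hasAsciiDigit = 1 === preg_match('/[0-9]/', $s);
    $hasOtherDigit = 1 === preg_match('/(?![0-9])\p{Nd}/u', $s);

    return $hasAsciiDigit && $hasOtherDigit;
}

$visible = 'admin';
$hidden  = "ad\u{200B}min"; // looks like "admin" but has a zero-width space inside

var_dump(mb_strlen($visible), mb_strlen($hidden)); // int(5) int(6)
var_dump(containsZeroWidth($hidden));              // bool(true)
var_dump(mixesDigitSystems("8\u{09EA}"));          // bool(true): 8 next to Bengali ৪
var_dump(mixesDigitSystems('88'));                 // bool(false)
```

Every new class of confusing character would need another helper like these, which is the maintenance burden a dedicated constraint avoids.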
In Symfony 6.3, we're introducing a new NoSuspiciousCharacters constraint
so you can validate that strings don't contain any of these problematic characters.
It's based on the Spoofchecker class provided by the PHP intl extension and
it works as follows:
// src/Entity/User.php
namespace App\Entity;

use Symfony\Component\Validator\Constraints as Assert;
use Symfony\Component\Validator\Constraints\NoSuspiciousCharacters;

class User
{
    #[Assert\NoSuspiciousCharacters(
        // checks for invisible characters (e.g. zero-width spaces) and
        // similar-looking digits from different scripts (e.g. 8 and ৪)
        checks: NoSuspiciousCharacters::CHECK_INVISIBLE | NoSuspiciousCharacters::CHECK_MIXED_NUMBERS,
        restrictionLevel: NoSuspiciousCharacters::RESTRICTION_LEVEL_HIGH,
    )]
    private string $username;

    #[Assert\NoSuspiciousCharacters]
    private string $blogUrl;

    // ...
}
Read the NoSuspiciousCharacters constraint docs to learn more about its usage and options.
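Since the constraint builds on Spoofchecker, it can help to see that class called directly. The sketch below is only an illustration under stated assumptions: it uses the class with its default checks, `describeIdentifier()` is a hypothetical helper, and it does not reproduce how the Symfony validator configures or calls Spoofchecker. Results depend on the ICU version bundled with the intl extension, so no expected output is shown.

```php
<?php
// Sketch only: calls the intl Spoofchecker class directly with its default checks.
// Returns null when the intl extension is not loaded.
function describeIdentifier(string $candidate, string $trusted): ?array
{
    if (!class_exists(\Spoofchecker::class)) {
        return null;
    }

    $checker = new \Spoofchecker();

    return [
        // Does the candidate string look suspicious on its own?
        'suspicious' => $checker->isSuspicious($candidate),
        // Could the candidate be visually confused with the trusted string?
        'confusable_with_trusted' => $checker->areConfusable($candidate, $trusted),
    ];
}

$result = describeIdentifier("\u{0455}ymfony.com", 'symfony.com');
var_dump($result);
```

In an application, the constraint remains the more convenient route because it plugs into the validator and reports violations in the usual way.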
Great feature! Is it possible to enable that constraint globally? Or do we have to implement it ourselves (with some events)?
Alessandro,
Don't use this constraint globally, only where those characters can cause problems, such as in a domain name or username. Those characters are legitimate. Some keyboards produce them automatically: for example, a Japanese keyboard can type the same digits (0-9) with different Unicode code points, because a wider version of the digits is needed when mixing them with kanji.
Sometimes what you need instead is a transliterator (complete or digits-only). A good example would be a phone number.