The String Component

5.0 version
Maintained

The String Component

The String component provides a single object-oriented API to work with three "unit systems" of strings: bytes, code points and grapheme clusters.

New in version 5.0: The String component was introduced in Symfony 5.0 as an experimental feature.

Installation

1
$ composer require symfony/string

Note

If you install this component outside of a Symfony application, you must require the vendor/autoload.php file in your code to enable the class autoloading mechanism provided by Composer. Read this article for more details.

What is a String?

You can skip this section if you already know what a "code point" or a "grapheme cluster" are in the context of handling strings. Otherwise, read this section to learn about the terminology used by this component.

Languages like English require a very limited set of characters and symbols to display any content. Each string is a series of characters (letters or symbols) and they can be encoded even with the most limited standards (e.g. ASCII).

However, other languages require thousands of symbols to display their contents. They need complex encoding standards such as Unicode and concepts like "character" no longer make sense. Instead, you have to deal with these terms:

  • Code points: they are the atomic unit of information. A string is a series of code points. Each code point is a number whose meaning is given by the Unicode standard. For example, the English letter A is the U+0041 code point and the Japanese kana is the U+306E code point.
  • Grapheme clusters: they are a sequence of one or more code points which are displayed as a single graphical unit. For example, the Spanish letter ñ is a grapheme cluster that contains two code points: U+006E = n ("latin small letter N") + U+0303 = ◌̃ ("combining tilde").
  • Bytes: they are the actual information stored for the string contents. Each code point can require one or more bytes of storage depending on the standard being used (UTF-8, UTF-16, etc.).

The following image displays the bytes, code points and grapheme clusters for the same word written in English (hello) and Hindi (नमस्ते):

../_images/bytes-points-graphemes.png

Usage

Create a new object of type ByteString, CodePointString or UnicodeString, pass the string contents as their arguments and then use the object-oriented API to work with those strings:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
use Symfony\Component\String\UnicodeString;

$text = (new UnicodeString('This is a déjà-vu situation.'))
    ->trimEnd('.')
    ->replace('déjà-vu', 'jamais-vu')
    ->append('!');
// $text = 'This is a jamais-vu situation!'

$content = new UnicodeString('नमस्ते दुनिया');
if ($content->ignoreCase()->startsWith('नमस्ते')) {
    // ...
}

Method Reference

Methods to Create String Objects

First, you can create objects prepared to store strings as bytes, code points and grapheme clusters with the following classes:

1
2
3
4
5
6
7
8
use Symfony\Component\String\ByteString;
use Symfony\Component\String\CodePointString;
use Symfony\Component\String\UnicodeString;

$foo = new ByteString('hello');
$bar = new CodePointString('hello');
// UnicodeString is the most commonly used class
$baz = new UnicodeString('hello');

Use the wrap() static method to instantiate more than one string object:

1
2
3
4
5
6
7
$contents = ByteString::wrap(['hello', 'world']);        // $contents = ByteString[]
$contents = UnicodeString::wrap(['I', '❤️', 'Symfony']); // $contents = UnicodeString[]

// use the unwrap method to make the inverse conversion
$contents = UnicodeString::unwrap([
    new UnicodeString('hello'), new UnicodeString('world'),
]); // $contents = ['hello', 'world']

There are two shortcut functions called b() and u() to create ByteString and UnicodeString objects:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// ...
use function Symfony\Component\String\b;
use function Symfony\Component\String\u;

// both are equivalent
$foo = new ByteString('hello');
$foo = b('hello');

// both are equivalent
$baz = new UnicodeString('hello');
$baz = u('hello');

There are also some specialized constructors:

1
2
3
4
5
6
// ByteString can create a random string of the given length
$foo = ByteString::fromRandom(12);

// CodePointString and UnicodeString can create a string from code points
$foo = UnicodeString::fromCodePoints(0x928, 0x92E, 0x938, 0x94D, 0x924, 0x947);
// $foo = 'नमस्ते'

Methods to Transform String Objects

Each string object can be transformed into the other two types of objects:

1
2
3
4
5
6
7
8
$foo = ByteString::fromRandom(12)->toCodePointString();
$foo = (new CodePointString('hello'))->toUnicodeString();
$foo = UnicodeString::fromCodePoints(0x68, 0x65, 0x6C, 0x6C, 0x6F)->toByteString();

// the optional $toEncoding argument defines the encoding of the target string
$foo = (new CodePointString('hello'))->toByteString('Windows-1252');
// the optional $fromEncoding argument defines the encoding of the original string
$foo = (new ByteString('さよなら'))->toCodePointString('ISO-2022-JP');

If the conversion is not possible for any reason, you'll get an InvalidArgumentException.

Methods to Change Case

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// changes all graphemes/code points to lower case
u('FOO Bar')->lower();  // 'foo bar'

// when dealing with different languages, uppercase/lowercase is not enough
// there are three cases (lower, upper, title), some characters have no case,
// case is context-sensitive and locale-sensitive, etc.
// this method returns a string that you can use in case-insensitive comparisons
u('FOO Bar')->folded();             // 'foo bar'
u('Die O\'Brian Straße')->folded(); // "die o'brian strasse"

// changes all graphemes/code points to upper case
u('foo BAR')->upper(); // 'FOO BAR'

// changes all graphemes/code points to "title case"
u('foo bar')->title();     // 'Foo bar'
u('foo bar')->title(true); // 'Foo Bar'

// changes all graphemes/code points to camelCase
u('Foo: Bar-baz.')->camel(); // 'fooBarBaz'
// changes all graphemes/code points to snake_case
u('Foo: Bar-baz.')->snake(); // 'foo_bar_baz'
// other cases can be achieved by chaining methods. E.g. PascalCase:
u('Foo: Bar-baz.')->camel()->title(); // 'FooBarBaz'

The methods of all string classes are case-sensitive by default. You can perform case-insensitive operations with the ignoreCase() method:

u('abc')->indexOf('B');               //  null
u('abc')->ignoreCase()->indexOf('B'); // 1

Methods to Append and Prepend

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// adds the given content (one or more strings) at the beginning/end of the string
u('world')->prepend('hello');      // 'helloworld'
u('world')->prepend('hello', ' '); // 'hello world'

u('hello')->append('world');      // 'helloworld'
u('hello')->append(' ', 'world'); // 'hello world'

// adds the given content at the beginning of the string (or removes it) to
// make sure that the content starts exactly with that content
u('Name')->ensureStart('get');       // 'getName'
u('getName')->ensureStart('get');    // 'getName'
u('getgetName')->ensureStart('get'); // 'getName'
// this method is similar, but works on the end of the content instead of on the beginning
u('User')->ensureEnd('Controller');           // 'UserController'
u('UserController')->ensureEnd('Controller'); // 'UserController'
u('UserControllerController')->ensureEnd('Controller'); // 'UserController'

// returns the contents found before/after the first occurrence of the given string
u('hello world')->before('world');   // 'hello '
u('hello world')->before('o');       // 'hell '
u('hello world')->before('o', true); // 'hello'

u('hello world')->after('hello');   // ' world'
u('hello world')->after('o');       // ' world'
u('hello world')->after('o', true); // 'o world'

// returns the contents found before/after the last occurrence of the given string
u('hello world')->beforeLast('o');       // 'hello w'
u('hello world')->beforeLast('o', true); // 'hello wo'

u('hello world')->afterLast('o');       // 'rld'
u('hello world')->afterLast('o', true); // 'orld'

Methods to Pad and Trim

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
// makes a string as long as the first argument by adding the given
// string at the beginning, end or both sides of the string
u(' Lorem Ipsum ')->padBoth(20, '-'); // '--- Lorem Ipsum ----'
u(' Lorem Ipsum')->padStart(20, '-'); // '-------- Lorem Ipsum'
u('Lorem Ipsum ')->padEnd(20, '-');   // 'Lorem Ipsum --------'

// repeats the given string the number of times passed as argument
u('_.')->repeat(10); // '_._._._._._._._._._.'

// removes the given characters (by default, white spaces) from the string
u('   Lorem Ipsum   ')->trim(); // 'Lorem Ipsum'
u('Lorem Ipsum   ')->trim('m'); // 'Lorem Ipsum   '
u('Lorem Ipsum')->trim('m');    // 'Lorem Ipsu'

u('   Lorem Ipsum   ')->trimStart(); // 'Lorem Ipsum   '
u('   Lorem Ipsum   ')->trimEnd();   // '   Lorem Ipsum'

Methods to Search and Replace

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// checks if the string starts/ends with the given string
u('https://symfony.com')->startsWith('https'); // true
u('report-1234.pdf')->endsWith('.pdf');        // true

// checks if the string contents are exactly the same as the given contents
u('foo')->equalsTo('foo'); // true

// checks if the string content match the given regular expression
u('avatar-73647.png')->match('/avatar-(\d+)\.png/');
// result = ['avatar-73647.png', '73647']

// finds the position of the first occurrence of the given string
// (the second argument is the position where the search starts and negative
// values have the same meaning as in PHP functions)
u('abcdeabcde')->indexOf('c');     // 2
u('abcdeabcde')->indexOf('c', 2);  // 2
u('abcdeabcde')->indexOf('c', -4); // 7
u('abcdeabcde')->indexOf('eab');   // 4
u('abcdeabcde')->indexOf('k');     // null

// finds the position of the last occurrence of the given string
// (the second argument is the position where the search starts and negative
// values have the same meaning as in PHP functions)
u('abcdeabcde')->indexOfLast('c');     // 7
u('abcdeabcde')->indexOfLast('c', 2);  // 7
u('abcdeabcde')->indexOfLast('c', -4); // 2
u('abcdeabcde')->indexOfLast('eab');   // 4
u('abcdeabcde')->indexOfLast('k');     // null

// replaces all occurrences of the given string
u('http://symfony.com')->replace('http://', 'https://'); // 'https://symfony.com'
// replaces all occurrences of the given regular expression
u('(+1) 206-555-0100')->replaceMatches('/[^A-Za-z0-9]++/', ''); // '12065550100'
// you can pass a callable as the second argument to perform advanced replacements
u('123')->replaceMatches('/\d/', function ($match) {
    return '['.$match[0].']';
}); // result = '[1][2][3]'

Methods to Join, Split and Truncate

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// uses the string as the "glue" to merge all the given strings
u(', ')->join(['foo', 'bar']); // 'foo, bar'

// breaks the string into pieces using the given delimiter
u('template_name.html.twig')->split('.');    // ['template_name', 'html', 'twig']
// you can set the maximum number of pieces as the second argument
u('template_name.html.twig')->split('.', 2); // ['template_name', 'html.twig']

// returns a substring which starts at the first argument and has the length of the
// second optional argument (negative values have the same meaning as in PHP functions)
u('Symfony is great')->slice(0, 7);  // 'Symfony'
u('Symfony is great')->slice(0, -6); // 'Symfony is'
u('Symfony is great')->slice(11);    // 'great'
u('Symfony is great')->slice(-5);    // 'great'

// reduces the string to the length given as argument (if it's longer)
u('Lorem Ipsum')->truncate(80);     // 'Lorem Ipsum'
u('Lorem Ipsum')->truncate(3);      // 'Lor'
u('Lorem Ipsum')->truncate(8, '…'); // 'Lorem I…'

// breaks the string into lines of the given length
u('Lorem Ipsum')->wordwrap(4);             // 'Lorem\nIpsum'
// by default it breaks by white space; pass TRUE to break unconditionally
u('Lorem Ipsum')->wordwrap(4, "\n", true); // 'Lore\nm\nIpsu\nm'

// replaces a portion of the string with the given contents:
// the second argument is the position where the replacement starts;
// the third argument is the number of graphemes/code points removed from the string
u('0123456789')->splice('xxx');       // 'xxx'
u('0123456789')->splice('xxx', 0, 2); // 'xxx23456789'
u('0123456789')->splice('xxx', 0, 6); // 'xxx6789'
u('0123456789')->splice('xxx', 6);    // '012345xxx'

// breaks the string into pieces of the length given as argument
u('0123456789')->chunk(3);  // ['012', '345', '678', '9']

Methods Added by ByteString

These methods are only available for ByteString objects:

1
2
3
4
5
6
7
8
9
// returns TRUE if the string contents are valid UTF-8 contents
b('Lorem Ipsum')->isUtf8(); // true
b("\xc3\x28")->isUtf8();    // false

// returns the value of the byte stored at the given position
// ('नमस्ते' bytes = [224, 164, 168, 224, 164, 174, 224, 164, 184,
//                  224, 165, 141, 224, 164, 164, 224, 165, 135])
b('नमस्ते')->byteCode(0);  // 224
b('नमस्ते')->byteCode(17); // 135

Methods Added by CodePointString and UnicodeString

These methods are only available for CodePointString and UnicodeString objects:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// transliterates any string into the latin alphabet defined by the ASCII encoding
// (don't use this method to build a slugger because this component already provides
// a slugger, as explained later in this article)
u('नमस्ते')->ascii();    // 'namaste'
u('さよなら')->ascii(); // 'sayonara'
u('спасибо')->ascii(); // 'spasibo'

// returns the value of the code point stored at the given position
// (code points of 'नमस्ते' graphemes = [2344, 2350, 2360, 2340]
u('नमस्ते')->codePoint(0); // 2344
u('नमस्ते')->codePoint(2); // 2360

Unicode equivalence is the specification by the Unicode standard that different sequences of code points represent the same character. For example, the Swedish letter å can be a single code point (U+00E5 = "latin small letter A with ring above") or a sequence of two code points (U+0061 = "latin small letter A" + U+030A = "combining ring above"). The normalize() method allows to pick the normalization mode:

1
2
3
4
5
6
// these encode the letter as a single code point: U+00E5
u('å')->normalize(UnicodeString::NFC);
u('å')->normalize(UnicodeString::NFKC);
// these encode the letter as two code points: U+0061 + U+030A
u('å')->normalize(UnicodeString::NFD);
u('å')->normalize(UnicodeString::NFKD);

Slugger

In some contexts, such as URLs and file/directory names, it's not safe to use any Unicode character. A slugger transforms a given string into another string that only includes safe ASCII characters:

1
2
3
4
5
use Symfony\Component\String\Slugger\AsciiSlugger;

$slugger = new AsciiSlugger();
$slug = $slugger->slug('Wôrķšƥáçè ~~sèťtïñğš~~');
// $slug = 'Workspace-settings'

The separator between words is a dash (-) by default, but you can define another separator as the second argument:

$slug = $slugger->slug('Wôrķšƥáçè ~~sèťtïñğš~~', '/');
// $slug = 'Workspace/settings'

The slugger transliterates the original string into the Latin script before applying the other transformations. The locale of the original string is detected automatically, but you can define it explicitly:

1
2
3
4
5
// this tells the slugger to transliterate from Korean language
$slugger = new AsciiSlugger('ko');

// you can override the locale as the third optional parameter of slug()
$slug = $slugger->slug('...', '-', 'fa');

In a Symfony application, you don't need to create the slugger yourself. Thanks to service autowiring, you can inject a slugger by type-hinting a service constructor argument with the SluggerInterface. The locale of the injected slugger is the same as the request locale:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
use Symfony\Component\String\Slugger\SluggerInterface;

class MyService
{
    private $slugger;

    public function __construct(SluggerInterface $slugger)
    {
        $this->slugger = $slugger;
    }

    public function someMethod()
    {
        $slug = $slugger->slug('...');
    }
}

This work, including the code samples, is licensed under a Creative Commons BY-SA 3.0 license.