Fabien Potencier
Contributed by Fabien Potencier in #4543, #4550, #4578 and #4775

In 1973, Vaughan Pratt published "Top Down Operator Precedence", a paper describing a parsing technique so simple it almost feels like cheating. Half a century later, that algorithm is the new heart of Twig. If you maintain an extension that defines operators, Twig 4.0 changes your world: getOperators() is gone, replaced by getExpressionParsers(). And even if you don't, stick around for the algorithm; it is one of my favorite pieces of code in the whole codebase.

The Problem: A Monolith and a Magic Array

In Twig 3, an extension declared its operators through getOperators(), which returned two nested arrays with a very specific shape:

public function getOperators(): array
{
    return [
        // unary operators
        [
            'not' => ['precedence' => 70, 'class' => NotUnary::class],
        ],
        // binary operators
        [
            '~' => [
                'precedence' => 27,
                'class' => ConcatBinary::class,
                'associativity' => ExpressionParser::OPERATOR_LEFT,
            ],
        ],
    ];
}

Notice what is missing: behavior. The array describes operators, but the code that parses them lived somewhere else, in the Twig\ExpressionParser class: 1,033 lines handling every operator of the language. And not only operators: filters (|), attribute access (.), subscripts ([), function calls ((), the ternary operator, is tests, arrow functions... all of them were pseudo-operators hardcoded in that one sprawling class. An extension could add an operator as data, but it could never change how one parses. The grammar was closed.

Operators Are Now Objects

In Twig 4, getExpressionParsers() returns a list of objects, each one owning everything about its operator: name, precedence, associativity, and the parsing logic itself. Here is a complete extension adding a repeat operator that repeats a string:

use Twig\Extension\AbstractExtension;
use Twig\ExpressionParser\Infix\BinaryOperatorExpressionParser;

final class StringToolsExtension extends AbstractExtension
{
    public function getExpressionParsers(): array
    {
        return [
            new BinaryOperatorExpressionParser(
                RepeatBinary::class, 'repeat', 30,
            ),
        ];
    }
}

The node class compiles to a str_repeat() call:

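use Twig\Compiler;
use Twig\Node\Expression\Binary\AbstractBinary;

final class RepeatBinary extends AbstractBinary
{
    // Emit str_repeat(<left>, <right>) instead of the default "<left> <op> <right>".
    public function compile(Compiler $compiler): void
    {
        $compiler
            ->raw('str_repeat(')
            ->subcompile($this->getNode('left'))
            ->raw(', ')
            ->subcompile($this->getNode('right'))
            ->raw(')');
    }

    // AbstractBinary declares operator() abstract (as of Twig 3.x), so it must
    // exist; it is never called here because compile() is overridden.
    public function operator(Compiler $compiler): Compiler
    {
        return $compiler->raw('');
    }
}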

And it works like any built-in operator, precedence included:

{{ '-=' repeat 5 }}      {# outputs -=-=-=-=-= #}
{{ 'ab' repeat 2 ~ '!' }} {# outputs abab! #}

The second line is the interesting one: repeat has precedence 30, concatenation has 27, so the repetition binds tighter. No special case anywhere; the numbers decide.
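To see what that grouping means in plain PHP, here is a hand-written equivalent of the second template line. It is only a sketch: the code Twig actually generates contains more scaffolding, and the variable name is made up for the example.

```php
<?php
// Hand-written equivalent of {{ 'ab' repeat 2 ~ '!' }} (not generated by Twig).
// repeat (30) binds tighter than ~ (27), so the str_repeat() call becomes
// the left operand of the concatenation.
$grouped = str_repeat('ab', 2) . '!';

echo $grouped, PHP_EOL; // abab!
```

Had concatenation bound tighter, the template would have asked to repeat 'ab' "2!" times, which is not a meaningful count.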

One Loop to Parse Them All

Now, the part I really want to show you. Pratt's idea splits the world in two: prefix parsers know how to start an expression (a literal, a unary -, not, an opening parenthesis), and infix parsers know how to extend one (+, |, ., a function call). Each parser carries a precedence. The whole algorithm is one method, lightly edited here for brevity:

public function parseExpression(int $precedence = 0): AbstractExpression
{
    $token = $this->getCurrentToken();
    if ($parser = $this->getPrefixParser($token)) {
        $this->getStream()->next();
        $expr = $parser->parse($this, $token);
    } else {
        $expr = $this->parseLiteral($token);
    }

    while (($parser = $this->getInfixParser($this->getCurrentToken()))
        && $parser->getPrecedence() >= $precedence
    ) {
        $token = $this->getCurrentToken();
        $this->getStream()->next();
        $expr = $parser->parse($this, $expr, $token);
    }

    return $expr;
}

That's it. A binary operator's parse() method is two lines: parse my right side, build my node:

public function parse(
    Parser $parser, AbstractExpression $left, Token $token,
): AbstractExpression {
    $right = $parser->parseExpression(
        $this->isLeftAssociative()
            ? $this->getPrecedence() + 1
            : $this->getPrecedence()
    );

    return new ($this->nodeClass)($left, $right, $token->getLine());
}

Let's walk through 1 + 2 * 3. The outer call parses 1, sees + (precedence 30), and lets it parse the right side with a minimum precedence of 31. That inner call parses 2, sees * (precedence 60, which is ≥ 31), and recurses again: 2 * 3 becomes a node, the inner call returns it, and + builds 1 + (2 * 3). Now flip the operands: in 1 * 2 + 3, the inner call parses 2, sees + (precedence 30, which is < 61), and stops right there; the outer loop picks + up and builds (1 * 2) + 3. Two integer comparisons, and precedence falls out for free.
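The walkthrough can be reproduced with a self-contained toy version of the loop. This is a sketch, not Twig's code: the OPERATORS table, the parseExpression() signature and the group() helper are all invented for the example, tokens are simply split on spaces, and the result is a parenthesized string instead of a node tree. Only the precedences of + and * are taken from the text above.

```php
<?php
declare(strict_types=1);

// Toy precedence table: operator => [precedence, left-associative?].
const OPERATORS = [
    '+' => [30, true],
    '-' => [30, true],
    '*' => [60, true],
];

/**
 * Parses $tokens starting at $pos and returns a fully parenthesized string.
 *
 * @param list<string> $tokens
 */
function parseExpression(array $tokens, int &$pos, int $minPrecedence = 0): string
{
    // Prefix step: in this toy grammar an expression always starts with a number.
    $expr = $tokens[$pos++];

    // Infix step: keep extending while the next operator binds tightly enough.
    while ($pos < count($tokens)
        && null !== ($info = OPERATORS[$tokens[$pos]] ?? null)
        && $info[0] >= $minPrecedence
    ) {
        $operator = $tokens[$pos++];
        [$precedence, $leftAssociative] = $info;
        $right = parseExpression($tokens, $pos, $leftAssociative ? $precedence + 1 : $precedence);
        $expr = "($expr $operator $right)";
    }

    return $expr;
}

function group(string $source): string
{
    $tokens = explode(' ', $source);
    $pos = 0;

    return parseExpression($tokens, $pos);
}

echo group('1 + 2 * 3'), PHP_EOL; // (1 + (2 * 3))
echo group('1 * 2 + 3'), PHP_EOL; // ((1 * 2) + 3)
```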

Associativity is the + 1 in the snippet above. Left-associative operators ask for strictly higher precedence on their right, so 8 - 2 - 1 becomes (8 - 2) - 1. Right-associative operators ask for their own precedence, so 2 ** 3 ** 2 becomes 2 ** (3 ** 2), which is 512. One character of code, and that is the entire theory of associativity.
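The two groupings give different results, which is easy to check in plain PHP. PHP's own ** is right-associative as well, so it can stand in for the operator described above; the variable names are only for illustration.

```php
<?php
// Subtraction: the left-associative grouping versus the alternative.
$leftSubtract = (8 - 2) - 1;   // what a left-associative parser builds
$rightSubtract = 8 - (2 - 1);  // what a right-associative parser would build

// Exponentiation: PHP parses 2 ** 3 ** 2 as 2 ** (3 ** 2).
$rightPower = 2 ** 3 ** 2;
$leftPower = (2 ** 3) ** 2;    // what a left-associative ** would compute

printf("%d %d %d %d\n", $leftSubtract, $rightSubtract, $rightPower, $leftPower); // 5 7 512 64
```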

Everything Is an Expression Parser

The first version of this work turned operators into objects and kept the rest of the monolith untouched. Then it clicked: operators are a subset of expression parsers. A filter is an infix parser for |. Attribute access is an infix parser for the dot (.). A function call is an infix parser for ( applied to a name. The ternary operator, ??, is tests, arrow functions: parsers, parsers, parsers. Even the humble parenthesized group is a prefix parser for (.

Treating a function call as a binary operator raised a few eyebrows during code review, and I understand why; no mainstream language describes ( that way. But once every construct is a parser with a precedence, they all play by the same rules: one registry, one precedence table, one documentation page, and an extension API that can express any of them. The Twig\ExpressionParser monolith had nothing left to do, and Twig 4.0 deletes it.
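As an illustration of "they all play by the same rules", here is a toy evaluator in which a binary operator, a filter and attribute access are three entries in one registry, served by the same loop. Everything in it is an assumption made for the sketch: the ToyParser class, the FILTERS map, the pre-split token list and the precedence value 300 are invented, and the code evaluates values directly instead of building nodes the way Twig does.

```php
<?php
declare(strict_types=1);

final class ToyParser
{
    // Hypothetical filter table mapping filter names to PHP functions.
    private const FILTERS = ['upper' => 'strtoupper', 'length' => 'strlen'];

    private int $pos = 0;

    /** @var array<string, array{int, callable}> token => [precedence, parse callable] */
    private array $infix;

    /**
     * @param list<string>         $tokens
     * @param array<string, mixed> $context
     */
    public function __construct(private array $tokens, private array $context = [])
    {
        $this->infix = [
            // Plain binary operator: parse the right side, then combine.
            '~' => [27, fn (mixed $left): string => $left . $this->parseExpression(28)],
            // Filter: the token after | is a filter name, not an expression.
            '|' => [300, fn (mixed $left): mixed => (self::FILTERS[$this->next()])($left)],
            // Attribute access: the token after . is a key into $left.
            '.' => [300, fn (mixed $left): mixed => $left[$this->next()]],
        ];
    }

    public function parseExpression(int $minPrecedence = 0): mixed
    {
        // Prefix step: a known name is read from the context, anything else is a literal.
        $token = $this->next();
        $value = $this->context[$token] ?? $token;

        // Infix step: one loop for operators, filters and attribute access alike.
        while (null !== ($entry = $this->infix[$this->peek()] ?? null)
            && $entry[0] >= $minPrecedence
        ) {
            $this->next();
            $value = $entry[1]($value);
        }

        return $value;
    }

    private function peek(): string
    {
        return $this->tokens[$this->pos] ?? '';
    }

    private function next(): string
    {
        return $this->tokens[$this->pos++];
    }
}

$parser = new ToyParser(
    ['user', '.', 'name', '|', 'upper', '~', '!'],
    ['user' => ['name' => 'fabien']],
);
echo $parser->parseExpression(), PHP_EOL; // FABIEN!
```

Adding a new construct to this toy means adding one registry entry; the loop itself never changes.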

One refinement came out of real-world testing: the lexer needs to know which token strings are operators, and a parser's name is not always one of them. When the internal literal parser shipped, its name suddenly clashed with a Craft CMS filter named literal, breaking templates using 'foo'|literal. The fix is the getOperatorTokens() method: each parser declares the token strings it owns, and parsers like literal that own none return an empty list, leaving the name free for your filters.
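The idea can be sketched with a trimmed-down, hypothetical interface. Only the getOperatorTokens() method name comes from the text above; ToyExpressionParser, ToyOperator, ToyLiteral and the operatorTokens() helper are invented here, and Twig's real interface has more methods.

```php
<?php
declare(strict_types=1);

// Hypothetical, reduced interface used only for this sketch.
interface ToyExpressionParser
{
    public function getName(): string;

    /** @return list<string> token strings the lexer must treat as operators */
    public function getOperatorTokens(): array;
}

final class ToyOperator implements ToyExpressionParser
{
    public function __construct(private string $name)
    {
    }

    public function getName(): string
    {
        return $this->name;
    }

    // A regular operator owns the token that spells its name.
    public function getOperatorTokens(): array
    {
        return [$this->name];
    }
}

final class ToyLiteral implements ToyExpressionParser
{
    public function getName(): string
    {
        return 'literal';
    }

    // Triggered by strings and numbers, not by a keyword: it claims no token,
    // so the word "literal" stays an ordinary name in templates.
    public function getOperatorTokens(): array
    {
        return [];
    }
}

/**
 * Collects what a lexer would need to recognize as operators.
 *
 * @param list<ToyExpressionParser> $parsers
 *
 * @return list<string>
 */
function operatorTokens(array $parsers): array
{
    return array_merge(...array_map(
        fn (ToyExpressionParser $parser): array => $parser->getOperatorTokens(),
        $parsers,
    ));
}

$tokens = operatorTokens([new ToyOperator('not'), new ToyOperator('~'), new ToyLiteral()]);
var_dump(in_array('literal', $tokens, true)); // bool(false)
```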

The Upgrade Path

Twig 3.21 deprecated getOperators(). Extensions that still implement it trigger this deprecation notice: Extension "App\Twig\StringToolsExtension" uses the old signature for "getOperators()", please implement "getExpressionParsers()" instead. The new method already works on 3.21+, so you can migrate your extensions today and they will run unmodified on both majors.

Published in #Living on the edge #Twig