Warning: You are browsing the documentation for Symfony 2.0, which is no longer maintained.

The DomCrawler Component

The DomCrawler Component eases DOM navigation for HTML and XML documents.


While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML.


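You can install the component in several ways, for example through Composer (symfony/dom-crawler on Packagist) or from the official Git repository (https://github.com/symfony/DomCrawler).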


The Symfony\Component\DomCrawler\Crawler class provides methods to query and manipulate HTML and XML documents.

An instance of the Crawler represents a set (SplObjectStorage) of DOMElement objects, which are basically nodes that you can traverse easily:

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

// Each item in the crawler is a DOMElement
foreach ($crawler as $domElement) {
    print $domElement->nodeName;
}

The specialized Symfony\Component\DomCrawler\Link and Symfony\Component\DomCrawler\Form classes are useful for interacting with HTML links and forms as you traverse the HTML tree.

Node Filtering

Using XPath expressions is really easy:

$crawler = $crawler->filterXPath('descendant-or-self::body/p');


DOMXPath::query is used internally to actually perform an XPath query.
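To make the note above concrete, here is a minimal sketch of the same query run directly against PHP's DOM extension, without any Symfony class. It only illustrates what a DOMXPath::query call looks like; it is not the Crawler's actual implementation, and the variable names ($document, $xpath, $texts) are chosen for this example only.

```php
<?php
// Illustration only: plain DOM extension, no Symfony classes involved.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

// Parse the markup into a native DOMDocument
$document = new DOMDocument();
$document->loadHTML($html);

// Run the same XPath expression that was passed to filterXPath()
$xpath = new DOMXPath($document);
$nodes = $xpath->query('descendant-or-self::body/p');

// Collect the text of every matched <p> element
$texts = array();
foreach ($nodes as $node) {
    $texts[] = $node->nodeValue;
}

print implode("\n", $texts) . "\n";
```

The Crawler wraps this kind of call and hands back the matched nodes as a new Crawler rather than a raw DOMNodeList.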

Filtering is even easier if you have the CssSelector Component installed. This allows you to use jQuery-like selectors to traverse:

$crawler = $crawler->filter('body > p');

An anonymous function can be used to filter with more complex criteria:

$crawler = $crawler->filter('body > p')->reduce(function ($node, $i) {
    // keep only the nodes at even positions
    return ($i % 2) == 0;
});

To remove a node, the anonymous function must return false.


All filter methods return a new Symfony\Component\DomCrawler\Crawler instance with filtered content.
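The following sketch shows what "a new instance" means in practice: the crawler you filter is left untouched, and the filtered result is a separate object. It assumes the component was installed with Composer, so that vendor/autoload.php exists next to the script; the variable names are hypothetical.

```php
<?php
// Assumption: the component was installed with Composer, so this autoloader exists.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body><p class="message">Hello World!</p><p>Hello Crawler!</p></body></html>';

$all = new Crawler($html);

// filterXPath() does not narrow $all; it returns another Crawler
$paragraphs = $all->filterXPath('//body/p');

$isNewInstance = ($paragraphs !== $all);
$paragraphCount = count($paragraphs);

print 'New instance: ' . ($isNewInstance ? 'yes' : 'no') . "\n";
print 'Paragraphs: ' . $paragraphCount . "\n";
```

Because the original crawler is unchanged, you can keep it around and run several independent filters against it.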

Node Traversing

Access a node by its position in the list:

$crawler->filter('body > p')->eq(0);

Get the first or last node of the current selection:

$crawler->filter('body > p')->first();
$crawler->filter('body > p')->last();

Get the nodes of the same level as the current selection:

$crawler->filter('body > p')->siblings();

Get the sibling nodes after or before the current selection:

$crawler->filter('body > p')->nextAll();
$crawler->filter('body > p')->previousAll();

Get all the child or parent nodes:

$crawler->filter('body')->children();
$crawler->filter('body > p')->parents();


All the traversal methods return a new Symfony\Component\DomCrawler\Crawler instance.

Accessing Node Values

Access the value of the first node of the current selection:

$message = $crawler->filterXPath('//body/p')->text();

Access the attribute value of the first node of the current selection:

$class = $crawler->filterXPath('//body/p')->attr('class');

Extract attribute and/or node values from the list of nodes:

$attributes = $crawler
    ->filterXPath('//body/p')
    ->extract(array('_text', 'class'))
;


The special attribute _text represents the node value.

Call an anonymous function on each node of the list:

$nodeValues = $crawler->filter('p')->each(function ($node, $i) {
    return $node->text();
});

The anonymous function receives the node and its position as arguments. The result is an array of the values returned by the anonymous function calls.

Adding the Content

The crawler supports several ways of adding content:

$crawler = new Crawler('<html><body /></html>');

$crawler->addHtmlContent('<html><body /></html>');
$crawler->addXmlContent('<root><node /></root>');

$crawler->addContent('<html><body /></html>');
$crawler->addContent('<root><node /></root>', 'text/xml');

$crawler->add('<html><body /></html>');
$crawler->add('<root><node /></root>');


When dealing with character sets other than ISO-8859-1, always add HTML content with the addHtmlContent() method, whose second parameter lets you specify the character set of the content.
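As a minimal sketch of the advice above, the snippet below passes the character set explicitly as the second argument of addHtmlContent(). It assumes the component was installed with Composer (so vendor/autoload.php exists) and that the script file itself is saved as UTF-8; the sample markup is made up for illustration.

```php
<?php
// Assumption: the component was installed with Composer, so this autoloader exists.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// A UTF-8 encoded document containing a non-ASCII character
// (assumes this source file is saved as UTF-8).
$html = '<html><body><p>Café</p></body></html>';

$crawler = new Crawler();

// The second argument names the character set of $html
$crawler->addHtmlContent($html, 'UTF-8');

$text = $crawler->filterXPath('//body/p')->text();

print $text . "\n";
```

Stating the character set explicitly avoids relying on whatever default the installed version happens to use.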

As the Crawler’s implementation is based on the DOM extension, it is also able to interact with native DOMDocument, DOMNodeList and DOMNode objects:

$document = new \DOMDocument();
$document->loadXML('<root><node /><node /></root>');
$nodeList = $document->getElementsByTagName('node');
$node = $document->getElementsByTagName('node')->item(0);

$crawler->addDocument($document);
$crawler->addNodeList($nodeList);
$crawler->addNode($node);


This work, including the code samples, is licensed under a Creative Commons BY-SA 3.0 license.