
Table of Contents

  • Installation
  • Usage
    • Node Filtering
    • Node Traversing
    • Accessing Node Values
    • Adding the Content
    • Form and Link support

The DomCrawler Component

Warning: You are browsing the documentation for Symfony 2.0, which is no longer maintained.

Read the updated version of this page for Symfony 6.2 (the current stable version).

The DomCrawler Component eases DOM navigation for HTML and XML documents.

Note

While possible, the DomCrawler component is not designed for manipulation of the DOM or re-dumping HTML/XML.

Installation

You can install the component in two different ways:

  • Use the official Git repository (https://github.com/symfony/DomCrawler);
  • Install it via Composer (symfony/dom-crawler on Packagist).

Usage

The Crawler class provides methods to query and manipulate HTML and XML documents.

An instance of the Crawler represents a set (SplObjectStorage) of DOMElement objects, which are basically nodes that you can traverse easily:

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

foreach ($crawler as $domElement) {
    print $domElement->nodeName;
}

Specialized Link and Form classes are useful for interacting with HTML links and forms as you traverse the HTML tree.

Node Filtering

Using XPath expressions is really easy:

$crawler = $crawler->filterXPath('descendant-or-self::body/p');

Tip

DOMXPath::query is used internally to actually perform an XPath query.
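
To make the tip concrete, here is a minimal sketch of the same XPath query run directly through PHP's DOM extension, without the Crawler. It reuses the sample markup from the Usage section; the variable names are illustrative only, and this is not meant to mirror the component's internal code line by line.

```php
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <p class="message">Hello World!</p>
        <p>Hello Crawler!</p>
    </body>
</html>
HTML;

// Parse the markup with PHP's DOM extension.
$document = new \DOMDocument();
$document->loadHTML($html);

// Run the same expression through DOMXPath::query().
$xpath = new \DOMXPath($document);
$nodes = $xpath->query('descendant-or-self::body/p');

// Collect the text of every matched paragraph.
$texts = array();
foreach ($nodes as $node) {
    $texts[] = $node->textContent;
}
```

The query returns a DOMNodeList holding the two paragraphs; the Crawler wraps this kind of result in a new Crawler instance.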

Filtering is even easier if you have the CssSelector Component installed. This allows you to use jQuery-like selectors to traverse:

$crawler = $crawler->filter('body > p');

An anonymous function can be used to filter with more complex criteria:

$crawler = $crawler->filter('body > p')->reduce(function ($node, $i) {
    // keep only the nodes at even positions (0, 2, 4, ...)
    return ($i % 2) == 0;
});

To remove a node the anonymous function must return false.
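
The rule can be sketched on a plain PHP array. The reduceItems() helper below is hypothetical and not part of the component; it assumes that an item is dropped only when the callback returns exactly false, which is one reading of the sentence above.

```php
/**
 * Hypothetical helper (not part of the component): keeps every item
 * for which the callback does not return false.
 */
function reduceItems(array $items, $closure)
{
    $kept = array();
    foreach (array_values($items) as $i => $item) {
        // Assumption: only a strict false removes the item.
        if (false !== call_user_func($closure, $item, $i)) {
            $kept[] = $item;
        }
    }

    return $kept;
}

$paragraphs = array('first', 'second', 'third', 'fourth');

// Same callback as in the example above: positions 0 and 2 are kept.
$even = reduceItems($paragraphs, function ($item, $i) {
    return ($i % 2) == 0;
});
```

Positions are zero-based, so "even" here means the first and third items.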

Note

All filter methods return a new Crawler instance with filtered content.

Node Traversing

Access a node by its position in the list:

$crawler->filter('body > p')->eq(0);

Get the first or last node of the current selection:

$crawler->filter('body > p')->first();
$crawler->filter('body > p')->last();

Get the nodes of the same level as the current selection:

$crawler->filter('body > p')->siblings();

Get the same-level nodes after or before the current selection:

$crawler->filter('body > p')->nextAll();
$crawler->filter('body > p')->previousAll();

Get all the child or parent nodes:

$crawler->filter('body')->children();
$crawler->filter('body > p')->parents();

Note

All the traversal methods return a new Crawler instance.
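
For readers who want to see what "same level nodes after" means in terms of the underlying DOM, the following sketch walks the following siblings of a node with the native DOM extension. It only illustrates the idea behind nextAll() and parents(); the sample markup and variable names are assumptions, and the component's own implementation may differ.

```php
$document = new \DOMDocument();
$document->loadXML('<body><p>one</p><p>two</p><p>three</p></body>');

// Start from the first <p> element.
$first = $document->getElementsByTagName('p')->item(0);

// Walk forward over the following siblings, keeping element nodes only.
$next = array();
for ($node = $first->nextSibling; null !== $node; $node = $node->nextSibling) {
    if (XML_ELEMENT_NODE === $node->nodeType) {
        $next[] = $node->textContent;
    }
}

// The parent of the starting node is one step up the tree.
$parentName = $first->parentNode->nodeName;
```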

Accessing Node Values

Access the value of the first node of the current selection:

$message = $crawler->filterXPath('//body/p')->text();

Access the attribute value of the first node of the current selection:

$class = $crawler->filterXPath('//body/p')->attr('class');

Extract attribute and/or node values from the list of nodes:

$attributes = $crawler
    ->filterXPath('//body/p')
    ->extract(array('_text', 'class'))
;

Note

The special attribute _text represents a node's value.
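
As a rough illustration of what such an extraction produces, the sketch below builds the same kind of result with the native DOM extension. It assumes the result holds one array per node, with values in the order the names were requested, and that a missing attribute yields an empty string; treat that shape as an assumption rather than a guarantee about the component.

```php
$html = '<html><body><p class="message">Hello World!</p><p>Hello Crawler!</p></body></html>';

$document = new \DOMDocument();
$document->loadHTML($html);
$xpath = new \DOMXPath($document);

$rows = array();
foreach ($xpath->query('//body/p') as $node) {
    $row = array();
    foreach (array('_text', 'class') as $name) {
        // "_text" stands for the node value; anything else is an attribute name.
        $row[] = '_text' === $name ? $node->nodeValue : $node->getAttribute($name);
    }
    $rows[] = $row;
}
```

The second paragraph has no class attribute, so its second value is an empty string.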

Call an anonymous function on each node of the list:

$nodeValues = $crawler->filter('p')->each(function ($node, $i) {
    return $node->text();
});

The anonymous function receives the node and its position as arguments. The result is an array of the values returned by the anonymous function calls.

Adding the Content

The Crawler supports multiple ways of adding content:

$crawler = new Crawler('<html><body /></html>');

$crawler->addHtmlContent('<html><body /></html>');
$crawler->addXmlContent('<root><node /></root>');

$crawler->addContent('<html><body /></html>');
$crawler->addContent('<root><node /></root>', 'text/xml');

$crawler->add('<html><body /></html>');
$crawler->add('<root><node /></root>');

Note

When dealing with character sets other than ISO-8859-1, always add HTML content using the addHtmlContent() method, whose second parameter lets you specify the target character set.

As the Crawler's implementation is based on the DOM extension, it is also able to interact with native DOMDocument, DOMNodeList and DOMNode objects:

$document = new \DOMDocument();
$document->loadXML('<root><node /><node /></root>');
$nodeList = $document->getElementsByTagName('node');
$node = $document->getElementsByTagName('node')->item(0);

$crawler->addDocument($document);
$crawler->addNodeList($nodeList);
$crawler->addNodes(array($node));
$crawler->addNode($node);
$crawler->add($document);

Manipulating and Dumping a Crawler

These methods on the Crawler are intended to initially populate your Crawler and aren't intended to be used to further manipulate a DOM (though this is possible). However, since the Crawler is a set of DOMElement objects, you can use any method or property available on DOMElement, DOMNode or DOMDocument. For example, you could get the HTML of a Crawler with something like this:

$html = '';

foreach ($crawler as $domElement) {
    $html .= $domElement->ownerDocument->saveHTML($domElement);
}

Form and Link support

Special treatment is given to links and forms inside the DOM tree.

Links

To find a link by name (or a clickable image by its alt attribute), use the selectLink method on an existing crawler. This returns a Crawler instance with just the selected link(s). Calling link() gives you a special Link object:

$linksCrawler = $crawler->selectLink('Go elsewhere...');
$link = $linksCrawler->link();

// or do this all at once
$link = $crawler->selectLink('Go elsewhere...')->link();

The Link object has several useful methods to get more information about the selected link itself:

// return the proper URI that can be used to make another request
$uri = $link->getUri();

Note

The getUri() method is especially useful as it cleans the href value and transforms it into how it should really be processed. For example, for a link with href="#foo", this would return the full URI of the current page suffixed with #foo. The return value of getUri() is always a full URI that you can act on.
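
The kind of resolution described in this note can be sketched in plain PHP. The resolveHref() helper below is hypothetical and is not the component's code; it assumes the current page URI is an absolute URI with a scheme and a host, and it does not normalize "../" segments.

```php
/**
 * Hypothetical helper (not the component's code): resolves an href
 * against the URI of the page that contains the link.
 */
function resolveHref($href, $currentUri)
{
    // An href that already has a scheme is returned unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $parts = parse_url($currentUri);
    $root  = $parts['scheme'].'://'.$parts['host'].(isset($parts['port']) ? ':'.$parts['port'] : '');
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    $query = isset($parts['query']) ? '?'.$parts['query'] : '';

    // Empty href: the link points at the current page, minus its fragment.
    if ('' === $href) {
        return $root.$path.$query;
    }
    // Fragment only: same page and query string, new fragment.
    if ('#' === $href[0]) {
        return $root.$path.$query.$href;
    }
    // Query only: same path, new query string.
    if ('?' === $href[0]) {
        return $root.$path.$href;
    }
    // Protocol-relative: reuse the current scheme.
    if ('//' === substr($href, 0, 2)) {
        return $parts['scheme'].':'.$href;
    }
    // Root-relative path: keep only scheme, host and port.
    if ('/' === $href[0]) {
        return $root.$href;
    }

    // Relative path: replace the last path segment.
    return $root.substr($path, 0, strrpos($path, '/') + 1).$href;
}

$current = 'http://example.com/blog/post?page=2#comments';
$anchor  = resolveHref('#foo', $current);
```

With these inputs, the fragment of the current page is replaced while its path and query string are kept.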

Forms

Special treatment is also given to forms. A selectButton() method is available on the Crawler which returns another Crawler that matches a button (input[type=submit], input[type=image], or a button) with the given text. This method is especially useful because you can use it to return a Form object that represents the form that the button lives in:

$form = $crawler->selectButton('validate')->form();

// or "fill" the form fields with data
$form = $crawler->selectButton('validate')->form(array(
    'name' => 'Ryan',
));

The Form object has lots of very useful methods for working with forms:

$uri = $form->getUri();

$method = $form->getMethod();

The getUri() method does more than just return the action attribute of the form. If the form method is GET, then it mimics the browser's behavior and returns the action attribute followed by a query string of all of the form's values.
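
The behavior described above can be approximated in a few lines of plain PHP. The buildFormUri() helper is hypothetical and only illustrates the idea; it assumes the action is an absolute URI without a query string or fragment of its own.

```php
/**
 * Hypothetical helper (not the component's code): builds the URI a
 * browser would request when submitting a form.
 */
function buildFormUri($action, $method, array $values)
{
    // Only GET forms carry their values in the URI.
    if ('GET' !== strtoupper($method) || !$values) {
        return $action;
    }

    // Assumption: $action has no query string yet, so "?" is always correct.
    return $action.'?'.http_build_query($values, '', '&');
}

$getUri  = buildFormUri('http://example.com/search', 'get', array('q' => 'symfony', 'page' => 2));
$postUri = buildFormUri('http://example.com/login', 'POST', array('name' => 'Ryan'));
```

For the POST form the values travel in the request body, so the URI is just the action.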

You can virtually set and get values on the form:

// set values on the form internally
$form->setValues(array(
    'registration[username]' => 'symfonyfan',
    'registration[terms]'    => 1,
));

// get back an array of values - in the "flat" array like above
$values = $form->getValues();

// returns the values like PHP would see them,
// where "registration" is its own array
$values = $form->getPhpValues();
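
To see how a "flat" array relates to the nested structure PHP would see, the following sketch converts one into the other using only built-in functions. It shows one way to perform the conversion and is not a statement about how the component does it.

```php
// Flat values keyed by full field name, as in the example above.
$flat = array(
    'registration[username]' => 'symfonyfan',
    'registration[terms]'    => 1,
);

// Encode the flat pairs as a query string...
$queryString = http_build_query($flat, '', '&');

// ...and let PHP parse it the way it parses a submitted request.
parse_str($queryString, $nested);
```

After the round trip, "registration" is its own array with the keys "username" and "terms". Every value comes back as a string, as it would in a real request.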

To work with multi-dimensional fields:

<form>
    <input name="multi[]" />
    <input name="multi[]" />
    <input name="multi[dimensional]" />
</form>

Pass an array of values:

// Set a single field
$form->setValues(array('multi' => array('value')));

// Set multiple fields at once
$form->setValues(array('multi' => array(
    1             => 'value',
    'dimensional' => 'another value'
)));

This is great, but it gets better! The Form object allows you to interact with your form like a browser, selecting radio values, ticking checkboxes, and uploading files:

$form['registration[username]']->setValue('symfonyfan');

// check or uncheck a checkbox
$form['registration[terms]']->tick();
$form['registration[terms]']->untick();

// select an option
$form['registration[birthday][year]']->select(1984);

// select many options from a "multiple" select or checkboxes
$form['registration[interests]']->select(array('symfony', 'cookies'));

// even fake a file upload
$form['registration[photo]']->upload('/path/to/lucas.jpg');

What's the point of doing all of this? If you're testing internally, you can grab the information off of your form as if it had just been submitted by using the PHP values:

$values = $form->getPhpValues();
$files = $form->getPhpFiles();

If you're using an external HTTP client, you can use the form to grab all of the information you need to create a POST request for the form:

$uri = $form->getUri();
$method = $form->getMethod();
$values = $form->getValues();
$files = $form->getFiles();

// now use some HTTP client and post using this information
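
As one possible follow-up, the method and flat values can be turned into options for PHP's built-in HTTP stream wrapper. The buildHttpContextOptions() helper is hypothetical; it assumes a URL-encoded body and leaves file uploads out entirely.

```php
/**
 * Hypothetical helper: turns a method and flat form values into
 * options for PHP's HTTP stream wrapper. File uploads are not handled.
 */
function buildHttpContextOptions($method, array $values)
{
    return array('http' => array(
        'method'  => strtoupper($method),
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query($values, '', '&'),
    ));
}

$options = buildHttpContextOptions('post', array('registration[username]' => 'symfonyfan'));

// To send the request (not executed here):
// $response = file_get_contents($uri, false, stream_context_create($options));
```

Any HTTP client would do; the stream wrapper is used here only because it ships with PHP.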

One great example of an integrated system that uses all of this is Goutte. Goutte understands the Symfony Crawler object and can use it to submit forms directly:

use Goutte\Client;

// make a real request to an external site
$client = new Client();
$crawler = $client->request('GET', 'https://github.com/login');

// select the form and fill in some values
$form = $crawler->selectButton('Log in')->form();
$form['login'] = 'symfonyfan';
$form['password'] = 'anypass';

// submit that form
$crawler = $client->submit($form);

This work, including the code samples, is licensed under a Creative Commons BY-SA 3.0 license.