New in Symfony 2.4: Namespace auto-discovery in DomCrawler

Contributed by Jakub Zalas in #6650.

When crawling an XML document with the DomCrawler component, you might retrieve documents that declare more than one namespace:

Note

The DomCrawler component is used by the Symfony HTTP client, and also by some Mink drivers.

<?xml version="1.0" encoding="UTF-8"?>
<entry
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:media="http://search.yahoo.com/mrss/"
  xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <id>tag:youtube.com,2008:video:kgZRZmEc9j4</id>
    <yt:accessControl action="comment" permission="allowed"/>
    <yt:accessControl action="videoRespond" permission="moderated"/>
    <media:group>
        <media:title type="plain">Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>

As of Symfony 2.4, you don't need to care about namespaces, as they are auto-discovered and auto-registered:

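// XPath: the document's default namespace is available under the "default" prefix
$crawler = $crawler->filterXPath('//default:entry/media:group//yt:aspectRatio');

// Alternatively, with a CSS selector: disable the HTML extension first when filtering XML
\Symfony\Component\CssSelector\CssSelector::disableHtmlExtension();
$crawler = $crawler->filter('default|entry media|group yt|aspectRatio');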

Note that the document's default namespace is registered under the prefix default (this prefix is configurable), and that you must explicitly disable the HTML extension of the CssSelector component when filtering an XML document with a CSS selector.
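
To make the idea of auto-discovery more concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. It is not taken from the DomCrawler source and may differ from what the component does internally. The helper registerPrefix() is a hypothetical name introduced for illustration. It looks up the namespace URI bound to a prefix in the document and registers it on a DOMXPath instance, mapping the prefix default to the unprefixed namespace. The XML is a shortened version of the document above.

```php
<?php
// Sketch only: resolves namespace prefixes from the document itself.
// registerPrefix() is a hypothetical helper, not part of Symfony.

$xml = <<<'XML'
<entry
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:media="http://search.yahoo.com/mrss/"
  xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <id>tag:youtube.com,2008:video:kgZRZmEc9j4</id>
    <media:group>
        <media:title type="plain">Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

function registerPrefix(DOMXPath $xpath, string $prefix, string $defaultPrefix = 'default'): bool
{
    // On the namespace axis, the unprefixed (default) namespace has an empty name.
    $name = $prefix === $defaultPrefix ? '' : $prefix;
    $nodes = $xpath->query(sprintf('(//namespace::*[name()="%s"])[last()]', $name));
    $node = $nodes->item(0);
    if ($node === null) {
        return false; // the document does not declare this prefix
    }

    // The value of a namespace node is the namespace URI.
    return $xpath->registerNamespace($prefix, $node->nodeValue);
}

// Register every prefix used in the expression below.
foreach (['default', 'media', 'yt'] as $prefix) {
    registerPrefix($xpath, $prefix);
}

$nodes = $xpath->query('//default:entry/media:group//yt:aspectRatio');
$aspectRatio = $nodes->item(0)->textContent;
echo $aspectRatio, "\n";
```

With auto-discovery in DomCrawler, this bookkeeping is no longer something you have to write yourself. The sketch only shows why an unprefixed namespace needs a stand-in prefix such as default before it can be used in an XPath expression.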

Comments

Typo in post title - s/DowCrawler/DomCrawler/g
There is an issue here: it is not used by some Behat drivers (there is no Behat drivers) but by some Mink drivers
