Contributed by Jakub Zalas in #6650

When crawling an XML document with the DomCrawler component, you may encounter documents that declare more than one namespace:

Note

The DomCrawler component is used by the Symfony HTTP client, but also by some Behat drivers.

<?xml version="1.0" encoding="UTF-8"?>
<entry
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:media="http://search.yahoo.com/mrss/"
  xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <id>tag:youtube.com,2008:video:kgZRZmEc9j4</id>
    <yt:accessControl action="comment" permission="allowed"/>
    <yt:accessControl action="videoRespond" permission="moderated"/>
    <media:group>
        <media:title type="plain">Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>
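
For contrast, here is a minimal sketch of querying such a document with PHP's bundled DOM extension alone, where each prefix used in the XPath expression is mapped to its namespace URI explicitly. The unprefixed Atom namespace in particular cannot be addressed in XPath until some prefix is registered for it. The XML is a trimmed copy of the document above, and the variable names are illustrative only.

```php
<?php
// Stand-alone sketch: uses only PHP's DOM extension, no Symfony code involved.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<entry
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:media="http://search.yahoo.com/mrss/"
  xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <id>tag:youtube.com,2008:video:kgZRZmEc9j4</id>
    <media:group>
        <media:title type="plain">Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);

// Explicitly map every prefix used in the expression to its namespace URI.
// "default" is an arbitrary prefix chosen here for the unprefixed Atom namespace.
$xpath->registerNamespace('default', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/');
$xpath->registerNamespace('yt', 'http://gdata.youtube.com/schemas/2007');

$nodes = $xpath->query('//default:entry/media:group//yt:aspectRatio');
$aspectRatio = $nodes->item(0)->textContent;

echo $aspectRatio, PHP_EOL;
```

This is the kind of bookkeeping that automatic namespace handling takes off your hands.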

As of Symfony 2.4, you don't need to care about namespaces, as they are auto-discovered and auto-registered:

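// XPath: the unprefixed Atom namespace is addressed through the "default" prefix
$crawler = $crawler->filterXPath('//default:entry/media:group//yt:aspectRatio');

// Alternatively, with a CSS selector: disable the HTML extension first, since the document is XML
\Symfony\Component\CssSelector\CssSelector::disableHtmlExtension();
$crawler = $crawler->filter('default|entry media|group yt|aspectRatio');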

Note that the document's default namespace is registered under the prefix default (this prefix is configurable), and that you must explicitly disable the HTML extension of the CssSelector component before filtering an XML document with a CSS selector.
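
As a rough illustration of the configurable prefix, the sketch below renames it from default to atom with the Crawler's setDefaultNamespacePrefix() method. It assumes the symfony/dom-crawler package is installed through Composer and that the autoloader lives at the path shown. The atom prefix and the trimmed XML are choices made for this example only.

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer in this directory.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<entry
  xmlns="http://www.w3.org/2005/Atom"
  xmlns:media="http://search.yahoo.com/mrss/"
  xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <media:group>
        <media:title type="plain">Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>
XML;

$crawler = new Crawler();
$crawler->addXmlContent($xml);

// Address the unprefixed Atom namespace as "atom" instead of "default".
// "atom" is a hypothetical choice; any prefix not already used by the document should do.
$crawler->setDefaultNamespacePrefix('atom');

// The media and yt prefixes are still discovered from the document itself.
$aspectRatio = $crawler
    ->filterXPath('//atom:entry/media:group//yt:aspectRatio')
    ->text();

echo $aspectRatio, PHP_EOL;
```

Only the XPath variant is shown here, so the sketch does not depend on the CssSelector component.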

Published in #Living on the edge