The DomCrawler component eases DOM navigation for HTML and XML documents. Although it's commonly used for developing functional tests in Symfony2 applications, it can also be used to scrape content, as demonstrated by the Goutte project.
DomCrawler provides several methods to perform node filtering: filter(), reduce() and each(). As of Symfony 2.6, you can use another handy method called slice().
Similar to PHP's array_slice() function, the new slice($offset, $length) method returns the sequence of elements specified by the offset and length parameters. Consider for example the code needed to extract the text content of some <li> elements from a #nav-menu element:
use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(' ... some HTML content ... ');

$crawler->filter('#nav-menu li')->each(function ($node, $i) {
    if ($i >= 2 && $i <= 7) {
        return $node->text();
    }
});
In Symfony 2.6, the previous node filtering code becomes simpler and cleaner:
// offset 2, length 6: the nodes at indexes 2 to 7
$crawler->filter('#nav-menu li')->slice(2, 6)->each(function ($node, $i) {
    return $node->text();
});
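Because the second argument is a length and not an end index, it is easy to be off by one when converting an index range into a slice() call. The following sketch illustrates the offset and length semantics with the plain array_slice() function, which the text above names as the model for slice(). The $items array is a hypothetical stand-in for the text of ten <li> nodes, not output from a real Crawler.

```php
<?php
// Hypothetical stand-in for the text of ten <li> nodes (indexes 0 to 9).
$items = array(
    'item0', 'item1', 'item2', 'item3', 'item4',
    'item5', 'item6', 'item7', 'item8', 'item9',
);

// Offset 2, length 6: start at index 2 and take six elements,
// which covers the original indexes 2 to 7.
$selected = array_slice($items, 2, 6);

print_r($selected);
```

To cover an inclusive index range from $first to $last, the length is $last - $first + 1, so the range 2 to 7 corresponds to a length of 6.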
This new Symfony 2.6 feature is just another example of how minor tweaks can make your job easier. We are strongly committed to improving each and every Symfony feature. That's why we've introduced the DX initiative. Help us continue improving Symfony by sending us your comments and ideas. Your opinion matters to us!