Contributed by Grégoire Pineau in #35415 and #35667

The DomCrawler component eases DOM navigation for HTML and XML documents. Most developers use it in the functional tests of their Symfony applications, but you can also use it to build a real crawler.

A common need when building a crawler is to convert the links found in the HTML content, which are usually relative, into absolute URLs so that you can keep crawling the entire site. For example, if the site URL is https://example.com/foo and the link URL is ../bar?foo=1, the absolute URL is https://example.com/bar?foo=1.

This transformation is much more complex than it looks because you have to deal with anchors, query string parameters and all kinds of sub paths. The DomCrawler component already contained the logic to resolve these URLs, but in Symfony 5.1 we've extracted it into a new UriResolver class so you can reuse the logic in your applications:

use Symfony\Component\DomCrawler\UriResolver;

$absoluteUrl = UriResolver::resolve('../bar?foo=1', 'https://example.com/foo');
// $absoluteUrl = 'https://example.com/bar?foo=1'
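
In a crawler, you would typically apply this to every link found on a page. The following is a minimal sketch of that idea, not a definitive implementation: the HTML snippet, the `$pageUrl` and `$absoluteUrls` variable names and the Composer autoloader path are assumptions made for illustration.

```php
// Assumption: the component was installed with Composer and this script
// sits next to the vendor/ directory.
require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\DomCrawler\UriResolver;

// Hypothetical page URL and contents, used only for illustration.
$pageUrl = 'https://example.com/foo';
$html = <<<'HTML'
<html>
    <body>
        <a href="../bar?foo=1">Bar</a>
        <a href="#top">Back to top</a>
    </body>
</html>
HTML;

$crawler = new Crawler($html);

// Collect the raw "href" values of all <a> elements that define one.
$hrefs = $crawler->filterXPath('//a[@href]')->extract(['href']);

// Resolve each (possibly relative) link against the URL of the page.
$absoluteUrls = [];
foreach ($hrefs as $href) {
    $absoluteUrls[] = UriResolver::resolve($href, $pageUrl);
}
```

The XPath filter is used here instead of a CSS selector so that the sketch does not depend on the CssSelector component. The resulting `$absoluteUrls` array can then feed the queue of pages still to be crawled.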
Published in #Living on the edge