The DomCrawler component eases DOM navigation for HTML and XML documents. Most developers use it in the functional tests of their Symfony applications, but you can also use it to build a real crawler.
A common need when building a crawler is to turn the links of the HTML contents,
which are usually relative, into absolute URLs, to keep crawling the entire site.
For example, if the site URL is https://example.com/foo and the link URL is
../bar?foo=1, the absolute URL is https://example.com/bar?foo=1.
This transformation is much more complex than it looks because you have to deal
with anchors, query string parameters and all kinds of sub paths. The DomCrawler
component already contained the logic to resolve these URLs, but in Symfony 5.1
we've extracted it into a new UriResolver class so you can reuse the logic
in your applications:
use Symfony\Component\DomCrawler\UriResolver;
$absoluteUrl = UriResolver::resolve('../bar?foo=1', 'https://example.com/foo');
// $absoluteUrl = 'https://example.com/bar?foo=1'
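
As a rough sketch of how this could fit into a small crawler, the example below collects every link in a page and resolves it against the URL the page was fetched from. The extractAbsoluteLinks() helper, the sample HTML and the vendor/autoload.php path are assumptions made for illustration and are not part of Symfony; only Crawler and UriResolver::resolve() come from the component.

```php
<?php

// Assumption: symfony/dom-crawler (5.1 or later) is installed with Composer
// and this script sits next to the vendor/ directory.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\DomCrawler\UriResolver;

/**
 * Hypothetical helper (not part of Symfony): returns the absolute URLs of
 * all links found in an HTML document that was fetched from $pageUrl.
 *
 * @return string[]
 */
function extractAbsoluteLinks(string $html, string $pageUrl): array
{
    $crawler = new Crawler($html);

    // XPath is used here so the optional symfony/css-selector package is not needed.
    $links = $crawler->filterXPath('//a[@href]')->each(
        static function (Crawler $node) use ($pageUrl): string {
            // Resolve each href (relative path, query string, anchor...) against the page URL.
            return UriResolver::resolve((string) $node->attr('href'), $pageUrl);
        }
    );

    // Drop duplicates and reindex the array.
    return array_values(array_unique($links));
}

// Sample document covering the kinds of links mentioned above.
$html = <<<'HTML'
<html><body>
  <a href="../bar?foo=1">relative path</a>
  <a href="/baz">absolute path</a>
  <a href="?page=2">query string only</a>
  <a href="#section">anchor only</a>
</body></html>
HTML;

foreach (extractAbsoluteLinks($html, 'https://example.com/foo') as $url) {
    echo $url, PHP_EOL;
}
```

A real crawler would typically also filter the resolved URLs by host and keep track of the ones it has already visited before queuing them.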