New in Symfony 4.3: Better HTML5 parser for DomCrawler

Warning: This post is about an unsupported Symfony version. Some of this information may be out of date. Read the most recent Symfony Docs.

Contributed by Titouan Galopin in #29306 and #30892.

The DomCrawler component eases DOM navigation for HTML and XML documents, which makes it very useful for functional tests and web scrapers. Internally, the component uses the PHP DOM extension (and methods such as DOMDocument::loadHTML()) to parse HTML content, including HTML5.

Sadly, HTML5 support in the PHP DOM extension is far from perfect and includes some inconsistencies. In contrast, the third-party HTML5-PHP library provides a standards-compliant HTML5 parser and writer written entirely in PHP. Moreover, it has been battle-tested in projects such as Drupal and has more than 7 million downloads.
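As a rough illustration of the kind of friction meant here, the following sketch parses a small HTML5 document with the PHP DOM extension alone and prints whatever libxml reports. The markup and variable names are made up for this example, and the exact messages (if any) depend on the libxml2 version installed; on many versions, HTML5 elements such as <article> are reported as invalid tags even though they still end up in the DOM.

```php
<?php
// Sketch using only the PHP DOM extension (ext-dom / libxml); no Symfony code involved.
// The sample markup below is hypothetical.
$html = '<!DOCTYPE html><html><body><article><p>Hello World!</p></article></body></html>';

// Collect libxml errors in memory instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Depending on the libxml2 version, this may print messages
    // such as "Tag article invalid", or nothing at all.
    echo trim($error->message), PHP_EOL;
}

// The element is still part of the resulting DOM tree.
$articleCount = $document->getElementsByTagName('article')->length;
echo "article elements found: {$articleCount}", PHP_EOL;
```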

That's why in Symfony 4.3 we've decided to optionally integrate this library in DomCrawler. If you don't make any change in your app, the component will keep using the PHP DOM extension. However, you can opt in by installing the HTML5-PHP library in your app as follows:

$ composer require masterminds/html5

Once the library is installed, the DomCrawler component will use it automatically whenever the parsed content is HTML5 (it starts with <!doctype html>).
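Since the switch is automatic, application code does not need to change. A minimal sketch, assuming symfony/dom-crawler 4.3 or later and masterminds/html5 are both installed with Composer and that vendor/autoload.php sits next to the script (the markup and the "message" class are made up for this example):

```php
<?php
// Assumption: Composer's autoloader is available at this path.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical HTML5 document; note the HTML5 doctype at the very start.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <body>
        <article>
            <p class="message">Hello World!</p>
        </article>
    </body>
</html>
HTML;

// Same API as before: the parser is chosen internally by the component.
$crawler = new Crawler($html);

// filterXPath() is used so the example does not need symfony/css-selector.
$text = $crawler->filterXPath('//p[@class="message"]')->text();

echo $text, PHP_EOL;
```

The same script also runs without masterminds/html5 installed; in that case the component keeps using the PHP DOM extension.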


I'm wondering what those inconsistencies are ... ?
@Thomas : for instance
Thank you, what would be the exact benefit?
good idea !
Awesome! I'm pretty sure I ran into that bug in various tests without noticing. Good to hear that these are fixed!
awesome awesome ..