HTML5 Parser
With the 2.0.0 release, Dompdf incorporated the Masterminds/HTML5-PHP HTML5 parser library. The HTML5 parser is always enabled when ingesting an HTML document.
Previous releases of Dompdf bundled an older HTML5 parser, html5lib. In those releases, the HTML5 parser can be activated by setting \Dompdf\Options::$isHtml5ParserEnabled to true.
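For releases before 2.0.0, enabling the option might look like the following minimal sketch. It assumes Dompdf is installed through Composer (the autoload path is an assumption) and uses the setIsHtml5ParserEnabled setter on the Options class; the sample markup is hypothetical.

```php
<?php
// Sketch for Dompdf releases before 2.0.0 only.
// Assumption: Dompdf was installed with Composer, so the autoloader lives here.
require 'vendor/autoload.php';

use Dompdf\Dompdf;
use Dompdf\Options;

// Opt in to the bundled html5lib parser.
$options = new Options();
$options->setIsHtml5ParserEnabled(true);

// Pass the options to Dompdf so the HTML below goes through the HTML5 parser.
$dompdf = new Dompdf($options);
$dompdf->loadHtml('<table><tr><td>A<tr><td>B</table>'); // hypothetical sample markup
$dompdf->render();
```

From 2.0.0 onward this step is unnecessary, since the HTML5 parser is always used for HTML input.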
An HTML parser is a library or program that reads HTML source code and translates it into a DOM tree.
The difference between a regular HTML parser and an HTML5 parser is that the latter knows how to deal with badly structured HTML, because the W3C specifications strictly define how each such case must be handled.
With an HTML5 parser, Dompdf can handle poorly written HTML documents more reliably.
For example, a table element may contain rows without closing </tr> tags. A regular HTML parser (the one embedded with the PHP DOM extension: libxml) won't be able to handle it well and may, for example, ignore the row or append the following cells to the current row. An HTML5 parser will handle it as if the </tr> tag were present.
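To see how the libxml-based parser treats such markup on a given installation, a small check along these lines may help. It is only a sketch: the sample table is hypothetical, it uses nothing but the PHP DOM extension, and the result can differ between libxml versions, so no particular output is claimed here.

```php
<?php
// Hypothetical sample: the first row is never closed with </tr>.
$html = '<table><tr><td>A</td><td>B</td><tr><td>C</td><td>D</td></tr></table>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$warnings = libxml_get_errors();
libxml_clear_errors();

// Report how many rows libxml built and how many cells each one holds.
// Note: the cell count includes cells of any rows nested inside a row.
$rows = $doc->getElementsByTagName('tr');
printf("rows found: %d, parser warnings: %d\n", $rows->length, count($warnings));

foreach ($rows as $i => $row) {
    $cells = $row->getElementsByTagName('td')->length;
    printf("row %d has %d cell(s)\n", $i, $cells);
}
```

If the reported rows or cell counts do not match the intended table layout, that is the kind of discrepancy an HTML5 parser is meant to avoid.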
Though not recommended, it is possible to skip HTML5 parsing by feeding Dompdf a DOMDocument instance instead of an HTML document. To do so, call the loadDom method with your previously instantiated DOMDocument instance.
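use Dompdf\Dompdf;

// Parse the HTML with libxml directly, bypassing the HTML5 parser.
$doc = new DOMDocument("1.0", "UTF-8");
$doc->preserveWhiteSpace = true;
$doc->loadHTMLFile(...);
$doc->encoding = "UTF-8";

// Hand the prepared DOM tree to Dompdf, then render and send the PDF.
$dompdf = new Dompdf();
$dompdf->loadDom($doc);
$dompdf->render();
$dompdf->stream();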