Fast and easy-to-use HTML DOM parser written in PHP. It is built on top of PHP's DOMDocument.
Requires PHP 5.3+
Include the class the usual way :
require_once('Html_dom.php');
Then load a dom document like this :
$html_dom = file_get_html('index.html');
You can also load an HTML string directly :
$html_dom = str_get_html('<ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>');
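Since the library sits on top of DOMDocument, it can help to see what loading the same string looks like with the built-in class alone. This is only a comparison sketch using standard PHP calls; it is not taken from the Html_dom source, and the library may well do this differently internally.

```php
<?php
// Plain DOMDocument only; nothing here comes from Html_dom itself.
$html = '<ul><li>item 1</li><li>item 2</li><li>item 3</li></ul>';

$doc = new DOMDocument();
// Real-world HTML is often not well formed, so collect parser
// warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$items = $doc->getElementsByTagName('li');
echo $items->length, "\n";               // number of <li> elements
echo $items->item(1)->textContent, "\n"; // text of the second <li>
```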
Once you have the document loaded you can parse it, modify it and output the modified version.
You can output the document using the save() method :
echo $html_dom->save();
You can also save the output in a file directly if you specify the file path :
$html_dom->save('/path/to/file.html');
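For reference, the two output modes above have direct counterparts in plain DOMDocument. The sketch below uses only built-in calls and a temporary file; whether save() is implemented this way is an assumption.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello</p></body></html>');

// Serialize the whole document to a string.
$markup = $doc->saveHTML();

// Or write it straight to a file. saveHTMLFile() returns the number
// of bytes written, or false on failure.
$path  = tempnam(sys_get_temp_dir(), 'dom_');
$bytes = $doc->saveHTMLFile($path);

if ($bytes === false) {
    echo "Could not write $path\n";
}
```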
Parsing a document can be done with different methods. The fastest one, if you have the element id, is getElementById(). The second fastest is probably getElementsByTagName(). Finally, find() is the general one and accepts all kinds of selectors.
$contentElement = $html_dom->getElementById('content');
$liElementCollection = $html_dom->getElementsByTagName('li');
$secondLiElement = $html_dom->getElementsByTagName('li', 1);
$pElementCollection = $html_dom->find('p'); // array of all the "<p>" elements
$pElement = $html_dom->find('p', 0); // first "<p>" element
$pElement = $html_dom->find('p', 1); // second "<p>" element
$elementCollection = $html_dom->find('div.promo'); // array of DOM element "<div>" with attribute class="promo"
$element = $html_dom->find('#login', 0); // DOM element with attribute id="login"
$element = $html_dom->find('meta[name="description"]', 0); // DOM meta element with attribute name="description"
$element = $html_dom->find('ul', 0)->first_child(); // first child element under "<ul>" (should be the first "<li>" element)
$element = $html_dom->find('ul', 0)->last_child(); // last child element under "<ul>" (should be the last "<li>" element)
$liElementCollection = $html_dom->find('ul li'); // array of dom elements
$element = $html_dom->find('ul li')->offsetGet(2); // third element in the array
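To give a feel for what selectors such as '#content', 'div.promo' and 'ul li' mean at the DOM level, here is a rough equivalent written with the built-in DOMDocument and DOMXPath classes. The XPath expressions are one common way to express these CSS selectors; they are an illustration, not a description of how find() works internally.

```php
<?php
$html = '<div id="content"><div class="promo big">A</div><div class="promo">B</div>'
      . '<ul><li>one</li><li>two</li><li>three</li></ul></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// "#content": a direct id lookup.
$content = $doc->getElementById('content');

// "li" by tag name; item(1) is the second match.
$secondLi = $doc->getElementsByTagName('li')->item(1);

// "div.promo": match "promo" as a whole word inside the class attribute,
// so class="promo big" matches but class="promotion" would not.
$promos = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " promo ")]'
);

// "ul li": every <li> that is a descendant of a <ul>.
$lis = $xpath->query('//ul//li');
```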
Once we have a Html_dom_node or a Html_dom_node_collection, we can retrieve some data.
$ul_content = $html_dom->find('ul', 0)->innertext; // content of first "<ul>" element
$li_content = $html_dom->find('ul li', 1)->innertext; // content of second "<li>" element
$attrValue = $html_dom->find('a', 0)->href; // value of "href" attribute
$attrValue = $html_dom->find('a', 0)->my_custom_attribute; // value of "my_custom_attribute" attribute (will work for any attribute)
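The properties above map onto fairly simple DOM operations. The following sketch shows one way to get inner markup, outer markup and attribute values with plain DOMDocument; it assumes PHP 5.3.6 or later for saveHTML() with a node argument and is not the library's own code.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<ul><li><a href="/home" data-role="nav">Home</a></li></ul>');

$ul = $doc->getElementsByTagName('ul')->item(0);
$a  = $doc->getElementsByTagName('a')->item(0);

// Outer markup: serialize the node itself.
$outer = $doc->saveHTML($ul);

// Inner markup: serialize each child node and join the pieces.
$inner = '';
foreach ($ul->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

// Attributes: getAttribute() works for any attribute name and
// returns an empty string when the attribute is missing.
$href = $a->getAttribute('href');
$role = $a->getAttribute('data-role');
```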
You can modify the content of a Html_dom_node or modify its attributes.
$html_dom->find('h1', 0)->innertext = 'New H1 title'; // replace H1 title
$html_dom->find('h1', 0)->innertext .= '!!!'; // append exclamation marks to the H1 title
$html_dom->find('.menu_item')->addClass('class_test'); // find all the elements with class "menu_item" and add the class "class_test"
$html_dom->find('.menu_item')->class = 'class_test'; // find all the elements with class "menu_item" and replace the class by "class_test"
$html_dom->find('ul li')->removeClass('menu_item'); // find all the "<li>" elements under "<ul>" and remove the class "menu_item"
$html_dom->find('ul li', 0)->hasClass('menu_item'); // find the first "<li>" element under "<ul>" and verify if it has the class "menu_item" (returns true or false)
// once you have made some modifications, don't forget to output the result
echo $html_dom->save();
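Class manipulation boils down to editing the space-separated class attribute. The helper functions below (class_list, has_class, add_class, remove_class) are hypothetical names made up for this sketch, written against plain DOMElement; they only illustrate the idea behind addClass(), removeClass() and hasClass().

```php
<?php
// Hypothetical helpers for illustration; not part of Html_dom.
function class_list(DOMElement $el) {
    $raw = trim($el->getAttribute('class'));
    return $raw === '' ? array() : preg_split('/\s+/', $raw);
}

function has_class(DOMElement $el, $name) {
    return in_array($name, class_list($el), true);
}

function add_class(DOMElement $el, $name) {
    if (!has_class($el, $name)) {
        $classes = array_merge(class_list($el), array($name));
        $el->setAttribute('class', implode(' ', $classes));
    }
}

function remove_class(DOMElement $el, $name) {
    $kept = array_diff(class_list($el), array($name));
    if ($kept) {
        $el->setAttribute('class', implode(' ', $kept));
    } else {
        $el->removeAttribute('class'); // no classes left
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li class="menu_item">a</li><li class="menu_item active">b</li></ul>');

// Apply the same change to every <li>, as a collection-wide call would.
foreach ($doc->getElementsByTagName('li') as $li) {
    add_class($li, 'class_test');
    remove_class($li, 'menu_item');
}
```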
loadHTML(string $str)
loadHTMLFile(string $file_path)
setBasicAuth(string $username, string $password)
Example :
$html_dom = new Html_dom();
$html_dom->setBasicAuth('username', 'secret_password');
$html_dom->loadHTMLFile('/path/to/file.html');
getElementById(string $elementId)
getElementsByTagName(string $tagName[, int $index])
save([string $file_path])
find(string $selector[, int $index])
Let's assume that our code starts like this :
$html_dom = file_get_html('index.html');
$html_dom_node = $html_dom->getElementById('content');
$html_dom_node->getTag();
OR
$html_dom_node->tag;
$html_dom_node->getInnerText();
OR
$html_dom_node->innertext;
$html_dom_node->getOuterText();
OR
$html_dom_node->outertext;
$html_dom_node->getAttr(string $attributeName)
OR
$html_dom_node->attribute_name;
Examples :
$html_dom_node->class;
$html_dom_node->id;
$html_dom_node->href;
$html_dom_node->title;
$html_dom_node->my_custom_attribute;
$html_dom_node->setInnerText($value);
OR
$html_dom_node->innertext = $value;
$html_dom_node->setOuterText($value);
OR
$html_dom_node->outertext = $value;
$html_dom_node->append($value);
$html_dom_node->prepend($value);
$html_dom_node->addClass($class_name);
$html_dom_node->removeClass($class_name);
$html_dom_node->hasClass($class_name);
$html_dom_node->setAttr($attributeName, $value);
OR
$html_dom_node->attribute_name = $value;
Examples :
$html_dom_node->class = 'my_class';
$html_dom_node->id = 'element_id';
$html_dom_node->href = 'www.example.com';
$html_dom_node->title = 'My title';
$html_dom_node->my_custom_attribute = 'my_custom_value';
$html_dom_node->removeAttr($attributeName);
$firstChildElement = $html_dom_node->first_child();
$lastChildElement = $html_dom_node->last_child();
$previousElement = $html_dom_node->previous_sibling();
$nextElement = $html_dom_node->next_sibling();
$elementCollection = $html_dom_node->children();
$elementCollection = $html_dom_node->siblings();
$parentElement = $html_dom_node->parent();
$elementCollection = $html_dom_node->find('li');
$element = $html_dom_node->find('li', 0);
$element = $html_dom_node->getElementById('content');
$elementCollection = $html_dom_node->getElementsByTagName('li');
$element = $html_dom_node->getElementsByTagName('li', 0);
$html_dom_node->remove();
$html_dom_node->remove_childs();
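When working with the raw DOM, note that firstChild and nextSibling can return text nodes as well as elements, which is one reason element-only helpers such as first_child() are convenient. The sketch below shows the idea with plain DOMDocument; first_element_child is a hypothetical helper name, and none of this is taken from the library.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML("<ul>\n<li>one</li>\n<li>two</li>\n</ul>");
$ul = $doc->getElementsByTagName('ul')->item(0);

// Hypothetical helper: first child that is an element, skipping any text nodes.
function first_element_child(DOMNode $node) {
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return $child;
        }
    }
    return null;
}

$first      = first_element_child($ul);
$firstText  = $first->textContent;          // text of the first <li>
$parentName = $first->parentNode->nodeName; // parent of the <li> is the <ul>

// Remove every child of the <ul>, text nodes included.
while ($ul->firstChild) {
    $ul->removeChild($ul->firstChild);
}
```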
This class extends PHP's ArrayObject, so all the methods available with ArrayObject can be used here.
Here is a list of the most common methods you might need.
$html_dom_node_collection->count();
$html_dom_node_collection->offsetExists(mixed $index);
$html_dom_node_collection->offsetGet($index);
$html_dom_node_collection->offsetSet($index, $value);
$html_dom_node_collection->offsetUnset($index);
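These methods behave as they do on any ArrayObject. The sketch below uses a plain ArrayObject holding strings in place of a node collection, so it runs without the library; the values are placeholders.

```php
<?php
// A plain ArrayObject standing in for a node collection.
$collection = new ArrayObject(array('first', 'second', 'third'));

$total  = $collection->count();         // number of items
$hasOne = $collection->offsetExists(1); // is there an item at index 1?
$third  = $collection->offsetGet(2);    // same as $collection[2]

$collection->offsetSet(1, 'changed');   // same as $collection[1] = 'changed'
$collection->offsetUnset(0);            // same as unset($collection[0])

// offsetUnset() does not renumber the remaining items:
// indexes 1 and 2 are still in place, index 0 is gone.
$remaining = $collection->getArrayCopy();
```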
You can also iterate over the collection using the following methods
seek()
$html_dom_node_collection->seek();
rewind()
$html_dom_node_collection->rewind();
next()
$html_dom_node_collection->next();
current() // returns the current Html_dom_node
$html_dom_node_collection->current();
valid() // returns a boolean
$html_dom_node_collection->valid();
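In practice a foreach loop is usually the simplest way to walk a collection, since ArrayObject is iterable. On a plain ArrayObject the cursor-style methods live on the iterator returned by getIterator(), and seek() takes the position to jump to; whether the collection class exposes them directly, as listed above, depends on the library. A sketch with a plain ArrayObject:

```php
<?php
$collection = new ArrayObject(array('a', 'b', 'c'));

// The usual way: foreach works on any ArrayObject.
$seen = array();
foreach ($collection as $index => $value) {
    $seen[] = "$index:$value";
}

// Manual iteration through the underlying ArrayIterator.
$it = $collection->getIterator();

$it->seek(2);            // jump to position 2
$third = $it->current(); // item at that position

$it->rewind();           // back to the start
$manual = array();
while ($it->valid()) {
    $manual[] = $it->current();
    $it->next();
}
```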
The examples below assume that we have loaded a document into $html_dom
$html_dom->find('ul li')->addClass('li_class'); // Will add the class "li_class" to all the "<li>" items
$html_dom->find('ul li')->removeClass('li_class'); // Will remove the class "li_class" from all the "<li>" items