Create array from the contents of <div> tags in php
I have the contents of a web page assigned to a variable, $html.
Here's an example of the contents of $html:
<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>
Using PHP, how can I create an array holding the contents of the <div class="content"></div> regions, so that (for the example above)
echo $array[0] . "\n" . $array[1]; //etc
outputs:
something here
more stuff
Assuming this is just a simplified case in the OP and the real situation is more complicated, you'll want to use XPath.
If it's really complex, you may want to use DOMDocument (with DOMXPath), but here's a simple example using SimpleXML:
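$xml = new SimpleXMLElement('<root>' . $html . '</root>'); // SimpleXML needs a single root element
$result = $xml->xpath('//div[@class="content"]');
foreach ($result as $node) { // each() was removed in PHP 8, so use foreach
    echo $node, "\n";
}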
Since you explicitly asked about creating an array for this, you could use:
$res_Arr = array();
foreach ($result as $node) {
    $res_Arr[] = (string) $node; // cast so the array holds plain strings, not SimpleXMLElement objects
}
and $res_Arr would then be an array with the contents you're looking for.
See http://php.net/manual/en/simplexmlelement.xpath.php for PHP's SimpleXML XPath documentation and http://www.w3.org/TR/xpath for the XPath specification.
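For the "really complex" case, the DOMDocument-with-DOMXPath route can be sketched roughly as below. One practical advantage: DOMDocument::loadHTML() accepts fragments with several top-level elements and imperfect markup, which SimpleXMLElement rejects. The helper name contentDivTexts() is hypothetical, and the query keeps the same exact-match @class test as the SimpleXML example.

```php
<?php
// Hypothetical helper: returns the text of every <div class="content"> as an array.
function contentDivTexts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Exact match on the class attribute, as in the SimpleXML query.
    $nodes = $xpath->query('//div[@class="content"]');

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

$array = contentDivTexts($html);
echo $array[0] . "\n" . $array[1] . "\n";
```

No root-wrapping trick is needed here, because loadHTML() builds the surrounding html/body elements itself.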
PHP has several means of processing HTML, including DomDocument and SimpleXML. See Parse HTML With PHP And DOM. Here is an example:
$dom = new DomDocument;
$dom->preserveWhiteSpace = false; // must be set before loading to have any effect
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
    $class = $div->getAttribute('class');
    if ($class == 'content') {
        echo $div->nodeValue . "\n";
    }
}
Technically, the class attribute can hold several space-separated classes, so you might want to use:
$classes = explode(' ', $class);
if (in_array('content', $classes)) {
...
}
The SimpleXML/XPath approach is more concise, but if you don't want to go the XPath route (and learn another technology, at least well enough for these sorts of tasks), the above is a programmatic alternative.
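Putting the multi-class check together with the programmatic DOM loop, a rough sketch that returns the array the question asks for might look like this. The helper divsWithClass() is hypothetical; it splits on any run of whitespace (via preg_split()) rather than a single space, so attributes like class="content  highlighted" still match, while class="contents" does not.

```php
<?php
// Hypothetical helper: text of every <div> whose class list contains $wanted.
function divsWithClass(string $html, string $wanted): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Split the class attribute on any whitespace run, dropping empty tokens.
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($wanted, $classes, true)) {
            $texts[] = $div->nodeValue;
        }
    }
    return $texts;
}

$html = '<div class="content">something here</div>'
      . '<div class="sidebar">not this</div>'
      . '<div class="content  highlighted">more stuff</div>';

print_r(divsWithClass($html, 'content'));
```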
There's not much you can do short of using string manipulation functions or regular expressions. You can load your HTML as XML using the DOM library and use that to traverse to your div, but that can become cumbersome if you're not careful or if the structure is complex.
http://ca3.php.net/manual/en/book.dom.php
It looks like Kalem13 beat me to it, but I agree: you could use the DOMDocument class. I haven't used it personally, but I think it would work for you. First you instantiate a DOMDocument object, then you load your $html variable using the loadHTML() method. Then you can use the getElementsByTagName() method.
You probably need to use preg_match_all():
$matches = array();
// No U modifier: it inverts greediness and would make the lazy (.*?) groups greedy
preg_match_all('`<div\b([^>]*)class="content"([^>]*)>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
$array = array();
foreach ($matches as $m) {
    $array[] = $m[3]; // $m[3] holds the content of <div class="content">
}
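One limitation of the regex approach is worth seeing concretely: a lazy (.*?) followed by </div> stops at the first closing tag, so nested divs get cut short. The sketch below uses a pattern of that shape (with [^>]* inside the opening tag) on a made-up nested input and compares it against DOMDocument, which returns the full element text.

```php
<?php
// Made-up input: a content div that itself contains a nested div.
$html = '<div class="content">outer <div>inner</div> tail</div>';

// Regex approach: (.*?) ends at the FIRST </div>, truncating the outer content.
preg_match_all('`<div\b([^>]*)class="content"([^>]*)>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
$fromRegex = array_map(fn ($m) => $m[3], $matches);

// DOM approach for comparison: textContent covers the whole element.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$fromDom = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'content') {
        $fromDom[] = $div->textContent;
    }
}

var_dump($fromRegex, $fromDom);
```

Here $fromRegex ends up as ['outer <div>inner'] while $fromDom is ['outer inner tail'], which is why the other answers lean towards a parser once the markup is not trivially flat.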