
Create array from the contents of <div> tags in PHP

I have the contents of a web page assigned to a variable, $html.

Here's an example of the contents of $html:

<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>

Using PHP, how can I create an array containing the contents of the <div class="content"></div> regions, so that (for the example above) this:

echo $array[0] . "\n" . $array[1]; //etc

outputs

something here
more stuff


Assuming this is just a simplified case in the OP and the real situation is more complicated, you'll want to use XPath.

If it's really complex, then you may want to use DOMDocument (with DOMXPath), but here's a simple example using SimpleXML (note that SimpleXML only accepts well-formed XML):

// SimpleXML requires a single root element, so wrap the fragment in one
$xml = new SimpleXMLElement('<root>' . $html . '</root>');

$result = $xml->xpath('//div[@class="content"]');

foreach ($result as $node) {
    echo $node, "\n";
}

Since you explicitly asked about creating an array for this, you could use:

$res_Arr = array();
foreach ($result as $node) {
    $res_Arr[] = (string) $node; // cast the SimpleXMLElement to its text
}

and $res_Arr would be an array with the contents you're looking for.
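Because `xpath()` already returns an ordinary PHP array of SimpleXMLElement objects, the loop above could also be collapsed into a single `array_map()` call. The sketch below is self-contained; it assumes the fragment is wrapped in an illustrative `<root>` element, since SimpleXML refuses input with more than one top-level element.

```php
<?php
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

// Wrap the fragment in an illustrative <root> element (any name works)
// so that SimpleXML sees exactly one top-level element.
$xml = new SimpleXMLElement('<root>' . $html . '</root>');

// xpath() returns an array of SimpleXMLElement objects.
$result = $xml->xpath('//div[@class="content"]');

// Convert every node to its string content in one step instead of looping.
$array = array_map('strval', $result);

echo $array[0] . "\n" . $array[1] . "\n";
```

Like every SimpleXML approach, this only works when the markup is well-formed XML. A real page containing `<br>` or `&nbsp;` would make the constructor throw, which is where DOMDocument::loadHTML() is more forgiving.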

See http://php.net/manual/en/simplexmlelement.xpath.php for php SimpleXML Xpath info and http://www.w3.org/TR/xpath for the XPath specifications
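For the "really complex" case mentioned above, a rough sketch of the DOMDocument + DOMXPath variant follows. It is an illustration, not the answerer's code; the second div has an extra class added (an assumption for the demo) to show an XPath predicate that matches `content` as one class token among several, which the plain `@class="content"` test would miss.

```php
<?php
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content extra">more stuff</div>';

$dom = new DOMDocument();
// Collect libxml warnings about imperfect HTML instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the normalized class list with spaces so " content " only matches
// the whole token, e.g. in class="content extra" but not class="contents".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';

$array = array();
foreach ($xpath->query($query) as $div) {
    $array[] = $div->textContent;
}

echo $array[0] . "\n" . $array[1] . "\n";
```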


PHP has several means of processing HTML, including DomDocument and SimpleXML. See Parse HTML With PHP And DOM. Here is an example:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // only affects documents loaded after it is set
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
  $class = $div->getAttribute('class');
  if ($class == 'content') {
    echo $div->nodeValue . "\n";
  }
}

Technically the class attribute can hold several space-separated classes, so you might want to use:

$classes = preg_split('/\s+/', trim($class)); // split on any run of whitespace
if (in_array('content', $classes)) {
  echo $div->nodeValue . "\n";
}
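Combining the two snippets above, a reusable helper that returns an array (as the question asks) might look like this. The function name `divTextsByClass` is hypothetical, and splitting on `\s+` rather than a single space is a choice made here so that tabs or doubled spaces inside the attribute don't break the match.

```php
<?php
/**
 * Hypothetical helper: return the text of every <div> whose class
 * attribute contains $className as a whole, whitespace-separated token.
 */
function divTextsByClass(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = array();
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Split on any run of whitespace, not just a single space.
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $texts[] = $div->nodeValue;
        }
    }
    return $texts;
}

$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="note  content">more stuff</div>';

$array = divTextsByClass($html, 'content');
echo $array[0] . "\n" . $array[1] . "\n";
```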

The SimpleXML/XPath approach is more concise, but if you don't want to go the XPath route (and learn another technology, at least well enough for these sorts of tasks), the above is a programmatic alternative.


There's not much you can do short of using string manipulation functions or regular expressions. You can load your HTML as XML using the DOM library and use it to traverse to your div, but that can become cumbersome if you're not careful or if the structure is complex.

http://ca3.php.net/manual/en/book.dom.php
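To make the "string manipulation" option concrete, here is a minimal sketch that uses only `strpos()` and `substr()`. The helper name `betweenMarkers` is hypothetical, and it assumes the opening tag is spelled exactly `<div class="content">` and that content divs never contain nested `<div>` elements; if either assumption fails, the results will be wrong.

```php
<?php
/**
 * Hypothetical helper: collect the text between every literal $open
 * marker and the next $close marker, using plain string functions.
 */
function betweenMarkers(string $html, string $open, string $close): array
{
    $results = array();
    $offset = 0;
    while (($start = strpos($html, $open, $offset)) !== false) {
        $start += strlen($open);              // skip past the opening tag
        $end = strpos($html, $close, $start); // first closing tag after it
        if ($end === false) {
            break; // unterminated element: stop scanning
        }
        $results[] = substr($html, $start, $end - $start);
        $offset = $end + strlen($close);      // continue after the closing tag
    }
    return $results;
}

$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

$array = betweenMarkers($html, '<div class="content">', '</div>');
echo $array[0] . "\n" . $array[1] . "\n";
```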


It looks like Kalem13 beat me to it, but I agree: you could use the DOMDocument class. I haven't used it personally, but I think it would work for you. First you instantiate a DOMDocument object, then you load your $html variable using the loadHTML() method. Then you can use the getElementsByTagName() method.
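The three steps described above could be sketched as follows. This version is an illustration, not the poster's code: it turns the DOMNodeList returned by getElementsByTagName() into a plain array with iterator_to_array(), filters it, and assumes the class attribute is exactly `content`.

```php
<?php
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

// Step 1: instantiate a DOMDocument object.
$dom = new DOMDocument();

// Step 2: load $html, collecting libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Step 3: fetch all <div> elements as a plain PHP array.
$divs = iterator_to_array($dom->getElementsByTagName('div'));

// Keep only divs whose class attribute is exactly "content".
$contentDivs = array_filter($divs, function (DOMElement $div) {
    return $div->getAttribute('class') === 'content';
});

// Map each element to its text and renumber the keys from 0.
$array = array_values(array_map(function (DOMElement $div) {
    return $div->textContent;
}, $contentDivs));

echo $array[0] . "\n" . $array[1] . "\n";
```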


You probably need to use preg_match_all():

$matches = array();
preg_match_all('`<div[^>]*class="content"[^>]*>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
$array = array();
foreach ($matches as $m) {
  // $m[1] holds the content of one <div class="content">
  $array[] = $m[1];
}
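If all you need is the array itself, the loop isn't strictly necessary: with the default PREG_PATTERN_ORDER, the matches of the first capture group already form that array. A sketch, with a pattern chosen here for illustration; the last lines show the usual limitation of regex-based HTML parsing, where a nested `<div>` ends the match early.

```php
<?php
$html = '<div class="content">something here</div>'
      . '<span>something random thrown in <strong>here</strong></span>'
      . '<div class="content">more stuff</div>';

// One capture group holds the inner content of each matching <div>;
// [^>]* lets other attributes appear before or after class="content".
$pattern = '~<div\b[^>]*\bclass="content"[^>]*>(.*?)</div>~is';

// With the default PREG_PATTERN_ORDER, $matches[1] is the list of
// captured contents, so no loop is needed.
preg_match_all($pattern, $html, $matches);
$array = $matches[1];

echo $array[0] . "\n" . $array[1] . "\n";

// The lazy (.*?) stops at the first </div>, so nested divs are cut short:
// $nested[1] ends up as ['a<div>b'] rather than the full inner content.
preg_match_all($pattern, '<div class="content">a<div>b</div>c</div>', $nested);
```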