PHP: Fetch content from an HTML page using xpath()

2023-01-20 04:45

I'm trying to fetch the content of a div in an HTML page using XPath and DOMDocument. This is the structure of the page:

<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div class="div2"></div>
</div>

I want to get only the content of the p elements, not the spans and divs. I arrived at the XPath expression .//*[@id='content']/p, but something seems to be wrong because I'm getting only the first p. I also tried other expressions using following-sibling and node(), but they all return only the first p.

.//*[@id='content']/span/following-sibling::p
.//*[@id='content']/node()[self::p]

This is how I use XPath:

$domDocument=new DOMDocument();
$domDocument->encoding = 'UTF-8';
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query($this->xpath);
$content = $this->GetHTMLFromDom($domNodeList);

And this is how I get the HTML from the nodes:

private function GetHTMLFromDom($domNodeList){
    $domDocument = new DOMDocument();
    $node = $domNodeList->item(0);
    foreach($node->childNodes as $childNode)
        $domDocument->appendChild($domDocument->importNode($childNode, true));
    return $domDocument->saveHTML();
}

This XPath expression:

//div[@id='content']/p

selects the wanted node set (the five p elements).
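
As a quick check, the expression can be run against markup shaped like the sample in the question. This is only a minimal sketch: the $page string and the paragraph texts are placeholders made up for illustration, not the asker's real page.

```php
<?php
// Sample markup modeled on the question; the paragraph texts are placeholders.
$page = <<<HTML
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p>one</p>
<p>two</p>
<p>three</p>
<p>four</p>
<p>five</p>
<div class="div2"></div>
</div>
HTML;

$domDocument = new DOMDocument();
$domDocument->loadHTML($page);

$domXPath = new DOMXPath($domDocument);
$paragraphs = $domXPath->query("//div[@id='content']/p");

// The node list holds every matching <p>, not just the first one.
echo $paragraphs->length, "\n";
```

If the count is what you expect, the query is fine and any missing paragraphs are being lost later, when the node list is turned back into HTML.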

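EDIT: Now it's clear what your problem is. Your function only processes the first node of the list ($domNodeList->item(0)), so only the first p ends up in the output. You need to iterate over the whole NodeList: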

private function GetHTMLFromDom($domNodeList){
    $domDocument = new DOMDocument();
    // Import every matched node, not just the first one
    foreach ($domNodeList as $node) {
        $domDocument->appendChild($domDocument->importNode($node, true));
    }
    return $domDocument->saveHTML();
}
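
A variation on the same loop, in case it helps: DOMDocument::saveHTML() accepts an optional node argument, so each matched node can be serialized directly from its owner document without importing it into a second DOMDocument. The sketch below is an illustration under assumptions, not the code from the question: the standalone function name getHtmlFromNodeList and the sample $page are hypothetical.

```php
<?php
// Hypothetical standalone helper; the question uses a private method instead.
function getHtmlFromNodeList(DOMNodeList $domNodeList): string
{
    $html = '';
    foreach ($domNodeList as $node) {
        // saveHTML($node) serializes only this node, including its own tag.
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

// Placeholder markup with the same shape as the page in the question.
$page = '<div id="content"><div class="div1"></div><span class="span1"></span>'
      . '<p>a</p><p>b</p><p>c</p><p>d</p><p>e</p><div class="div2"></div></div>';

$domDocument = new DOMDocument();
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);

$content = getHtmlFromNodeList($domXPath->query("//div[@id='content']/p"));
echo $content, "\n";
```

Note that this returns the p elements themselves (tags included). If only their inner content is wanted, loop over each node's childNodes inside the foreach, as the original function did for the first node.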
