A little help with this xPath?

2023-02-19 14:10

I am getting some info from an RSS feed.

<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->load('http://www.myrss.com');
libxml_clear_errors();

$xPath = new DOMXPath($dom);
$links = $xPath->query('xxxxx');
foreach($links as $link) {
    printf("%s \n", $link->nodeValue);
}
?>

I have managed to get the TITLE, LINK and DESCRIPTION with //item/title and so on, however I want to get the text content and the image of the description separately.

Viewing the page source in Firefox, this is the markup I see for the image and the content. Both are inside <description></description>.

IMAGE

<div class="separator" style="clear: both; text-align: center;"><a href="LINK TO IMAGE" imageanchor="1" 
style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="192" 
src="LINK TO IMAGE" width="320" /></a></div>

CONTENT TEXT

<span class="Apple-style-span" style="font-family: 'Trebuchet MS', sans-serif;"> CONTENT TEXT IS HERE </span>

What XPath should I use to get this data? Thank you.

If it is what it looks like and the content is HTML-encoded, you can't do it in one step. You must retrieve each description's text and parse it into its own DOM (unless you want to resort to regex, which I would strongly discourage).

When in doubt, you can pass it through Tidy first. DOMDocument has loadHTML(), which is pretty resilient, but it is not guaranteed to load arbitrary HTML.

// beware, this is untested. it should give you an idea, though.

$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);

$dom->load('http://www.myrss.com');
libxml_clear_errors();

$xPath = new DOMXPath($dom);
$items = $xPath->query('/rss/channel/item');

foreach($items as $item) {
    $descr = $xPath->query('./description', $item);
    // there should be at most one, but foreach gracefully
    // handles the case where there is no <description>
    foreach ($descr as $d) {
        $temp_dom = new DOMDocument();
        $temp_dom->loadHTML( $d->nodeValue );   // error handling/Tidy here!

        $temp_xpath = new DOMXPath($temp_dom);

        $img = $temp_xpath->query('//img');
        $txt = $temp_xpath->query('//span[@class="Apple-style-span"]');

        // now do something with $img and $txt
    }

}
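
To make the "do something with $img and $txt" step concrete, here is a minimal sketch run against an inline sample feed. The sample RSS string, the example.com URLs, and the helper name splitDescription() are all assumptions made for illustration; the real feed may be structured differently.

```php
<?php
// Hypothetical helper (not from the original post): returns the first image
// URL and the span text found in one decoded <description> string.
function splitDescription(string $html): array
{
    $result = ['image' => null, 'text' => null];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $src  = $xpath->query('//img/@src')->item(0);
    $span = $xpath->query('//span[@class="Apple-style-span"]')->item(0);

    if ($src !== null) {
        $result['image'] = $src->nodeValue;
    }
    if ($span !== null) {
        $result['text'] = trim($span->textContent);
    }
    return $result;
}

// Assumed sample feed with an HTML-escaped description, standing in for the real RSS.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Example post</title>
      <description>&lt;div class="separator"&gt;&lt;a href="http://example.com/a.jpg"&gt;&lt;img src="http://example.com/a.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span class="Apple-style-span"&gt; CONTENT TEXT IS HERE &lt;/span&gt;</description>
    </item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($rss);
$feedXPath = new DOMXPath($feed);

$results = [];
foreach ($feedXPath->query('/rss/channel/item/description') as $d) {
    // nodeValue already has the XML entities decoded, so this is real HTML text
    $results[] = splitDescription($d->nodeValue);
}

foreach ($results as $r) {
    printf("image: %s\ntext: %s\n", $r['image'], $r['text']);
}
```

Querying //img/@src returns the attribute node directly, so there is no need to call getAttribute() afterwards. If a description has no image or no matching span, the corresponding entry simply stays null.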

Your code wasn't formatted correctly, which makes it hard for others to work with.

However, the interactive tool here: http://www.bubasoft.net/ (XPath Builder) is very helpful when constructing XPath queries.

It looks like the content is encoded/escaped, so you can't query it with XPath directly: to the parser it is plain text, not HTML/XML markup. Take a look at htmlentities and html_entity_decode.

You should extract the content, convert it to HTML/XML and load it into a DOMDocument separately. Then you can query it using XPath.
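
A rough sketch of that decode-then-parse sequence, assuming you are holding the raw, still-escaped string (the sample value below is made up for illustration). Note that if you read the description through DOMDocument's nodeValue, the XML parser has already decoded one level of escaping, so the explicit html_entity_decode() call matters mainly when you work with the raw source text.

```php
<?php
// Assumed sample: an escaped description as it appears in the raw feed source.
$escaped = '&lt;span class=&quot;Apple-style-span&quot;&gt; CONTENT TEXT IS HERE &lt;/span&gt;';

// Step 1: turn the entities back into real markup.
$html = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Step 2: load the markup into its own DOM document.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

// Step 3: now XPath can see the elements.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//span[@class="Apple-style-span"]')->item(0);
$text = $node !== null ? trim($node->textContent) : null;

echo $text, "\n";
```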
