A little help with this xPath?
I am getting some info from an RSS feed.
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->load('http://www.myrss.com');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$links = $xPath->query('xxxxx');
foreach ($links as $link) {
    printf("%s \n", $link->nodeValue);
}
?>
I have managed to get the TITLE, LINK and DESCRIPTION with //item/title
and so on; however, I want to get the text content and the image of the description separately.
Viewing the page source in Firefox, this is the markup I see for the image and the content. Both are inside <description></description>.
IMAGE
<div class="separator" style="clear: both; text-align: center;"><a href="LINK TO IMAGE" imageanchor="1"
style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="192"
src="LINK TO IMAGE" width="320" /></a></div>
CONTENT TEXT
<span class="Apple-style-span" style="font-family: 'Trebuchet MS', sans-serif;"> CONTENT TEXT IS HERE </span>
What xPath should I use to get those data? Thank you
If it is what it looks like and the content is HTML-encoded, you can't do it in one step. You must retrieve every description text and parse it into its own DOM (unless you want to resort to regex, which I would strongly discourage).
When in doubt, you can pass it through Tidy first. DOMDocument has loadHTML(), which is pretty resilient, but it is not guaranteed to load any HTML.
// beware, this is untested. it should give you an idea, though.
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->load('http://www.myrss.com');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$items = $xPath->query('/rss/channel/item');
foreach ($items as $item) {
    $descr = $xPath->query('./description', $item);
    // there should be at most one, but foreach gracefully
    // handles the case where there is no <description>
    foreach ($descr as $d) {
        $temp_dom = new DOMDocument();
        $temp_dom->loadHTML( $d->nodeValue ); // error handling/Tidy here!
        $temp_xpath = new DOMXPath($temp_dom);
        $img = $temp_xpath->query('//img');
        $txt = $temp_xpath->query('//span[@class="Apple-style-span"]');
        // now do something with $img and $txt
    }
}
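For anyone who wants to try the two-step idea without a live feed, here is a minimal sketch that runs against an inline sample. The feed string, the example.com URLs and the $results array are made up for illustration; the only assumption carried over from the question is that <description> holds HTML-escaped markup with an <img> and a span of class "Apple-style-span".

```php
<?php
// Hypothetical sample feed (not the asker's real feed): the <description>
// contains HTML-escaped markup, as assumed in the answer above.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>Example item</title>
      <description>&lt;div class="separator"&gt;&lt;a href="http://example.com/a.jpg"&gt;&lt;img src="http://example.com/a.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span class="Apple-style-span"&gt; CONTENT TEXT IS HERE &lt;/span&gt;</description>
    </item>
  </channel>
</rss>
XML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Step 1: parse the feed itself as XML.
$dom = new DOMDocument;
$dom->loadXML($rss);
$xPath = new DOMXPath($dom);

$results = [];
foreach ($xPath->query('/rss/channel/item/description') as $d) {
    // Step 2: nodeValue is the un-escaped HTML string; parse it as its own document.
    $inner = new DOMDocument;
    $inner->loadHTML($d->nodeValue);
    $innerXPath = new DOMXPath($inner);

    // item(0) returns null when nothing matched, so guard before reading.
    $img = $innerXPath->query('//img/@src')->item(0);
    $txt = $innerXPath->query('//span[@class="Apple-style-span"]')->item(0);

    $results[] = [
        'image' => $img ? $img->nodeValue : null,
        'text'  => $txt ? trim($txt->textContent) : null,
    ];
}
libxml_clear_errors();

print_r($results);
```

Querying //img/@src returns the attribute node directly, so its nodeValue is the image URL. If the real feed uses a different class name or wraps the text in other elements, the second query has to be adjusted accordingly.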
Your code didn't format correctly so it would be hard for others to work on it.
However, the interactive tool here: http://www.bubasoft.net/ (XPath Builder) is very helpful when constructing XPath queries.
It looks like the content is encoded/escaped, so you can't query it with XPath as it isn't HTML/XML. Take a look at htmlentities and html_entity_decode.
You should extract the content, convert it to HTML/XML and load it into a DOMDocument separately. Then you can query it using XPath.
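A rough sketch of that decode-then-parse step, assuming you already hold the escaped description as a plain string (the $escaped value below is invented for illustration). Note that if you read the description through DOM's nodeValue, one level of escaping is already undone for you, so the explicit html_entity_decode call is only needed when the string was obtained some other way or was escaped twice.

```php
<?php
// Hypothetical escaped string, standing in for the raw <description> text.
$escaped = '&lt;span class=&quot;Apple-style-span&quot;&gt; CONTENT TEXT IS HERE &lt;/span&gt;';

// Turn the entities back into real markup.
$html = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Load the decoded markup into its own document and query it.
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);
$span = $xp->query('//span[@class="Apple-style-span"]')->item(0);
$text = $span ? trim($span->textContent) : null;

echo $text, "\n";
```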