Need some help with an XPath expression: one works, the other doesn't
I'm using the COBRA HTMLParser but haven't had any luck parsing one particular tag. Here's the source:
<li id="eta" class="hentry">
  <span class="body">
    <span class="actions">
    </span>
    <span class="content">
    </span>
    <span class="meta entry">Content here
    </span>
    <span class="meta entry stub">Content here
      <span class="shared-content">
        Information by
        <a class="title" data="associate" href="/associate">Associate</a>
      </span>
    </span>
  </span>
</li>
I am able to use the following XPaths to get the proper information:
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xpath.evaluate("//span[contains(@class, 'body')]", document, XPathConstants.NODESET);
int length = nodeList.getLength();
System.out.println(nodeList.getLength());
for (int i = 0; i < length; i++) {
    Element element = (Element) nodeList.item(i);
    NodeList n = null;
    try {
        n = (NodeList) xpath.evaluate("span[contains(@class, 'content')]", element, XPathConstants.NODESET);
        String body = n.item(0).getTextContent();
        System.out.println("Content: " + body);
    } catch (Exception e) {}
    try {
        String date = (String) xpath.evaluate("span[contains(@class, 'meta entry')]/a/span/@data", element, XPathConstants.STRING);
        System.out.println("DATA: " + date);
        String source = (String) xpath.evaluate("//span[contains(@class, 'meta entry')]/span", element, XPathConstants.STRING);
        System.out.println("DATA: " + source);
    } catch (Exception e) {}
    // This does not work at all! I've tried every combination and still can't get it to run
    try {
        String info = (String) xpath.evaluate("//span[@class='shared-content']/a/@data", element, XPathConstants.STRING);
        System.out.println("INFO: " + info);
    } catch (Exception e) {}
}
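A side note on the loop above, based on standard XPath 1.0 semantics rather than anything Cobra-specific: an expression that starts with // is absolute, so it searches from the root of the whole document even though element is passed as the context node. Only a leading .// (or a plain relative step such as span[...]) stays inside the current element. That is probably not why the shared-content expression comes back empty, but it does mean every iteration prints the first match in the document. The sketch below uses the JDK's built-in DOM parser instead of Cobra, and a made-up two-item sample; both are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical sample: two body spans, only the second contains the shared-content link.
    static final String XML =
          "<ul>"
        + "<li><span class='body'><span class='meta entry'>first</span></span></li>"
        + "<li><span class='body'><span class='meta entry stub'>second"
        + "<span class='shared-content'>Information by "
        + "<a class='title' data='associate' href='/associate'>Associate</a>"
        + "</span></span></span></li>"
        + "</ul>";

    /** Evaluates expr once per body span, using that span as the context node. */
    static List<String> dataPerBody(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList bodies = (NodeList) xpath.evaluate(
                    "//span[contains(@class, 'body')]", doc, XPathConstants.NODESET);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < bodies.getLength(); i++) {
                // evaluate(String, Object) returns the string value of the result
                results.add(xpath.evaluate(expr, bodies.item(i)));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Leading '//' restarts at the document root: both bodies report the same link.
        System.out.println(dataPerBody("//span[@class='shared-content']/a/@data"));
        // Leading './/' searches only below the current body span.
        System.out.println(dataPerBody(".//span[@class='shared-content']/a/@data"));
    }
}
```

The absolute form yields "associate" for both bodies; the relative form yields an empty string for the first body and "associate" for the second.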
The last expression in my loop (the shared-content one) doesn't work, whatever combination I try. I've also tried the following, but neither helps:
String info = (String) xpath.evaluate("//span[contains(@class, 'shared-content')]/a/@data", element, XPathConstants.STRING);
String info = (String) xpath.evaluate("//span[contains(@class, 'meta entry info')]/span/a/@data", element, XPathConstants.STRING);
Any suggestions?
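About the contains(@class, ...) variants tried above: contains() is a plain substring test on the whole class attribute, so contains(@class, 'content') also matches shared-content, and contains(@class, 'meta entry info') matches nothing in the markup shown because no class attribute contains that exact character sequence. A common XPath 1.0 idiom for matching a single class token is contains(concat(' ', normalize-space(@class), ' '), ' name '). The sketch below counts matches both ways against the question's markup, parsed with the JDK's DOM parser rather than Cobra (an assumption); hasClass and count are hypothetical helper names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ClassTokenDemo {
    // The question's markup with whitespace dropped; it is already well-formed XML.
    static final String XML =
          "<li id='eta' class='hentry'><span class='body'>"
        + "<span class='actions'></span><span class='content'></span>"
        + "<span class='meta entry'>Content here</span>"
        + "<span class='meta entry stub'>Content here"
        + "<span class='shared-content'>Information by "
        + "<a class='title' data='associate' href='/associate'>Associate</a>"
        + "</span></span></span></li>";

    /** Hypothetical helper: XPath 1.0 predicate matching one whitespace-separated class token. */
    static String hasClass(String token) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')";
    }

    /** Returns how many nodes the given location path selects in XML. */
    static int count(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Substring test: matches both 'content' and 'shared-content'.
        System.out.println(count("//span[contains(@class, 'content')]"));
        // Token test: matches only the span whose class list includes 'content'.
        System.out.println(count("//span[" + hasClass("content") + "]"));
        // No class attribute contains this exact character sequence.
        System.out.println(count("//span[contains(@class, 'meta entry info')]"));
    }
}
```

Run as-is, this prints 2, 1 and 0.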
EDIT: A couple of people have suggested that the XML is illegal. Honestly, I'm not sure why it would be, since I've seen markup like this almost everywhere. Either way, I don't have control over the XML, at least not until Monday when my other pals get back. I'm trying to gauge the feasibility of writing a mashup that includes this information. Is there some way to disable the checking, or something similar?
Here's the XML that was parsed:
<?xml version="1.0" encoding="UTF-8"?>
<span class="body">
  <span class="content">TextContent</span>
  <span class="meta entry">TextContent</span>
</span>
I suspect the document isn't being parsed correctly.
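One way to test that suspicion is to serialize whatever tree the parser produced and compare it with the original markup; the dump above already lacks the "meta entry stub" and "shared-content" spans. The sketch below uses the JDK's identity Transformer. It assumes the parser's result can be passed to DOMSource as an org.w3c.dom.Node, which seems plausible because the question already hands the same document to the JDK XPath API, but whether the JDK serializer copes with Cobra's DOM implementation is untested here. DomDump, parse and toXml are hypothetical names, and parse uses the JDK parser only as a stand-in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomDump {
    /** Serializes a DOM node (and its subtree) back to markup for inspection. */
    static String toXml(Node node) {
        try {
            // newTransformer() with no stylesheet performs an identity transform
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in parser for this sketch: the JDK's DOM parser, not Cobra. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<span class='body'><span class='content'>TextContent</span></span>");
        // With Cobra you would pass its document (or any element of it) here instead.
        System.out.println(toXml(doc));
    }
}
```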
XPathVisualizer is a nice XPath visualizer tool. It runs on Windows and lets you see the results of your XPath queries. It's an xcopy install (a single EXE file), and it's free.
I ran your query in it and got this result:
@Jherico, @Andrew Keith
I don't know the COBRA HTMLParser, but mixing #PCDATA with child elements (so-called mixed content) is legal XML.
It can be declared in a DTD like this:
<!ELEMENT text_node (#PCDATA|i|b|u)*>
This is how well-formed HTML can still be legal XML.
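To check the mixed-content point independently of Cobra, the sketch below parses a small document with the JDK's validating DOM parser against exactly the declaration above, plus hypothetical declarations for i, b and u and a made-up sample body. Validation errors are rethrown instead of only being logged, so a successful parse means text interleaved with child elements is valid under that DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class MixedContentDemo {
    static final String XML =
          "<!DOCTYPE text_node [\n"
        + "  <!ELEMENT text_node (#PCDATA|i|b|u)*>\n"
        + "  <!ELEMENT i (#PCDATA)>\n"
        + "  <!ELEMENT b (#PCDATA)>\n"
        + "  <!ELEMENT u (#PCDATA)>\n"
        + "]>\n"
        + "<text_node>plain <b>bold</b> and <i>italic</i> text</text_node>";

    /** Parses with DTD validation on; any validation error is thrown, not just logged. */
    static Element parseValidated(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(true);
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return b.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts direct children of the given DOM node type (e.g. Node.TEXT_NODE). */
    static int countChildren(Element e, short type) {
        int n = 0;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == type) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Element root = parseValidated(XML);
        System.out.println(countChildren(root, Node.TEXT_NODE));    // text runs between tags
        System.out.println(countChildren(root, Node.ELEMENT_NODE)); // the <b> and <i> children
        System.out.println(root.getTextContent());                  // all text, tags stripped
    }
}
```

Swapping <b> for an undeclared element such as <p> makes parseValidated throw, which is a quick way to see that the DTD is really being enforced.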
I ran the following code:
public static void main(String[] args) throws SAXException, IOException, ParserConfigurationException, XPathExpressionException {
    Document doc = XmlUtil.parseXmlResource("/temp.xml");
    for (Node n : XPathUtil.getNodes(doc, "//span[contains(@class, 'body')]")) {
        System.out.println(XPathUtil.getStringValue(doc, "//span[@class='shared-content']/a/@data"));
    }
}
It printed 'associate'. I think your XPath is fine. What happens instead on your side? And can you remove the empty catch blocks so we can see whether you're actually getting exceptions?
Note: XmlUtil and XPathUtil are my own convenience classes that eliminate most of the XML and XPath boilerplate code.
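Those helpers aren't shown in the answer. For anyone who wants to run the snippet, here is a minimal hypothetical reconstruction built on the standard JAXP and javax.xml.xpath APIs; the method names match the calls above, but the bodies are guesses, not the answerer's code. Note that the loop above evaluates the second expression against doc rather than n, so it prints the same value once per body span.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reconstruction of the answer's XmlUtil helper. */
class XmlUtil {
    /** Parses a classpath resource such as "/temp.xml". */
    static Document parseXmlResource(String path)
            throws SAXException, IOException, ParserConfigurationException {
        try (InputStream in = XmlUtil.class.getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Resource not found: " + path);
            }
            return parseXml(in);
        }
    }

    /** Parses XML from a stream with a default (non-validating) JDK DOM parser. */
    static Document parseXml(InputStream in)
            throws SAXException, IOException, ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }
}

/** Hypothetical reconstruction of the answer's XPathUtil helper. */
class XPathUtil {
    /** Returns the selected nodes as a List so they can be used in a for-each loop. */
    static List<Node> getNodes(Node context, String expr) throws XPathExpressionException {
        NodeList list = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, context, XPathConstants.NODESET);
        List<Node> nodes = new ArrayList<>(list.getLength());
        for (int i = 0; i < list.getLength(); i++) {
            nodes.add(list.item(i));
        }
        return nodes;
    }

    /** Returns the XPath string value of expr evaluated against context. */
    static String getStringValue(Node context, String expr) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath().evaluate(expr, context);
    }
}
```

With these two classes (and the usual org.w3c.dom and exception imports), the main method above should compile as written, given a temp.xml on the classpath.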
I just ran your code sample as-is (copy-paste) and got the output below, so everything seems fine. Which Cobra version are you using? I'm on 0.98.4.
1
Content:
DATA:
DATA:
Information by
Associate
INFO: associate
Reproducible test(?)
- Using javac/java version 1.6.0_16 (HotSpot Client: build 14.2-b01, mixed mode, sharing)
- Downloaded 0.98.4 (cobra-0.98.4.zip) from SourceForge: Cobra HTML Toolkit download
- Extracted js.jar and cobra.jar from cobra-0.98.4.zip:\lib to a directory XXX
- Wrote XMLTest.java and HTMLTest.java in the same directory (the file names are links to the source)
- Ran this to compile (Windows):
    javac -cp .;cobra.jar;js.jar *.java
- Then executed them like this (output included):
XMLTest
java -cp .;cobra.jar;js.jar XMLTest 1
XMLTest Output:
1
Content:
DATA:
DATA:
Information by
Associate
INFO: associate
HTMLTest
java -cp .;cobra.jar;js.jar HTMLTest 1
HTMLTest Output:
1
Content:
DATA:
DATA:
Information by
Associate
INFO: associate