XPath - How to get the data contained between elements, not the elements themselves
I'm writing a Java program that scrapes a web page for links and then stores them in a database, but I've run into a problem. Using HtmlUnit, I wrote the following:
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]");
It returns the correct anchor elements, but I only want the actual path contained in the href attribute, not the entire element. How can I do this, and further, how can I get the data contained between the tags:
<a href="">I need this data, too.</a>
Thanks in advance!
The first (getting the href)
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/@href");
The second (getting the text)
page.getByXPath("//a[starts-with(@href, \"showdetails.aspx\")]/text()");
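To see what those two expressions select, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of HtmlUnit, so it runs without extra dependencies. The class name LinkValues, the helper select, and the sample markup are all made up for illustration; only the XPath expressions come from the answer above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LinkValues {
    // Hypothetical sample page; it must be well-formed XML for the JDK parser.
    static final String SAMPLE =
        "<html><body>"
        + "<a href=\"showdetails.aspx?id=1\">First item</a>"
        + "<a href=\"other.aspx\">Ignored</a>"
        + "<a href=\"showdetails.aspx?id=2\">Second item</a>"
        + "</body></html>";

    // Evaluates the expression as a node set and returns each node's value.
    // For attribute nodes this is the attribute value, for text nodes the text.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String links = "//a[starts-with(@href, \"showdetails.aspx\")]";
        System.out.println(select(SAMPLE, links + "/@href"));  // the href values
        System.out.println(select(SAMPLE, links + "/text()")); // the link text
    }
}
```

With HtmlUnit itself, the same expressions should give you attribute and text nodes rather than anchor elements, so you would read the value off each returned node in a similar way. I have not verified the exact return types there, so treat that part as an assumption.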
I assume that getByXPath is a utility function written by you which uses XPath.evaluate? To get the string value, you could use either xpath.evaluate(expression, object) or xpath.evaluate(expression, object, XPathConstants.STRING).
Alternatively, you could call getNodeValue() on the attribute node returned by evaluating "//a[starts-with(@href, \"showdetails.aspx\")]/@href".
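A rough sketch of those three options side by side, assuming the standard javax.xml.xpath API. The class HrefAsString, its method names, and the sample markup (including the id=7 query string) are hypothetical and only there to make the example runnable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HrefAsString {
    // Hypothetical well-formed sample input.
    static final String SAMPLE =
        "<body><a href=\"showdetails.aspx?id=7\">I need this data, too.</a></body>";
    static final String HREF =
        "//a[starts-with(@href, \"showdetails.aspx\")]/@href";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample", e);
        }
    }

    // Option 1: the two-argument overload already returns a String.
    static String viaTwoArg(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(HREF, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 2: request a string result explicitly via XPathConstants.STRING.
    static String viaStringConstant(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(HREF, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 3: fetch the attribute node itself and read its value.
    static String viaNodeValue(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node attr = (Node) xpath.evaluate(HREF, doc, XPathConstants.NODE);
            return attr == null ? null : attr.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(viaTwoArg(doc));
        System.out.println(viaStringConstant(doc));
        System.out.println(viaNodeValue(doc));
    }
}
```

Keep in mind that a string result only gives you the value of the first matching node in document order. If the page has several matching links, evaluate with XPathConstants.NODESET and loop over the result instead.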