HtmlUnit: collecting all links by class name
I would like to scrape / collect all the links on a page that have a specific class name.
For example, the link displayed as "Agriculture (92)" has this HTML:
<a href="http://www.specificurl/page.html" class="generate">Agriculture</a>
I have been toying with the following pieces of code:
List<?> links = page.getByXPath("//div[@class='generate']/@href");
OR
List<?> links = page.getAnchors();
System.out.println(links);
The getByXPath option returns null and the other option grabs all anchors. Is there a way to grab the links into a list?
This is a terrible XPath, but I was having issues narrowing it down. I can look into a better XPath if necessary, but for now this one worked:
List<?> links = page.getByXPath("/html/body/div[2]/div[2]/table/tbody/tr/td/table/tbody/tr[7]/td/table/tbody/tr/td/div/table/tbody/tr[2]/td/div/table/tbody/tr/td/table/tbody/tr/td/ul/li/a/@href");
I'm not quite sure why it wouldn't let us grab it by that class name.
Let me know how it works for you when you get the chance.
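One possible reason the class-based attempt found nothing: in the HTML shown in the question, class="generate" sits on the <a> element itself, while the XPath looks for a <div> with that class. A shorter expression that targets the anchor directly might be //a[@class='generate']/@href. The sketch below only checks that expression using the JDK's built-in XPath engine against a small, made-up, well-formed snippet. The class name ClassLinkXPath, the helper methods, and the second sample link are hypothetical and are not part of HtmlUnit.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassLinkXPath {

    // Hypothetical stand-in for the real page: one link with the target
    // class (copied from the question) and one invented link without it.
    static final String SAMPLE =
        "<html><body><ul>"
        + "<li><a href=\"http://www.specificurl/page.html\" class=\"generate\">Agriculture</a></li>"
        + "<li><a href=\"http://www.specificurl/other.html\" class=\"nav\">Other</a></li>"
        + "</ul></body></html>";

    // Builds an XPath that selects the href attribute of every <a>
    // whose class attribute is exactly the given name.
    static String hrefXPath(String className) {
        return "//a[@class='" + className + "']/@href";
    }

    // Evaluates the expression with the JDK XPath engine and returns
    // the href values as plain strings.
    static List<String> hrefsByClass(String xhtml, String className) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                hrefXPath(className),
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(nodes.item(i).getNodeValue());
            }
            return hrefs;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefsByClass(SAMPLE, "generate"));
    }
}
```

With HtmlUnit, the same expression string could be passed to page.getByXPath(...) in place of the long absolute path. Note that @class='generate' only matches when the class attribute is exactly "generate"; if the real anchors carry several classes, something like //a[contains(concat(' ', normalize-space(@class), ' '), ' generate ')]/@href may be needed instead.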