Extract information from a web page using Jsoup
I want to extract the review and rating information from a buy.com page using Jsoup. The problem is that the id of every review element contains the review's number, so I can't select them all with one fixed id. For example, review number 11 looks something like this:
<a id="CustomerReviews_customerReviews_ctl11_reviewIdAnchor" name="a352496"> </a><br />
<span id="CustomerReviews_customerReviews_ctl11_ratingInfo"><span class="blueText"><b>5</b> of <b>5</b></span> <b>Great Product</b> 12/15/2010<br /></span>
<span id="CustomerReviews_customerReviews_ctl11_reviewerInfo"><b>A customer</b> from x<br></span>
<span id="CustomerReviews_customerReviews_ctl11_reviewContent">content</span>
Review number 12 would have ctl12 in its ids instead, and so on. How can I extract the review content and rating for all reviews on the page?
I'm a bit late, but I hope this helps you and anyone else who runs into the same issue!
You should try something like this:
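import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The wrapper class name is arbitrary; it only makes the snippet runnable.
public class RatingInfoExample {
    public static void main(String[] args) {
        String code1 = "<span id=\"CustomerReviews_customerReviews_ctl11_ratingInfo\"><span class=\"blueText\"><b>1</b> of <b>5</b></span> <b>Great Product</b> 12/15/2010<br /></span>";
        String code2 = "<span id=\"CustomerReviews_customerReviews_ctl12_ratingInfo\"><span class=\"blueText\"><b>2</b> of <b>5</b></span> <b>Bad product</b> 12/03/2010<br /></span>";
        Document document = Jsoup.parse(code1 + code2);
        // [id~=regex] matches every span whose id fits the pattern, whatever the ctl number is
        Elements elements = document.select("span[id~=CustomerReviews_customerReviews_ctl.*_ratingInfo]");
        for (Element element : elements) {
            System.out.println(element.outerHtml());
            // The two <b> tags inside the nested span hold the rating and its maximum
            Elements spanBlueText = element.select("span > span > b");
            String note = spanBlueText.get(0).text();
            String max = spanBlueText.get(1).text();
            System.out.println(" - note: " + note + "/" + max);
            // The <b> that is a direct child of the ratingInfo span holds the review title
            String comment = element.select("> b").text();
            System.out.println(" - comment: " + comment);
            // Assumes the text ends with a 10-character date such as 12/15/2010
            String date = element.text();
            date = date.substring(date.length() - 10);
            System.out.println(" - date: " + date);
        }
    }
}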
This example makes heavy use of the Jsoup select method. You can find the correct syntax for its arguments in the Jsoup Cookbook.
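The question also asks for the review content, which sits in a separate span (…_reviewContent) with the same ctl prefix as the rating. Here is a minimal sketch of one way to pair the two, assuming the markup looks like the snippet in the question. The ReviewExtractor class, its Review holder, and the sample text for review 12 are made up for illustration; only the Jsoup calls (Jsoup.parse, select, id, getElementById, text) come from the library. It uses the [id$=...] "ends with" selector rather than a regex, which should be enough when only the suffix of the id is fixed.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class, not part of Jsoup or of the code above.
class ReviewExtractor {

    // Simple holder for the two values the question asks for.
    static final class Review {
        final String rating;
        final String content;

        Review(String rating, String content) {
            this.rating = rating;
            this.content = content;
        }
    }

    private static final String RATING_SUFFIX = "_ratingInfo";
    private static final String CONTENT_SUFFIX = "_reviewContent";

    // Sample markup: review 11 follows the question, review 12 is invented test data.
    static final String SAMPLE_HTML =
        "<span id=\"CustomerReviews_customerReviews_ctl11_ratingInfo\"><span class=\"blueText\"><b>5</b> of <b>5</b></span> <b>Great Product</b> 12/15/2010<br /></span>"
        + "<span id=\"CustomerReviews_customerReviews_ctl11_reviewContent\">content</span>"
        + "<span id=\"CustomerReviews_customerReviews_ctl12_ratingInfo\"><span class=\"blueText\"><b>2</b> of <b>5</b></span> <b>Bad product</b> 12/03/2010<br /></span>"
        + "<span id=\"CustomerReviews_customerReviews_ctl12_reviewContent\">other content</span>";

    static List<Review> extract(String html) {
        Document document = Jsoup.parse(html);
        List<Review> reviews = new ArrayList<>();
        // [id$=suffix] matches any id ending in the suffix, so the ctl number does not matter.
        for (Element ratingInfo : document.select("span[id$=" + RATING_SUFFIX + "]")) {
            String id = ratingInfo.id();
            // Strip the suffix to get the shared prefix, e.g. "..._ctl11".
            String prefix = id.substring(0, id.length() - RATING_SUFFIX.length());
            Element content = document.getElementById(prefix + CONTENT_SUFFIX);
            Elements numbers = ratingInfo.select("span.blueText > b");
            // Skip reviews whose markup does not match the assumed layout.
            if (content == null || numbers.size() < 2) {
                continue;
            }
            String rating = numbers.get(0).text() + "/" + numbers.get(1).text();
            reviews.add(new Review(rating, content.text()));
        }
        return reviews;
    }

    public static void main(String[] args) {
        for (Review review : extract(SAMPLE_HTML)) {
            System.out.println(review.rating + " : " + review.content);
        }
    }
}
```

For a live page you would pass the fetched HTML to extract instead of SAMPLE_HTML. If the real page nests the spans differently, the "span.blueText > b" selector is the part most likely to need adjusting.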