How to parse content with <pre>?
I am using jsoup to parse a number of things.
I am trying to parse this tag
开发者_如何学Python<pre>HEllo Worl<pre>
But just cant get it to work.
How would i parse this using jsoup?\
Document jsDoc = null;
jsDoc = Jsoup.connect(url).get();
Elements titleElements = jsDoc.getElementsByTag("pre");
Here is what i have so far.
Works fine for me with latest Jsoup:
String html = "<p>lorem ipsum</p><pre>Hello World</pre><p>dolor sit amet</p>";
Document document = Jsoup.parse(html);
Elements pres = document.select("pre");
for (Element pre : pres) {
System.out.println(pre.text());
}
Result:
Hello World
If you get nothing, then the HTML which you're parsing simply doesn't contain any <pre>
element. Check it yourself by
System.out.println(document.html());
Perhaps the URL is wrong. Perhaps there's some JavaScript which alters the HTML DOM with new elements (Jsoup doesn't interpret nor execute JS). Perhaps the site expects a real browser instead of a bot (change the user agent then). Perhaps the site requires a login (you'd need to maintain cookies). Who knows. You can figure this all out with a real webbrowser like Firefox or Chrome.
精彩评论