How can I extract HTML escape characters/entities as text when scraping the web? (Ruby & Nokogiri)
In my ruby+mechanize(nokogiri) script I use this piece of code:
row.at_xpath('td[3]/div[1]/a/text()').to_s.strip
on a forum where the post title html looks like:
<a href="showthread.php?t=233891" >&lt;/body&gt; on Footer ?</a>
and I receive this string from xpath: &lt;/body&gt; on Footer ?
I would like to get what I can see in the web browser: </body> on Footer ?
How can I do that for all html escape characters/entities?
Please take a look at this post on unescaping HTML entities
or
use the Ruby package called htmlentities.
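As a minimal sketch of the unescaping step, the standard library's CGI.unescapeHTML can decode the string from the question without installing anything. The helper name decode_entities is hypothetical and only used for illustration; CGI.unescapeHTML handles the basic named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;) and numeric references, so the htmlentities gem is probably the better fit when the full set of named entities is needed.

```ruby
require 'cgi'

# Hypothetical helper (not from the original post): decode HTML
# entities in a scraped string using Ruby's standard library.
def decode_entities(raw)
  CGI.unescapeHTML(raw)
end

# The escaped string the question reports getting back from xpath.
scraped = '&lt;/body&gt; on Footer ?'
puts decode_entities(scraped)
```

With Nokogiri specifically, calling .text (or .content) on the node instead of .to_s should also return the decoded string, because to_s serializes the node back to markup: row.at_xpath('td[3]/div[1]/a').text.strip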