
How can I extract HTML escape characters/entities as text when scraping the web? (Ruby & Nokogiri)

In my Ruby + Mechanize (Nokogiri) script I use this piece of code:

row.at_xpath('td[3]/div[1]/a/text()').to_s.strip

on a forum where the post title html looks like:

<a href="showthread.php?t=233891" >&lt;/body&gt; on Footer ?</a>

and I receive this string from the XPath query: &lt;/body&gt; on Footer ?

I would like to get what I can see in the web browser: </body> on Footer ?

How can I do that for all html escape characters/entities?


Please take a look at this post on unescaping HTML entities

or

There is a ruby package called htmlentities
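
As a rough illustration of the unescaping step, here is a minimal sketch that uses only Ruby's standard library. The variable names are hypothetical, and the input string is a hard-coded stand-in for what the XPath call in the question returns. CGI.unescapeHTML decodes the basic entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;) and numeric references, but not the full set of named entities such as &amp;nbsp;; for those, the htmlentities gem mentioned above (HTMLEntities.new.decode) is probably the better fit.

```ruby
require 'cgi'

# Hypothetical stand-in for the value returned by
#   row.at_xpath('td[3]/div[1]/a/text()').to_s.strip
escaped_title = '&lt;/body&gt; on Footer ?'

# Decode the basic HTML entities and numeric character references.
title = CGI.unescapeHTML(escaped_title)

puts title
```

It may also be worth checking whether calling .text (or .content) on the Nokogiri node instead of .to_s already returns the decoded string, since .to_s serializes the node back to markup.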
