
How do I filter CDATA out and only get the text from HTML?

I want to parse an HTML file using Nokogiri. I am able to do that, but I only want the text and no CDATA or JavaScript, since script and div tags are scattered all over the file.


You can delete all script elements,

doc.search('script').remove

… and then select all text elements

doc.xpath('//text()') 

… or just select the text elements within div elements

doc.xpath('//div//text()') 