Extract snippet out of HTML with Ruby?
I need to show the first 100 characters of an HTML text. That means I have to pick the first 100 characters that are not tags and then close any open tags, leaving balanced HTML. Is there any library that can do it? Or is there a trivial way to do it that I am missing?
The text is originally written in Textile, which can and does contain HTML, so I figured I am better off converting it fully to HTML first and then processing it. But if something can do it at the Textile level, I'm happy with that too.
This is how I would get the first 100 chars of text. You may need to modify it according to your needs.
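require 'nokogiri'

# Returns the first 100 characters of the body text, with all tags stripped.
def get_first_100_chars(path = 'html_file.html')
  doc = Nokogiri::HTML(File.read(path))
  body = doc.at('body')
  # An empty document has no <body>, so fall back to an empty string.
  body ? body.text[0, 100] : ''
end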
I'm not sure about balancing the HTML. I will post if I find out.
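For the balancing part, one possible approach is to walk the markup token by token, count only the text characters, and keep a stack of the tags that are still open so they can be closed at the cut-off point. The following is a minimal pure-Ruby sketch of that idea, not a tested library routine. The method name truncate_html and the VOID_TAGS list are made up for illustration. It assumes reasonably well-formed HTML, counts an entity such as &amp;amp; as several characters (and may cut it in half), and does not handle comments or attribute values that contain ">".

```ruby
# Elements that never take a closing tag, so they are not tracked on the stack.
VOID_TAGS = %w[area base br col embed hr img input link meta source track wbr].freeze

# Hypothetical helper: keeps at most `limit` text characters (tags are not
# counted) and then closes whatever tags are still open.
def truncate_html(html, limit = 100)
  out = +''
  open_tags = []
  remaining = limit

  # Split the input into tags (<...>) and the runs of text between them.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    break if remaining <= 0

    if token.start_with?('<')
      out << token
      name = token[%r{\A</?([a-zA-Z][a-zA-Z0-9]*)}, 1]&.downcase
      # Skip doctype/comment-like tokens, void elements and self-closing tags.
      next if name.nil? || VOID_TAGS.include?(name) || token.end_with?('/>')

      if token.start_with?('</')
        open_tags.pop if open_tags.last == name
      else
        open_tags.push(name)
      end
    else
      piece = token[0, remaining]
      out << piece
      remaining -= piece.length
    end
  end

  # Close the still-open tags, innermost first.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end

puts truncate_html('<p>Hello <b>brave new</b> world</p>', 9)
```

For messy real-world markup, a real parser such as Nokogiri is likely to be more robust than this regex-based tokenizer; the stack-and-count idea stays the same.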
Have a look at Nokogiri