Rails - strip_tags - Not catching DOCTYPE?

2023-02-19 07:01

Given an HTML email, I'm using the following to strip down to just the text:

  body = body.gsub(/\\r\\n?/, "\n")
  body = body.gsub(/\\n\\n?/, "\n")
  body = simple_format(body)
  body = strip_tags(body)

But I'm now seeing that one tag gets past this:

<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">

Which outputs like so:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Any ideas why?

I guess strip_tags, which looks like it's been deprecated, considers the doctype declaration neither a tag nor a comment. You could strip it out separately:

string.gsub(/<!.*?$/,'')
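Note that the pattern above runs to the end of the line, so it also drops anything that follows the declaration on the same line, and it matches comments too. If that matters, a narrower pattern may be safer. Here is a minimal sketch, assuming the declaration contains no `>` of its own; `strip_doctype` and `DOCTYPE_PATTERN` are hypothetical names, not part of Rails:

```ruby
# Hypothetical helper (not a Rails API): removes DOCTYPE declarations only.
# Assumes the declaration itself contains no ">" (i.e. no internal subset).
DOCTYPE_PATTERN = /<!DOCTYPE[^>]*>/i

def strip_doctype(html)
  # Match from "<!DOCTYPE" up to the first ">" and delete just that span.
  html.gsub(DOCTYPE_PATTERN, '')
end

html = %(<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><p>Hello</p>)
puts strip_doctype(html) # prints "<p>Hello</p>"
```

You would run this before strip_tags, so the remaining markup still goes through the normal sanitizer.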

I ended up using Hpricot to convert the HTML to text, which worked great.

I'd recommend using Nokogiri for your parsing needs. It's very well supported, plenty fast, very flexible, and the basis of a lot of other HTML/XML type gems. It has an Hpricot compatibility mode, though I'm not sure why anyone would need that, since Nokogiri's own syntax is more full-featured.

In particular, to strip tags from HTML, I'd recommend looking into Loofah. It can whitelist tags, and has several layers of cleansing it can do.
