
Programmatically remove images and videos from html

I'm working on Ruby on Rails 2.3.8 and I've got a website in which users type posts. Each of them has a short description that is shown on the main page. That description is automatically built from the original, but it's just truncated so it reaches a max of 240 characters.

The problem is that those descriptions may contain images or videos, and I don't want them to appear when I truncate those strings. I'm using the Hpricot plugin to parse the HTML, and the following regular expression to strip images:

body = Hpricot.parse(html_body)
body = body.to_s.gsub(/<img .*?>/, '')

This is removing images, but sometimes it leaves a string instead, for example it says "image" or "img" where the image was before. Now, for example, I see a loose "spam" text remaining after I deleted an image from the description. Maybe the regex is not correct.

Does anybody know which is the right regex for removing images, and also videos from html?


It seems to me that you are searching for img with a space after it.

Don't you want to match the <img, then everything up to but not including the >, and then the > itself?

Hard to say if it works without source input.

<img([^>])+>

CAUTION: will NOT work with nested tags.
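
To make the idea concrete, here is a minimal sketch in plain Ruby, with no Hpricot involved. The helper name strip_media and the choice of tags treated as "video" (video, object, iframe, embed) are assumptions for illustration, not something taken from the question. It is regex-based, so the caution above still applies: nested tags of the same kind, or a > inside an attribute value, will confuse it.

```ruby
# Hypothetical helper (the name and the tag list are assumptions):
# removes <img> tags and common video containers from an HTML string.
def strip_media(html)
  # Paired elements first, so whatever sits between the opening and closing
  # tag (fallback text, <param> tags, a nested <embed>) is removed as well.
  # /m lets "." span newlines, /i ignores case.
  without_paired = html.gsub(%r{<(video|object|iframe)\b[^>]*>.*?</\1\s*>}mi, '')

  # Then the tags that have no closing tag: <img ...>, <img/>, <embed ...>.
  # \b keeps the pattern from matching other tag names that start with "img".
  without_paired.gsub(/<(img|embed)\b[^>]*>/i, '')
end

sample = '<p>Hello <img src="a.png" alt="spam"> world</p>' \
         '<video src="v.mp4">Your browser cannot play this.</video>'
puts strip_media(sample)
```

Because the whole tag is matched up to its closing >, attribute values such as alt text go away together with the tag. If stray text still remains afterwards, it is probably ordinary text in the source HTML rather than part of the image tag.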
