Close all HTML unclosed IMG tags

2022-12-23 05:09

Is it possible to do a regex replace on all IMG tags that are unclosed? If so, how would I identify:

  <img src="..." alt="...">

...as a potential candidate to be replaced?

   = <img src="..." alt="..."/>

Update: We have hundreds of pages and thousands of image tags, most of which must be closed. I'm not stuck on RegEx -- any other method, aside from manually updating all IMG tags, would suffice.

(<img[^>]+)(?<!/)>

will match an img tag that is not properly closed. It requires a regex flavor that supports lookbehind (Ruby before 1.9 and JavaScript before ES2018 don't, but most others do). Backreference no. 1 will contain the tag up to, but not including, the final >, so if you search for this regex and replace it with \1/> you should be good to go.
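As a minimal sketch of that search-and-replace, here is the same pattern in Python, whose re module supports fixed-width lookbehind. The function name close_img_tags and the sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer above: capture "<img ..." and require that the
# character just before the final ">" is not "/".
UNCLOSED_IMG = re.compile(r'(<img[^>]+)(?<!/)>')


def close_img_tags(html):
    # \1 re-inserts the captured tag text; "/>" replaces the bare ">".
    return UNCLOSED_IMG.sub(r'\1/>', html)


sample = '<p><img src="a.png" alt="A"><img src="b.png" alt="B"/></p>'
print(close_img_tags(sample))
```

Running the function a second time on its own output should change nothing, because every img tag then ends in />.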

If you need to account for the possibility of > inside attributes, you could use

(<img("[^"]*"|[^>])+)(?<!/)>

This will match, e.g.,

<img src="image.gif" alt="hey, look--->">
<img src="image/image.gif">

and leave

<img src="image/image.gif" />

alone.
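The difference between the two patterns can be checked with a short Python sketch. The names BASIC and QUOTE_AWARE are made up for this example; the patterns and the three sample tags are the ones quoted above.

```python
import re

# Basic pattern: stops at the first ">" it sees, even inside a quoted value.
BASIC = re.compile(r'(<img[^>]+)(?<!/)>')
# Attribute-aware pattern: a double-quoted value is consumed as one unit,
# so a ">" inside it does not end the tag.
QUOTE_AWARE = re.compile(r'(<img("[^"]*"|[^>])+)(?<!/)>')

tags = [
    '<img src="image.gif" alt="hey, look--->">',
    '<img src="image/image.gif">',
    '<img src="image/image.gif" />',
]

# The first two tags gain "/>"; the third is left untouched.
fixed = [QUOTE_AWARE.sub(r'\1/>', tag) for tag in tags]
for tag in fixed:
    print(tag)

# For comparison, the basic pattern cuts into the alt text of the first tag.
print(BASIC.sub(r'\1/>', tags[0]))
```

One caveat worth testing on your own data: a tag that is already closed but has a > inside an attribute, such as <img alt="a>b" />, can still be matched by the attribute-aware pattern through backtracking, so it is safest on input where that combination does not occur. Single-quoted values are also not covered.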

In HTML the end tag for an <img> "must be omitted", so the start tag closes the element and you can't have an unclosed img.

If you want to convert your HTML to XHTML then use a real parser. Regular Expressions aren't a very good tool for this job.
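A rough sketch of the parser route, assuming Python's standard html.parser module is acceptable for your pages. The class name ImgCloser and the function close_img_tags are hypothetical, and the sketch only rewrites img start tags; it is not a full HTML-to-XHTML converter (end tags come back lowercased, and other void elements such as br are left as they are).

```python
from html.parser import HTMLParser


class ImgCloser(HTMLParser):
    """Re-emit the input, turning <img ...> into <img ... />."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag == "img":
            # Drop the final ">" and any space before it, then self-close.
            raw = raw[:-1].rstrip() + " />"
        self.out.append(raw)

    def handle_startendtag(self, tag, attrs):
        # Already self-closed (e.g. <img ... />): copy it through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def close_img_tags(html):
    parser = ImgCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(close_img_tags('<p><img src="x.png" alt="a > b"><img src="y.png" /></p>'))
```

Because the parser tokenizes attributes itself, a > inside a quoted attribute value does not confuse it the way it can confuse a regex.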

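To replace all unclosed IMG tags:

import re

content = "text<img src='img.jpg'>text<img src='img.png' >text"
# Match an optional " /" before ">" so tags that are already closed are not given a second slash.
content = re.sub(r'(<img[^>]*?)\s*/?>', r'\1 />', content)
print(content)

Lookbehind is cool, though.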

What exactly do you mean by "unclosed"?

 <img src="a1.jpg    <-- no closing quote and no closing angle bracket
 <img src="a1.jpg"   <-- no closing angle bracket
 <img src="a1.jpg">  <-- the tag is not self-closed, as XHTML requires

You can try to find such suspects intelligently, but no approach is guaranteed to be foolproof.
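As an illustration of hunting for such suspects, the following Python sketch labels each <img it finds with one of the three cases above. It is a heuristic under a stated assumption: a tag never extends past the next < in the text, which fails when an attribute value itself contains < or >. The function name classify_img_tags and the verdict strings are made up for this example.

```python
import re


def classify_img_tags(html):
    """Return (snippet, verdict) pairs for every "<img" found in html."""
    results = []
    for m in re.finditer(r"<img\b", html, flags=re.IGNORECASE):
        # Heuristic: assume the tag cannot extend past the next "<".
        nxt = html.find("<", m.end())
        chunk = html[m.start(): nxt if nxt != -1 else len(html)]
        end = chunk.find(">")
        if end == -1:
            # No ">" at all; an odd number of quotes suggests one is missing.
            if chunk.count('"') % 2:
                verdict = "missing closing quote and >"
            else:
                verdict = "missing >"
        else:
            chunk = chunk[: end + 1]
            verdict = "self-closed" if chunk.endswith("/>") else "not self-closed"
        results.append((chunk.strip(), verdict))
    return results


sample = (
    '<img src="a1.jpg\n'
    '<img src="a1.jpg"\n'
    '<img src="a1.jpg">\n'
    '<img src="a1.jpg" />'
)
for snippet, verdict in classify_img_tags(sample):
    print(f"{verdict}: {snippet}")
```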

I have never tried this, but a closed img tag is one that begins with <img, has some content in between, and ends with />.

Here is something I tried in Perl:

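#!/usr/bin/env perl
use strict;
use warnings;

my @images = ('<img src="toto.jpg">',
              '<img src="truc/machin.jpg" title="pouet" >',
              '<img        src="pouet.jpg" alt="toto" />',
              '<img src="math/a-greater-than-b.png" alt="a > b">');

foreach (@images) {
    # Matches only when the last quoted value is followed by optional
    # whitespace and ">", so tags ending in "/>" are skipped.
    if (/<img\s+(([a-z]+=".*?")+\s*)>/) {
        print "Match : <img $1 />\n";
    }
}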

Produces:

Match : <img src="toto.jpg" />
Match : <img src="truc/machin.jpg" title="pouet"  />
Match : <img src="math/a-greater-than-b.png" alt="a > b" />
