Problem of Regex'ing images to BBCode

I'm working on something of my own for phpBB3, and I'm trying to convert smiley images back to their original text codes, e.g.

:)  :(  :O  :P

Each smiley is an img tag in the HTML, so I match it with this regex:

/<img src=".*" alt="(.*)" title=".*">/gi

and replace each match with:

$1

However, when I have multiple smileys, it only shows the last smiley, e.g. if it was like this:

(screenshot: http://uimgz.com/i/R2e3H8g5D8.png)

It turns into this:

:twisted:

That is the last smiley, the one on the right. Why hasn't it replaced every smiley and returned all of the codes, like this:

:) :o :twisted:

The regex seems fine, but I don't know what the problem is. All of the regexes are applied in a for() loop, so the looping is not the problem.

Multiple smileys HTML:

<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />


Change the regex to this and try again:

/<img src="[^"]*" alt="([^"]+)" title="[^"]*"\s*\/?>/gi

Regex quantifiers are greedy by default: .* matches as much text as it can while still allowing the overall pattern to match. In your case the pattern matched all three images as one. What I did here was limit each attribute value to characters other than ", so the src part can no longer run all the way to the third src. I also added \s*\/? before the final > so that the self-closing " /> in your HTML matches as well. Your original regex treated all of this as the src attribute:

./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif
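As a rough check of the negated character class approach, here is a minimal sketch. It assumes JavaScript (which the /gi flags suggest) and uses the markup from the question. The helper name smiliesToCodes is hypothetical, and the \s*\/? before the final > is an added assumption so that both "> and " /> endings match.

```javascript
// Sample markup copied from the question.
const html =
  '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> ' +
  '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> ' +
  '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />';

// [^"]* cannot cross a closing quote, so each attribute value stays inside its own tag.
// \s*\/?> accepts both `">` and the self-closing `" />` (an assumption about the input).
const smileyTag = /<img src="[^"]*" alt="([^"]+)" title="[^"]*"\s*\/?>/gi;

// Hypothetical helper, not part of phpBB: swaps each smiley tag for its alt text.
function smiliesToCodes(markup) {
  return markup.replace(smileyTag, "$1");
}

console.log(smiliesToCodes(html)); // :) :o :twisted:
```

Because the pattern carries the g flag, a single replace() call handles every tag in the string, so no outer loop is needed for this one pattern.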


Use *? and +? for non-greedy matching (the \s*\/? before the final > just lets the self-closing " /> in your HTML match too):

/<img src=".*?" alt="(.+?)" title=".*?"\s*\/?>/gi

What's happening in your failing example is that the first .* is matching all of this:

./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif

which still produces a valid match, but it's not what you want. The ? after * or + makes that quantifier consume the smallest string necessary for a successful match. Read the section "Watch Out for The Greediness!" in this article.
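To see the difference side by side, the following sketch runs a greedy and a lazy pattern over the markup from the question. It assumes JavaScript. The variable names are illustrative, and both patterns end in \s*\/?> (an assumption) so that the " /> endings in the sample match.

```javascript
// Sample markup copied from the question.
const sample =
  '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> ' +
  '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> ' +
  '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />';

// Greedy: each .* grabs as much as possible, so one match spans all three tags.
const greedy = /<img src=".*" alt="(.*)" title=".*"\s*\/?>/gi;

// Lazy: .*? and .+? grow only as far as needed, so each tag matches on its own.
const lazy = /<img src=".*?" alt="(.+?)" title=".*?"\s*\/?>/gi;

const greedyResult = sample.replace(greedy, "$1");
const lazyResult = sample.replace(lazy, "$1");

console.log(greedyResult); // :twisted:
console.log(lazyResult);   // :) :o :twisted:
```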

I'd like to also add the general warning that regular expressions aren't the best tool for parsing HTML. Even my regex will break if the src attribute has an escaped " for example.
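As one illustration of that fragility (a different case from the escaped quote mentioned above), the sketch below uses a hypothetical input in which a plain image without alt or title precedes a smiley. It assumes JavaScript, and the names are illustrative. The lazy pattern may still stretch its src part across the first tag, while a negated character class cannot.

```javascript
// Hypothetical input: a non-smiley image (no alt/title) followed by one smiley.
const mixed =
  '<img src="./logo.png"> ' +
  '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" />';

// Lazy quantifiers still expand past quotes when that is the only way to match.
const lazyTag = /<img src=".*?" alt="(.+?)" title=".*?"\s*\/?>/gi;

// Negated classes stop at the first closing quote, so the logo tag is skipped.
const strictTag = /<img src="[^"]*" alt="([^"]+)" title="[^"]*"\s*\/?>/gi;

const lazyOut = mixed.replace(lazyTag, "$1");
const strictOut = mixed.replace(strictTag, "$1");

console.log(lazyOut);   // :)  (the logo image was swallowed by the match)
console.log(strictOut); // <img src="./logo.png"> :)
```

Neither pattern is a real HTML parser, so both remain sensitive to attribute order and quoting style.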
