Stuck with Regular Expression code to apply HTML tag to text but exclude if inside <?> tag [duplicate]

2022-12-27 07:26 Q&A author:

This question already has answers here: Closed 10 years ago.

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags

I'm trying to write a bit of regex which would go through some text, written by our Editors, and apply an <acronym> tag to the first instance it finds of an abbreviation set we hold in our "Glossary of Terms".

So for this example I've used the abbreviation ITS.

The first thing I thought I'd do is set up an example with a mix of scenarios I could test against, i.e. ITS sitting next to punctuation, ITS inside HTML tags, and ones that we've applied this to already (in other words, the script has run through the text before, so there's no need to do it again).

I'm almost there but just got stuck at the last point :-(.

Here's the regex I've got so far - <[^<|]+?>?>ITS<[^<]+?>|ITS

The Example - FROM ( EVERY ITS IN BOLD TO BE WRAPPED WITH ACRONYM ):

I want you to tag this ITS, but not this wrapped one - <acronym title="ITS" id="thisIsATest">ITS</acronym>

This is another test as I still want to update <p>ITS</p> that have other HTML tags wrapped around them.

ITS want ones that start sentences and ones that finish ITS. ITS, and ones which are wrapped in punctuation.

Test link: <a href="index.cfm>ITS</a>

AND I WANT THIS CHANGE TO :

I want you to tag this <acronym title="ITS">ITS</acronym>, but not this wrapped one - <acronym title="ITS">ITS</acronym>

This is another test as I still want to update <acronym title="ITS">ITS</acronym> that have other HTML tags wrapped around them.

<acronym title="ITS">ITS</acronym> want ones that start sentences and ones that finish <acronym title="ITS">ITS</acronym>. <acronym title="ITS">ITS</acronym>, and ones which are wrapped in punctuation.

Test link: <acronym title="ITS"><a href="index.cfm>ITS</a></acronym>

Are there any regex experts out there who could help me finish this off? Any other hints or tips would also be appreciated.

** UPDATE ** Don't know if this helps, but this would find the only already-wrapped ITS in that paragraph:

<acronym[^<]*ITS</acronym>

and this will find all the ITS :

<[^<]*>ITS<[^<]*>|ITS

What I really need is a way of combining these to say: find all the ITSs but exclude those in tags.

Thanks a lot, James

P.S. This is going to be placed in a ColdFusion application if that helps anyone in specific syntax.

Here's the HTML I'm trying to parse:

http://pastebin.com/5k32aG8i

Here is your basic problem: regex is not a parser. This problem has been approached many times, and there is no general-purpose solution using regex alone. You can fake it up to a point with lookahead, lookbehind, and some really complicated footwork, but you quickly reach the point where your expression is way too complicated to maintain.

I can suggest a couple approaches.

If you are using text that is XML compliant, you can parse the text using xmlparse() and then step through the resulting structure, applying your regex to the xmltext of each node.
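As a rough illustration of the parse-then-walk idea, here is a minimal sketch in Python rather than ColdFusion, using the standard library's ElementTree in place of xmlparse(). The function names (tag_abbreviation_xml, _wrap, _split_text) are made up for this example, and it assumes the input is well-formed XML with a single root element and lowercase acronym tags. The sample in the question, with its unclosed href quote, would not parse as-is.

```python
import re
import xml.etree.ElementTree as ET


def _split_text(text, pattern, abbr):
    """Split text on the abbreviation; return (leading text, new <acronym> elements)."""
    if not text:
        return text, []
    parts = pattern.split(text)
    wrapped = []
    for following in parts[1:]:
        el = ET.Element("acronym", {"title": abbr})
        el.text = abbr
        el.tail = following  # text between this match and the next one
        wrapped.append(el)
    return parts[0], wrapped


def _wrap(elem, pattern, abbr):
    """Recursively wrap matches in every text node outside existing <acronym> elements."""
    if elem.tag == "acronym":
        return  # already tagged: leave its contents alone
    children = list(elem)
    for child in children:
        _wrap(child, pattern, abbr)
        elem.remove(child)  # detach; children are re-attached in order below
    elem.text, rebuilt = _split_text(elem.text, pattern, abbr)
    for child in children:
        # A child's tail is text that belongs to the parent, so it is handled here.
        child.tail, extra = _split_text(child.tail, pattern, abbr)
        rebuilt.append(child)
        rebuilt.extend(extra)
    elem.extend(rebuilt)


def tag_abbreviation_xml(xml_text, abbr):
    """Parse xml_text, wrap whole-word occurrences of abbr, and serialize it back."""
    pattern = re.compile(r"\b%s\b" % re.escape(abbr))
    root = ET.fromstring(xml_text)
    _wrap(root, pattern, abbr)
    return ET.tostring(root, encoding="unicode")
```

Because the regex only ever sees text nodes, attribute values such as title="ITS" and the contents of existing acronym elements are never touched.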

Alternately, you can try replacing each tag in the text block with a placeholder, doing a replace on the resulting text, then restoring the placeholders.
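The placeholder idea can be sketched as follows, again in Python rather than ColdFusion. The names PROTECTED and tag_abbreviation are invented for this example, and it assumes the text never contains NUL characters (used here as placeholder delimiters) and that existing acronym elements are not nested. An already-wrapped acronym element is hidden as a whole, so its contents are skipped along with every other tag.

```python
import re

# An existing <acronym>...</acronym> element (hidden whole), or any single tag.
PROTECTED = re.compile(r"<acronym\b[^>]*>.*?</acronym>|<[^>]+>",
                       re.IGNORECASE | re.DOTALL)


def tag_abbreviation(html, abbr, count=0):
    """Wrap whole-word occurrences of abbr that sit outside tags.

    count=0 wraps every occurrence; count=1 wraps only the first one.
    """
    stash = []

    def hide(match):
        # Remember the original markup and leave a numbered placeholder behind.
        stash.append(match.group(0))
        return "\x00%d\x00" % (len(stash) - 1)

    masked = PROTECTED.sub(hide, html)

    wrapper = '<acronym title="{0}">{0}</acronym>'.format(abbr)
    wrapped = re.sub(r"\b%s\b" % re.escape(abbr), lambda m: wrapper,
                     masked, count=count)

    # Put the hidden markup back in place of each placeholder.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], wrapped)
```

Unlike the XML route, this does not need well-formed input, but it can be fooled by things like a ">" inside an attribute value.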

Obviously, neither of these is perfect, but either, with some tweaking, may get you where you're going.

~~Does this work?~~

~~(?!(<acronym\W*>|\w))ITS(?!(<acronym\W*>|\w))~~

~~Haven't been tested since I don't have ColdFusion~~

Looks like ColdFusion doesn't support lookbehinds. However, you can still use lookaheads ((?!...)) to ensure that the string (ITS) isn't followed by </acronym>.

\WITS(?!(</acronym\W*>|\w))

Since you can't use lookbehinds, you need \W at the beginning to make sure the string isn't part of another word. Unfortunately, it will eat up the previous character if matched. The \w at the end of the lookahead likewise makes sure it's not part of a longer word.
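One way around the eaten character is to capture it and put it back in the replacement. The sketch below shows this in Python (whose re module does support lookbehind, but the pattern deliberately avoids it to mirror the constraint above). The names PATTERN and tag_its are made up for the example, and the (^|\W) alternative is an addition so that ITS at the very start of the text is also caught. The same capture-and-reinsert idea should carry over to ColdFusion's REReplace with a \1 backreference, though that is untested here.

```python
import re

# Group 1 holds the character before ITS (or nothing at the start of the text),
# so the replacement can restore it.
PATTERN = re.compile(r"(^|\W)ITS(?!(</acronym\W*>|\w))")


def tag_its(text):
    """Wrap ITS unless it is part of a word or directly followed by </acronym>."""
    return PATTERN.sub(r'\1<acronym title="ITS">ITS</acronym>', text)
```

Be aware that this still matches ITS inside attribute values, for example title="ITS", so it only suits input where the abbreviation does not appear inside a tag.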
