Regular expressions - check if a match is not contained in a textarea

OK, so I opened up this question yesterday and got an answer fairly quickly. It worked, or so I thought, so I marked it as the correct answer.

However I don't think I explained the situation very well. Basically I am getting the HTML right before it is rendered, parsing it and searching for strings matching the pattern [tag|text x], where x is a number and the two words are case-insensitive.

However, as stated in the previous question, I would like to NOT replace these tags if they're inside a textarea. This means that if they're between </textarea> and <textarea...> then I would still like to replace them, but if they're between <textarea...> and </textarea> then I would NOT like to replace them.

So far I have

@"(?<!\<textarea class='tag'\>)\[(tag|text) ([0-9]+)\]"

I have tried

@"(?<!\<textarea.[^>]*\>)\[(tag|text) ([0-9]+)\]"

but that doesn't appear to work either.

For example I would like to replace any tags outside of the textareas in the following:

[tag 1]
<textarea>[tag 2]</textarea>[tag 3]
<textarea class="bob">Walter [tag 4]</textarea>[tag 5]
<textarea attr-1="fred">Jim [tag 6] Mary</textarea>[tag 7]
[tag 8]

In this example only tags 1, 3, 5, 7 and 8 should be replaced; 2, 4 and 6 should not.

Does anyone have any idea what I should change it to in order to achieve this? I am not asking for anyone to just do all the work for me and give me the answer - I am in this to learn. I have struggled with this for a few hours now, so any assistance would be great!


This kind of thing is usually easier to do with lookaheads than lookbehinds. This works as you requested:

@"\[(tag|text)\s+(\d+)\](?![^<]*(?:<(?!/?textarea\b)[^<]*)*</textarea>)"

The idea here is to look for a </textarea> tag, but only if you don't encounter a <textarea...> tag first--that's this part:

[^<]*(?:<(?!/?textarea\b)[^<]*)*</textarea>

Assuming the HTML is well formed, that subpattern can only match when the current position is inside a textarea element. Putting it in a negative lookahead that runs after the [tag] has been matched therefore rejects any match that sits inside a textarea.
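As a quick check of the lookahead approach, here is a minimal sketch that runs the pattern against the sample HTML from the question. It is a port to Python's re module rather than the original .NET code. The pattern itself is unchanged, but the IGNORECASE flag is added on the assumption that the case-insensitivity mentioned in the question is wanted. The replace_tags helper and its {kind:number} output format are hypothetical and exist only to make the substitutions visible.

```python
import re

# Pattern from the answer, unchanged apart from Python raw-string syntax.
# The negative lookahead rejects a tag when a </textarea> can be reached
# from it without first passing another <textarea ...> or </textarea> tag.
TAG_OUTSIDE_TEXTAREA = re.compile(
    r"\[(tag|text)\s+(\d+)\]"
    r"(?![^<]*(?:<(?!/?textarea\b)[^<]*)*</textarea>)",
    re.IGNORECASE,  # assumption: the question asks for case-insensitive words
)

# Sample input copied from the question.
SAMPLE = """[tag 1]
<textarea>[tag 2]</textarea>[tag 3]
<textarea class="bob">Walter [tag 4]</textarea>[tag 5]
<textarea attr-1="fred">Jim [tag 6] Mary</textarea>[tag 7]
[tag 8]"""


def replace_tags(html: str) -> str:
    """Hypothetical replacement: rewrite each accepted tag as {kind:number}."""
    return TAG_OUTSIDE_TEXTAREA.sub(
        lambda m: "{" + m.group(1).lower() + ":" + m.group(2) + "}", html
    )


# Numbers of the tags the pattern accepts, in document order.
matched_numbers = [m.group(2) for m in TAG_OUTSIDE_TEXTAREA.finditer(SAMPLE)]
print(matched_numbers)
print(replace_tags(SAMPLE))
```

The first print shows ['1', '3', '5', '7', '8'], and in the rewritten text [tag 2], [tag 4] and [tag 6] are left exactly as they were inside their textareas.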
