
regex replace - but with a few exceptions

I have a string containing HTML and I need to replace some words with links. I do this with the following code:

string lNewHTML = Regex.Replace(lOldHTML, @"(\bword1\b|\bword2\b|\bword3\b)", "<a href=\"page.aspx#$1\">$1</a>", RegexOptions.IgnoreCase);

The code works, but I need some exceptions to the replacement: nothing inside img, li, and a tags may be replaced (including link text and attributes such as href and title), while replacements should still happen inside p, td, and div tags.

Can anyone figure this one out?


OK, after some time spent trying to construct a fitting regex, here is my attempt. It might need additional work, but it should point you in the right direction.

I am matching the words "word1" and "word2" when they are not inside a "tag1" or "tag2" tag. You will need to adjust this to your needs, of course. Enable RegexOptions.IgnorePatternWhitespace if you'd like to keep my formatting.

Unfortunately, I have not come up with a regex you could simply plug into Regex.Replace: each match spans the whole string from the end of the previous match (that is where \G anchors), and only the first group holds the word you are concerned with. That group carries the index and length of the word, so you can easily replace it using String.Substring.

(?:
    \G
    (?:
        (?>
             <tag1(?<N>)
            |<tag2(?<N>)
            |</tag1(?<-N>)
            |</tag2(?<-N>)
            |.)*?
        (?(N)(?!))
    )*
 )
(word1|word2)
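The Substring-based replacement described above could be sketched roughly like this. It is an adaptation, not the answer's exact regex: tag1/tag2 are filled in with `a` and `li` as an assumption, `\b` is added so `<a` does not also match `<abbr` (or `<li` match `<link`), RegexOptions.Singleline lets `.` cross line breaks, and the outer `(?: ... )*` wrapper is dropped because, as far as I can tell, it only repeats the lazy loop and invites heavy backtracking on long inputs. The class and method names are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GAnchorLinkifier
{
    // Adapted pattern (assumption: protect <a> and <li> content only).
    // Group 1 is the only unnamed capturing group, so it holds the word.
    private static readonly Regex Pattern = new Regex(@"
        \G                   # start where the previous match ended
        (?>
             <a\b(?<N>)      # opening <a>: push onto N
            |<li\b(?<N>)     # opening <li>: push onto N
            |</a\b(?<-N>)    # closing </a>: pop N
            |</li\b(?<-N>)   # closing </li>: pop N
            |.               # any other character
        )*?
        (?(N)(?!))           # fail while still inside an <a> or <li>
        (word1|word2)        # group 1: the word to link",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string html)
    {
        var sb = new StringBuilder();
        int last = 0; // end of the previously linked word

        foreach (Match m in Pattern.Matches(html))
        {
            Group word = m.Groups[1];
            // Copy the untouched text between the previous word and this one.
            sb.Append(html.Substring(last, word.Index - last));
            // Wrap the word the same way as the question's replacement string.
            sb.Append("<a href=\"page.aspx#").Append(word.Value).Append("\">")
              .Append(word.Value).Append("</a>");
            last = word.Index + word.Length;
        }

        sb.Append(html.Substring(last)); // text after the final linked word
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Linkify("word1 <a href=\"x\">word2</a> word2"));
    }
}
```

Only a and li content is protected here; attributes of other tags (for example an img alt) would still be linked, and an `<li>` left unclosed, which HTML allows, suppresses every later word.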


You need to use the Replace overload that takes a MatchEvaluator parameter, so that you can examine each match and decide whether to replace it or not.
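A minimal sketch of the MatchEvaluator idea, assuming the regions to protect are whole a and li elements plus the inside of any single tag (which also covers img attributes such as alt or title). The pattern alternates between a hypothetical `skip` group, which swallows those regions, and a `word` group that can only match in ordinary text; the evaluator hands skipped regions back unchanged and wraps words in the same link as the question's replacement string. The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Linkifier
{
    // "skip": a whole <a>...</a> or <li>...</li> element, or any other single tag.
    // "word": one of the target words, reached only outside those regions.
    private static readonly Regex Pattern = new Regex(
        @"(?<skip><a\b[^>]*>.*?</a>|<li\b[^>]*>.*?</li>|<[^>]*>)|\b(?<word>word1|word2|word3)\b",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string html)
    {
        return Pattern.Replace(html, m =>
        {
            // Protected region: return it exactly as it was.
            if (m.Groups["skip"].Success)
                return m.Value;

            // Plain-text word: link it, keeping its original casing.
            string word = m.Groups["word"].Value;
            return "<a href=\"page.aspx#" + word + "\">" + word + "</a>";
        });
    }

    public static void Main()
    {
        string html = "<p>word1 and <a href=\"x\" title=\"word2\">word2</a> <img alt=\"word3\"> word3</p>";
        Console.WriteLine(Linkify(html));
    }
}
```

Because `.*?` stops at the first closing tag, a list item that contains another list item is not handled correctly; the balancing-group regex in the previous answer, or a real HTML parser, copes better with nesting.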
