
How to upper case a regular expression pattern?

I'm currently looking at a project that makes heavy use of regular expressions. The input strings are already upper-cased, and so the regex IgnoreCase flag has been set. The internal MS Regex engine, though, then changes all the case back to lower, which is an unnecessary hit. Changing the regular expression patterns to upper case and removing the flag helps performance.

Does anyone know of a library or algorithm that can upper-case the regex patterns without affecting the group names or escaped characters?


You could search for lowercase letters that are not preceded by an odd number of backslashes:

(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+

Then pass the match to a MatchEvaluator, uppercase it and replace the text in the original string. I don't know C#, so this might not work right away (code snippet taken and modified a bit from RegexBuddy), but it's a start:

string resultString = Regex.Replace(subjectString,
    @"(?<!                 # Negative lookbehind:
       (?<!\\)(?:\\\\)*\\  # Is there no odd number of backslashes
      |                    # nor
       \(\?<?\p{L}*        # (?<tags or (?modifiers
      )                    # before the current position?
      \p{Ll}+              # Then match one or more lowercase letters",
    new MatchEvaluator(ComputeReplacement), RegexOptions.IgnorePatternWhitespace);

public static string ComputeReplacement(Match m) {
    // m.Value is the matched run of lowercase letters; return it upper-cased.
    // (The literal @"\0" would not be expanded inside a MatchEvaluator.)
    return m.Value.ToUpperInvariant();
}
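
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The class name PatternUpperCaser and the method name UpperCasePattern are made up for illustration, and the sample patterns are my own; it assumes the .NET regex engine, which allows the unbounded lookbehind used here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class PatternUpperCaser
{
    // Lowercase runs that are neither escaped (odd run of backslashes)
    // nor part of a (?<name or (?modifiers prefix.
    private static readonly Regex LowercaseRun = new Regex(
        @"(?<!
            (?<!\\)(?:\\\\)*\\   # an odd number of backslashes
           |
            \(\?<?\p{L}*         # (?<name or (?modifiers
          )
          \p{Ll}+                # one or more lowercase letters",
        RegexOptions.IgnorePatternWhitespace);

    public static string UpperCasePattern(string pattern)
    {
        // The lambda is converted to a MatchEvaluator.
        return LowercaseRun.Replace(pattern, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // Group name and \d stay as they are, literals are upper-cased.
        Console.WriteLine(UpperCasePattern(@"(?<word>foo)\d+bar")); // (?<word>FOO)\d+BAR
        // \\ is an escaped backslash, so the d after it is a plain literal.
        Console.WriteLine(UpperCasePattern(@"\\dabc\w"));           // \\DABC\w
    }
}
```

Note that this is still only a start: letters inside constructs such as \p{Ll} or \k<name> are not protected by the lookbehind, so patterns using those would need extra handling.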

Explanation:

(?<!        # assert that the string before the current position doesn't match:
 (?<!\\)    # assert that we start at the first backslash in the series
 (?:\\\\)*  # match an even number of backslashes
 \\         # match one backslash
)
\p{Ll}+     # now match any sequence of lowercase letters
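
To see the backslash counting at work, the following small sketch applies the simple pattern explained above to one, two and three backslashes followed by a d. The class and method names (BackslashParityDemo, UpperCaseUnescaped) are hypothetical and only there to make the example runnable.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class BackslashParityDemo
{
    // Simple form: lowercase letters not preceded by an odd run of backslashes.
    private static readonly Regex Unescaped =
        new Regex(@"(?<!(?<!\\)(?:\\\\)*\\)\p{Ll}+");

    public static string UpperCaseUnescaped(string pattern)
    {
        return Unescaped.Replace(pattern, m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // One backslash: \d is an escape, left alone.
        // Two backslashes: escaped backslash plus literal d, so d is upper-cased.
        // Three backslashes: escaped backslash plus \d, left alone.
        foreach (var p in new[] { @"\d", @"\\d", @"\\\d" })
        {
            Console.WriteLine(p + " -> " + UpperCaseUnescaped(p));
        }
        // \d -> \d
        // \\d -> \\D
        // \\\d -> \\\d
    }
}
```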