In a regular expression, is (\W|$) a workable alternative for \b?

We have a client application where the users want to search "notes" fields for specified text. The fields are either formatted with HTML or plaintext. One of the recent changes that we made was to only support "whole word" matches. Using \b, we made this happen. Pattern:

"\b(?:match)\b" <-- works

New day, new problem: one of the values that they want to find is a number followed by a percent sign (%), and the pattern doesn't match it. After some research, I determined that \b only matches at a position that has a word character on one side and a non-word character (or the start or end of the string) on the other. Since % is not a word character, the position right after it is a boundary only when a word character follows, so the trailing \b fails.

"\b(?:7.0%)\b" <-- fails
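
A minimal sketch of that boundary rule, assuming .NET's default Regex options. The class and helper names (BoundaryDemo, BoundaryAfterPercent) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // True when a \b word boundary exists immediately after a '%' in the input.
    public static bool BoundaryAfterPercent(string input)
    {
        return Regex.IsMatch(input, @"%\b");
    }

    public static void Main()
    {
        Console.WriteLine(BoundaryAfterPercent("7.0%"));   // False: '%' then end of string
        Console.WriteLine(BoundaryAfterPercent("7.0% "));  // False: '%' then a space
        Console.WriteLine(BoundaryAfterPercent("7.0%x"));  // True: '%' then a word character
        Console.WriteLine(Regex.IsMatch("7.0%", @"0\b%")); // True: the boundary sits before '%'
    }
}
```

In other words, the only boundary near the percent sign is the one between 0 and %, which is why a trailing \b after % fails unless a word character happens to follow.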

I changed this to match \W, and it works, but this has the drawback that there must always be another character following the matched pattern.

"\b(?:7.0%)\W" <-- works, mostly

So what I want to know is, can I use the following as a pattern and have it match end-of-string matches?

"\b(?:7.0%)(\W|$)" <-- ??

I tested it and it appears to work, but is there anything that is going to bite me down the road?

Edit:

Here's a quick test harness that demonstrates the different behaviors, including the answer from agent-j:

// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
List<string> testInputs = new List<string>();

testInputs.Add("This string contains 7.0% embedded within it.");
testInputs.Add("In this string, 7.0%\nis at the end of a line.");
testInputs.Add("7.0% starts this string.");
testInputs.Add("This string ends with 7.0%");

List<string> testPatterns = new List<string>();
testPatterns.Add(@"\b(?:7.0%)\b");
testPatterns.Add(@"\b(?:7.0%)\W");
testPatterns.Add(@"\b(?:7.0%)(\W|$)");
testPatterns.Add(@"\b(?:7.0%)(?!\w)");

foreach (var patt in testPatterns)
{
    Console.WriteLine(string.Format("Testing pattern '{0}'", patt));

    foreach (var input in testInputs)
    {
        Console.WriteLine(string.Format("Input '{0}'; result: {1}", input, Regex.IsMatch(input, patt)));
    }

    Console.WriteLine();
}

Output:

Testing pattern '\b(?:7.0%)\b'
Input 'This string contains 7.0% embedded within it.'; result: False
Input 'In this string, 7.0%
is at the end of a line.'; result: False
Input '7.0% starts this string.'; result: False
Input 'This string ends with 7.0%'; result: False

Testing pattern '\b(?:7.0%)\W'
Input 'This string contains 7.0% embedded within it.'; result: True
Input 'In this string, 7.0%
is at the end of a line.'; result: True
Input '7.0% starts this string.'; result: True
Input 'This string ends with 7.0%'; result: False

Testing pattern '\b(?:7.0%)(\W|$)'
Input 'This string contains 7.0% embedded within it.'; result: True
Input 'In this string, 7.0%
is at the end of a line.'; result: True
Input '7.0% starts this string.'; result: True
Input 'This string ends with 7.0%'; result: True

Testing pattern '\b(?:7.0%)(?!\w)'
Input 'This string contains 7.0% embedded within it.'; result: True
Input 'In this string, 7.0%
is at the end of a line.'; result: True
Input '7.0% starts this string.'; result: True
Input 'This string ends with 7.0%'; result: True


You are on the right track. Your expression \b(?:7.0%)(\W|$) also consumes the character following 7.0% when there is one, so that character becomes part of the match. Instead, consider using a negative lookahead, (?!\w), so that the extra character is not part of your match.

\b(?:7.0%)(?!\w)

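If the string ends with 7.0%, it will match, and if the string ends with 7.0%. (with a trailing period), it will match just 7.0%. The lookahead itself does not depend on $, so the end-of-word check works the same whether or not your regex options include Singleline or Multiline.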
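
To see the difference in what actually gets matched, here is a small sketch under default Regex options. The class and helper names (LookaheadDemo, MatchedText) and the sample input are hypothetical. It also escapes the dot, on the assumption that a literal decimal point is intended, since an unescaped . matches any character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Returns the text of the first match, or an empty string when nothing matches.
    public static string MatchedText(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string consuming = @"\b(?:7\.0%)(\W|$)"; // consumes the following character
        string lookahead = @"\b(?:7\.0%)(?!\w)"; // only checks the following character
        string input = "Rates: 7.0%, 8.5%";

        Console.WriteLine("[" + MatchedText(input, consuming) + "]"); // [7.0%,]
        Console.WriteLine("[" + MatchedText(input, lookahead) + "]"); // [7.0%]

        // With an unescaped dot, "7x0%" is accepted as well.
        Console.WriteLine(Regex.IsMatch("7x0%", @"\b(?:7.0%)(?!\w)")); // True
        Console.WriteLine(Regex.IsMatch("7x0%", lookahead));           // False
    }
}
```

This matters mostly when the match is used for highlighting or replacement: with the lookahead, the matched span is exactly the search text.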
