
word boundary regex problem (overlap)

Given the following code:

var myList = new List<string> { "red shirt", "blue", "green", "red" };
Regex r = new Regex("\\b(" + string.Join("|", myList.ToArray()) + ")\\b");
MatchCollection m = r.Matches("Alfred has a red shirt and blue tie");

I want the result of m to include "red shirt", "blue", "red" since all those are in the string, but I am only getting "red shirt", "blue". What can I do to include overlaps?


It seems to me that the regex engine consumes the matched text as soon as the first valid match is found, so the next search starts after that match and the same characters cannot be matched a second time. I don't have a Windows compiler set up right now, so I can't give an apples-to-apples comparison, but I see similar results in Perl.

I think your regex would look something like this after being joined.

'\b(red shirt|blue|green|red)\b'

Testing this regexp, I see the same result: "red shirt", "blue". The result changes if "red shirt" is moved to the end of the alternation:

'\b(red|blue|green|red shirt)\b'

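I now see "red", "blue", because the alternatives are tried from left to right and red now succeeds before red shirt is ever attempted.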
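To check the effect of alternation order in .NET rather than Perl, a minimal sketch along these lines should do. The class name AlternationOrderDemo and the helper MatchValues are made up for illustration; only Regex.Matches and LINQ are real API.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class AlternationOrderDemo
{
    // Hypothetical helper: returns the text of every non-overlapping match.
    public static string[] MatchValues(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        const string input = "Alfred has a red shirt and blue tie";

        // "red shirt" is listed first, so it wins at the position where both could match.
        Console.WriteLine(string.Join(", ", MatchValues(@"\b(red shirt|blue|green|red)\b", input)));
        // prints: red shirt, blue

        // "red" is listed first, so "red shirt" is never reached.
        Console.WriteLine(string.Join(", ", MatchValues(@"\b(red|blue|green|red shirt)\b", input)));
        // prints: red, blue
    }
}
```

In neither case does the "red" inside "Alfred" match, because there is no word boundary between "f" and "r".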

With a slightly more complicated pattern that nests a capture group, you might be able to achieve the results you want.

\b(blue|green|(red) shirt)\b

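This should match red shirt as the outer group and capture red as its own nested subgroup of the same match.

Reading both the match and its subgroup returns "red shirt", "red", "blue". Note that this pattern no longer matches a standalone "red" that is not followed by " shirt".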
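In C#, the nested group does not show up as a separate entry in the MatchCollection; it has to be read from Match.Groups. Here is a rough sketch of that, assuming the pattern above; the class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NestedGroupDemo
{
    // Hypothetical helper: collects each match plus the inner "(red)" capture when present.
    public static List<string> MatchesWithInnerGroup(string input)
    {
        var regex = new Regex(@"\b(blue|green|(red) shirt)\b");
        var found = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            found.Add(m.Value);
            // Group 1 is the outer alternation; group 2 is the nested "(red)".
            // Group 2 only succeeds when the "red shirt" alternative matched.
            if (m.Groups[2].Success)
                found.Add(m.Groups[2].Value);
        }
        return found;
    }

    static void Main()
    {
        var hits = MatchesWithInnerGroup("Alfred has a red shirt and blue tie");
        Console.WriteLine(string.Join(", ", hits));
        // prints: red shirt, red, blue
    }
}
```

This only works when the overlapping terms are known in advance and can be nested by hand, so it does not generalize well to an arbitrary list.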

The simpler way would be to loop through your List of strings and match one at a time, especially if you are going to have many word groups that need multiple matches, like red and red shirt.
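The loop could look something like the sketch below. The names PerTermMatcher and FindTerms are hypothetical, and Regex.Escape is an addition not in the original code, used here so that terms containing regex metacharacters are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PerTermMatcher
{
    // Hypothetical helper: runs one regex per term, so a match for one term
    // cannot hide a match for another term that overlaps it.
    public static List<string> FindTerms(IEnumerable<string> terms, string input)
    {
        var found = new List<string>();
        foreach (string term in terms)
        {
            // Escape the term so it is matched as literal text between word boundaries.
            var regex = new Regex(@"\b" + Regex.Escape(term) + @"\b");
            foreach (Match m in regex.Matches(input))
                found.Add(m.Value);
        }
        return found;
    }

    static void Main()
    {
        var myList = new List<string> { "red shirt", "blue", "green", "red" };
        var hits = FindTerms(myList, "Alfred has a red shirt and blue tie");
        Console.WriteLine(string.Join(", ", hits));
        // prints: red shirt, blue, red
    }
}
```

The results come back grouped by term rather than in order of position in the input; if position matters, m.Index could be collected alongside m.Value and used for sorting.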

Since there are so many ways to do regexp, I am probably missing an obvious and elegant solution.
