Regular expression matching words between N and M characters and containing a fixed substring

I need a regex for words that meet both of these conditions:

  1. contain a substring (e.g. foo): \b\w*foo\w*\b
  2. have between N and M characters: \b\w{N,M}\b

How can I combine the two conditions into one regex?

If N and M are small, it is possible using alternation (OR).

For example, with N = 4 and M = 5:

(\bfoo\w{1,2}\b)|(\b\wfoo\w{0,1}\b)|(\b\w\wfoo\b)

But this method becomes unwieldy for, e.g., N = 4 and M = 20.
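To see how the alternation grows, here is a minimal Python sketch (Python rather than C#; `build_alternation` is a hypothetical helper) that generates it for any N and M. It follows the hand-written version above: one alternative per number of characters placed before `foo`, with a bounded `\w{lo,hi}` after it. For N = 4 and M = 20 it already produces 18 alternatives.

```python
import re

def build_alternation(sub, n, m):
    """Hypothetical helper: build the OR-style regex for words of n..m chars containing sub."""
    k = len(sub)
    alternatives = []
    for p in range(0, m - k + 1):       # p = number of characters before sub
        lo = max(0, n - k - p)          # fewest characters allowed after sub
        hi = m - k - p                  # most characters allowed after sub
        alternatives.append(rf"\b\w{{{p}}}{re.escape(sub)}\w{{{lo},{hi}}}\b")
    return "|".join(f"(?:{a})" for a in alternatives)

pattern = build_alternation("foo", 4, 5)
print(pattern)
print(re.findall(pattern, "foo food xfoo xfooy xxfoo xxxfoo foobar"))
```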


To "and" multiple patterns, you can use zero-width lookaheads. I don't know if these are supported in C#. In Perl, it would look like:

/
    \b
    (?= \w{N,M} \b )
    (?= \w* foo \w* \b )
/x
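Lookaheads are not Perl-only: Python's `re` module (like .NET's `Regex`) accepts the same `(?=...)` syntax. A minimal sketch of the pattern above, assuming N = 4 and M = 5 as in the question. Because both conditions are lookaheads, every match is empty and only marks where a qualifying word starts, so the word itself is read separately. `qualifying_words` is a hypothetical name.

```python
import re

N, M = 4, 5  # example bounds from the question

# Both conditions are zero-width lookaheads, so each match is empty and
# only marks the start of a word that satisfies both of them.
starts = re.compile(rf"\b(?=\w{{{N},{M}}}\b)(?=\w*foo\w*\b)")
word = re.compile(r"\w+")

def qualifying_words(text):
    # For each empty match, read the whole word beginning at that position.
    return [word.match(text, m.start()).group() for m in starts.finditer(text)]

print(qualifying_words("foo food xfoo xfooy xxfoo xxxfoo foobar"))
# ['food', 'xfoo', 'xfooy', 'xxfoo']
```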

or

/
    \b
    (?= \w{N,M} \b )
    \w* foo \w* \b
/x
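This form consumes the word, so the matches are the words themselves. A minimal Python sketch with N, M and the substring passed in as parameters (`find_words` is a hypothetical helper). `re.escape` guards against a substring containing regex metacharacters, and the doubled braces in the f-string produce a literal `{n,m}`.

```python
import re

def find_words(text, sub, n, m):
    # \b, then a lookahead that checks the word length, then consume the
    # whole word, which must contain the substring somewhere.
    pattern = rf"\b(?=\w{{{n},{m}}}\b)\w*{re.escape(sub)}\w*\b"
    return re.findall(pattern, text)

text = "foo food xfoo xfooy xxfoo xxxfoo foobar"
print(find_words(text, "foo", 4, 5))   # ['food', 'xfoo', 'xfooy', 'xxfoo']
print(find_words(text, "foo", 4, 20))  # ['food', 'xfoo', 'xfooy', 'xxfoo', 'xxxfoo', 'foobar']
```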

or

/
    \b
    (?= \w{N,M} \b )
    \w* foo
/x
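In this last form the match ends right after the last `foo` in the word, so the matched text can be only a prefix of the word. That is enough for checking whether a qualifying word exists, or for finding where it starts. A small Python illustration, assuming N = 4 and M = 5:

```python
import re

# The lookahead checks the length; the match itself ends after "foo".
pattern = re.compile(r"\b(?=\w{4,5}\b)\w*foo")
word = re.compile(r"\w+")

text = "xfooy food"
print([m.group() for m in pattern.finditer(text)])  # ['xfoo', 'foo']: prefixes only

# Each match still starts at the beginning of the word, so the full word
# can be read from there.
print([word.match(text, m.start()).group() for m in pattern.finditer(text)])  # ['xfooy', 'food']

# Existence check: no word in this text qualifies.
print(bool(pattern.search("xxxfoo foobar")))  # False
```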

It's usually better not to jam everything into one pattern, though. I would write

my @words   = /\b\w{N,M}\b/g;     # Find what we define to be words (searching $_).
my @matches = grep /foo/, @words;  # Keep only the words that contain "foo".

(Sorry, that's Perl again, but I don't know C#. Just trying to give ideas.)
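The same two-step idea in Python, as a minimal sketch (`words_with_sub` is a hypothetical name). Since `foo` is a literal string, a plain `in` test stands in for the second regex here.

```python
import re

def words_with_sub(text, sub, n, m):
    # Step 1: find what we define to be words (n..m word characters).
    words = re.findall(rf"\b\w{{{n},{m}}}\b", text)
    # Step 2: keep only the words that contain the substring.
    return [w for w in words if sub in w]

print(words_with_sub("foo food bazz xfoo xfooy xxfoo xxxfoo foobar", "foo", 4, 5))
# ['food', 'xfoo', 'xfooy', 'xxfoo']  ("bazz" passes step 1 but not step 2)
```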


I think the wisest approach in this case is not to join the two regexes. Just do two regex searches: first find the words that match one of the regexes, and then test each word you find against the other. At first sight, it doesn't seem easy to express with the {} syntax how many characters may come before and after foo.
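The order can also be reversed: first the `foo` regex, then the length check. A minimal Python sketch of that variant; checking the length with `len()` rather than a second regex is a simplification assumed here, and `words_with_sub_by_length` is a hypothetical name.

```python
import re

def words_with_sub_by_length(text, sub, n, m):
    # First search: every whole word that contains the substring.
    candidates = re.findall(rf"\b\w*{re.escape(sub)}\w*\b", text)
    # Then keep only the candidates whose length is within [n, m].
    return [w for w in candidates if n <= len(w) <= m]

text = "foo food bazz xfoo xfooy xxfoo xxxfoo foobar"
print(words_with_sub_by_length(text, "foo", 4, 20))
# ['food', 'xfoo', 'xfooy', 'xxfoo', 'xxxfoo', 'foobar']
```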
