Regex lookahead, lookbehind and atomic groups

2023-01-02 01:52 Q&A author:

I found these things in my regex body but I haven't got a clue what I can use them for. Does somebody have examples so I can try to understand how they work?

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind

(?>) - atomic group

Examples

Given the string foobarbarfoo:

bar(?=bar)     finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

(?<=foo)bar(?=bar)    finds the 1st bar ("bar" with "foo" before it and "bar" after it)
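As a quick check, the five patterns above can be run with Python's re module. This is only a sketch: the helper name spans is made up for illustration, and the indexes are 0-based, so the 1st bar sits at 3..6 and the 2nd at 6..9.

```python
import re

text = "foobarbarfoo"

def spans(pattern):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

print(spans(r"bar(?=bar)"))          # [(3, 6)]  1st bar
print(spans(r"bar(?!bar)"))          # [(6, 9)]  2nd bar
print(spans(r"(?<=foo)bar"))         # [(3, 6)]  1st bar
print(spans(r"(?<!foo)bar"))         # [(6, 9)]  2nd bar
print(spans(r"(?<=foo)bar(?=bar)"))  # [(3, 6)]  1st bar
```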

Definitions

Look ahead positive `(?=)`

Find expression A where expression B follows:

A(?=B)

Look ahead negative `(?!)`

Find expression A where expression B does not follow:

A(?!B)

Look behind positive `(?<=)`

Find expression A where expression B precedes:

(?<=B)A

Look behind negative `(?<!)`

Find expression A where expression B does not precede:

(?<!B)A

Atomic groups `(?>)`

An atomic group throws away the backtracking positions inside the group once the group has matched: the engine keeps the first alternative that matched and discards the remaining ones (backtracking into the group is disabled).

(?>foo|foot)s applied to foots will match its 1st alternative foo, then fail as s does not immediately follow, and stop as backtracking is disabled.

A non-atomic group will allow backtracking; if subsequent matching ahead fails, it will backtrack and use alternative patterns until a match for the entire expression is found or all possibilities are exhausted.

(foo|foot)s applied to foots will:
1. match its 1st alternative foo, then fail as s does not immediately follow in foots, and backtrack to its 2nd alternative;
2. match its 2nd alternative foot, then succeed as s immediately follows in foots, and stop.
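The difference can be sketched in Python. Python's re module only accepts the (?>...) syntax from version 3.11 on, so the sketch below also uses a common emulation that works on older versions: a capturing group inside a lookahead followed by a backreference, (?=(foo|foot))\1. The emulation is a swapped-in technique, not part of the answer above; the variable names are made up for illustration.

```python
import re
import sys

# Non-atomic group: the engine backtracks from "foo" to "foot", so it matches.
plain = re.fullmatch(r"(foo|foot)s", "foots")
print(plain.group(1))   # foot

# Emulated atomic group: the lookahead captures "foo" and is not re-entered,
# so the "foot" alternative is never tried and the match fails.
atomic_like = re.fullmatch(r"(?=(foo|foot))\1s", "foots")
print(atomic_like)      # None

# Native atomic group syntax, only compiled on Python 3.11 or newer.
if sys.version_info >= (3, 11):
    native = re.fullmatch(r"(?>foo|foot)s", "foots")
    print(native)       # None
```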

Some resources

http://www.regular-expressions.info/lookaround.html
http://www.rexegg.com/regex-lookarounds.html

Online testers

https://regex101.com

Lookarounds are zero-width assertions. They check for a regex to the right or left of the current position (depending on whether it is a lookahead or a lookbehind), succeed or fail when a match is found (depending on whether the assertion is positive or negative), and discard the matched portion. They don't consume any characters: the matching for the regex that follows them (if any) starts at the same cursor position.

Read regular-expressions.info for more details.
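A small Python sketch of the "zero width" part (the sample string and variable names are made up for illustration): the lookahead on its own produces an empty match, and whatever follows it starts matching from the same position.

```python
import re

text = "foobar"

# The lookahead alone succeeds at position 0 but matches zero characters.
only_assert = re.match(r"(?=foo)", text)
print(only_assert.span())   # (0, 0)

# The part after the lookahead starts at the same position,
# so "foo" is matched again, this time by \w+.
both = re.match(r"(?=foo)\w+", text)
print(both.group())         # foobar
```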

Positive lookahead:

Syntax:

(?=REGEX_1)REGEX_2

Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.

example:

(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}

REGEX_1 is [a-z0-9]{4}$ which matches four alphanumeric chars followed by end of line.
REGEX_2 is [a-z]{1,2}[0-9]{2,3} which matches one or two letters followed by two or three digits.

REGEX_1 makes sure that the length of the string is exactly 4 (assuming the match starts at the beginning of the string, for example with a leading ^ anchor), but doesn't consume any characters, so the search for REGEX_2 starts at the same location. REGEX_2 then makes sure that the string matches the other rules. Without the look-ahead, the pattern would also match strings of length three or five.
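To see the effect of the look-ahead, here is a minimal Python sketch comparing the pattern with and without it. It assumes re.match, which anchors at the start of the string; the sample strings are made up for illustration.

```python
import re

with_lookahead = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

def check(s):
    """Return (matches with look-ahead, matches without look-ahead)."""
    return bool(with_lookahead.match(s)), bool(without_lookahead.match(s))

print(check("ab12"))    # (True, True)    length 4, rules satisfied
print(check("a123"))    # (True, True)    length 4, rules satisfied
print(check("a12"))     # (False, True)   length 3, rejected only by the look-ahead
print(check("ab123"))   # (False, True)   length 5, rejected only by the look-ahead
print(check("abcd"))    # (False, False)  length 4, but no digits
```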

Negative lookahead

Syntax:

(?!REGEX_1)REGEX_2

Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

example:

(?!.*\bFWORD\b)\w{10,30}$

The look-ahead part checks for FWORD in the string and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds and the following part verifies that the string's length is between 10 and 30 and that it contains only word characters (a-zA-Z0-9_).
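A rough Python sketch of this idea follows. Note two assumptions that differ from the pattern above: the sketch drops the \b anchors (inside a string made only of word characters, \bFWORD\b could only match if the whole string were FWORD), and it uses re.match so the check starts at the beginning of the string. The function name is_allowed is made up for illustration.

```python
import re

# Variant of the pattern above without \b, applied from the start of the string.
pattern = re.compile(r"(?!.*FWORD)\w{10,30}$")

def is_allowed(s):
    """True if s is 10-30 word characters and does not contain FWORD."""
    return pattern.match(s) is not None

print(is_allowed("hello_world_1"))   # True
print(is_allowed("helloFWORD123"))   # False, contains FWORD
print(is_allowed("short"))           # False, fewer than 10 characters
print(is_allowed("has space here"))  # False, space is not a word character
```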

Look-behind is similar to look-ahead: it just looks behind the current cursor position. Some regex flavors don't support look-behind assertions; JavaScript, for example, only gained them with ES2018. Several flavors that do support them (PHP, Python etc.) require the look-behind portion to have a fixed length.

Atomic groups basically discard/forget the remaining alternatives in the group once one of them matches, so the engine cannot backtrack into the group. Check this page for examples of atomic groups.
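Python's built-in re module is one of the flavors with the fixed-length restriction, which makes it easy to demonstrate. The sketch below is illustrative only; the patterns and variable names are made up.

```python
import re

# Fixed-width look-behind compiles and works.
m = re.search(r"(?<=foo)bar", "foobar")
print(m.span())  # (3, 6)

# Variable-width look-behind is rejected by the re module at compile time.
try:
    re.compile(r"(?<=fo+)bar")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)
```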

Grokking lookaround rapidly.
How do you distinguish lookahead from lookbehind? Take a two-minute tour with me:

(?=) - positive lookahead
(?<=) - positive lookbehind

Suppose

    A  B  C #in a line

Now we ask B: where are you?
B has two ways to declare its location:

One, B has A ahead and has C behind.
Two, B is ahead (lookahead) of C and behind (lookbehind) A.

As we can see, behind and ahead are opposite in the two solutions.
Regex is solution Two.
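In code, solution Two looks like this: the lookbehind checks the A that B stands behind, and the lookahead checks the C that B stands ahead of. A tiny Python sketch, using the string ABC without the spaces for simplicity:

```python
import re

pattern = r"(?<=A)B(?=C)"

m = re.search(pattern, "ABC")
print(m.group(), m.span())        # B (1, 2)

# If either neighbour is wrong, B is not found.
print(re.search(pattern, "XBC"))  # None
print(re.search(pattern, "ABX"))  # None
```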

Why? Suppose you are playing Wordle and you've entered "ant". (Yes, a three-letter word; it's only an example, chill.)

The answer comes back as blank, yellow, green, and you have a list of three-letter words that you want to search with a regex. How would you do it?

To start off, you could require a t in the third position:

[a-z]{2}t

We could improve on that by noting that the word doesn't contain an a:

[b-z]{2}t

We could further improve by saying that the search had to have an n in it.

(?=.*n)[b-z]{2}t

Or, to break it down:

(?=.*n) - look ahead and check that the match has an n in it; there may be zero or more characters before that n

[b-z]{2} - two letters other than 'a' in the first two positions

t - literally a 't' in the third position
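Applied to a word list, the final pattern could be used like this. This is a sketch only: the word list is made up, and re.fullmatch is used so that each word has to match as a whole.

```python
import re

pattern = re.compile(r"(?=.*n)[b-z]{2}t")

# Hypothetical candidate list of three-letter words.
words = ["net", "nut", "not", "ant", "bat", "pit", "tin"]

candidates = [w for w in words if pattern.fullmatch(w)]
print(candidates)  # ['net', 'nut', 'not']
```

"ant" is dropped because of the a, "bat" and "pit" because they have no n, and "tin" because its third letter is not t.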

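I used a lookbehind to find the schema and a negative lookahead to find tables missing with(nolock):

import re

# Hypothetical sample input; the original snippet assumes `sql` already holds the query text.
sql = "select * from DB.dbo.Orders o join DB.dbo.Items i with(nolock) on o.id = i.id"

# Table name and alias after "DB.dbo.", not followed by with(nolock)
expression = r"(?<=DB\.dbo\.)\w+\s+\w+\s+(?!with\(nolock\))"

matches = re.findall(expression, sql)
for match in matches:
    print(match)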
