Using regex to find any last occurrence of a word between two delimiters

Suppose I have the following test string:

Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop

where _ stands for any run of characters, e.g. StartaGetbbGetcccGetddddStopeeeeeStart...

What I want to extract is the last occurrence of the word Get within each pair of Start and Stop delimiters. The result here would be three matches: the Get immediately before each Stop below.

Start__Get__Get__Get__Stop__Start__Get__Get__Stop__Start__Get__Stop

To be precise, I'd like to do this using only regex and, as far as possible, in a single pass.

Any suggestions are welcome.

Thanks


Get(?=(?:(?!Get|Start|Stop).)*Stop)

I'm assuming your Start and Stop delimiters will always be properly balanced and they can't be nested.
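As a quick check, here is a minimal Python sketch that applies the pattern above to the test string and reports where each match starts. The function name `last_get_positions` and the `re.DOTALL` flag are my own additions (the flag is an assumption that "any characters" may include newlines); the pattern itself is unchanged.

```python
import re

# Pattern from the answer above: a "Get" whose following characters, up to the
# next "Stop", never begin another Get, Start, or Stop.
# re.DOTALL is an assumption so that "." also covers newlines.
LAST_GET = re.compile(r"Get(?=(?:(?!Get|Start|Stop).)*Stop)", re.DOTALL)


def last_get_positions(text):
    """Return the start offset of the last Get in each Start...Stop block."""
    return [m.start() for m in LAST_GET.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_positions(sample))  # [14, 33, 48]
```

Because the lookahead consumes nothing, this runs as a single pass, but it relies on the stated assumption that the delimiters are balanced and not nested.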


I would have done it in two passes. The first pass finds the word "Get", and the second pass counts the number of occurrences of it.
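One possible reading of the two-pass idea, sketched in Python: isolate each Start...Stop block first, then find and count the Get occurrences inside it and keep the last one. This is only an interpretation of the answer above, not its author's code; the function name `last_get_per_block` is hypothetical.

```python
import re


def last_get_per_block(text):
    """Return (count, offset_of_last_Get) for each Start...Stop block."""
    results = []
    # Outer pass: isolate the text between each Start and the nearest Stop.
    for block in re.finditer(r"Start(.*?)Stop", text, re.DOTALL):
        # Inner pass: find every Get inside that block.
        gets = list(re.finditer(r"Get", block.group(1)))
        if gets:
            # Convert the block-relative offset back to an offset in text.
            results.append((len(gets), block.start(1) + gets[-1].start()))
    return results


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_per_block(sample))  # [(3, 14), (2, 33), (1, 48)]
```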


$ echo "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get__Stop" | awk -vRS="Stop" -F"_*" '{print $(NF-1)}'
Get
Get
Get


Something like this, maybe:

(?<=Start(?:.Get)*.)Get(?=.Stop)

That requires variable-length lookbehind support, which not all regex engines support.
It could be made to have a max length, which a few more (but still not all) support, by changing the first * to {0,99} or similar.

Also, in the lookahead, the . should possibly be a .+ or .{1,2}, depending on whether the double underscore is a typo or not.
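For engines without variable-length lookbehind (Python's built-in `re` module is one; it only accepts fixed-width lookbehind), the same idea can be expressed by consuming the prefix and capturing the last Get in a group. This is a swapped-in technique rather than the lookbehind itself, and it assumes, like the pattern above, exactly one separator character between words. The function name is hypothetical.

```python
import re

# Consume "Start", then as many ".Get" repetitions as possible, and capture the
# final Get, which must be followed by one character and then "Stop".
# Assumes exactly one separator character between words.
PATTERN = re.compile(r"Start(?:.Get)*.(Get)(?=.Stop)", re.DOTALL)


def last_get_positions_grouped(text):
    """Return the start offset of the captured (last) Get in each block."""
    return [m.start(1) for m in PATTERN.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_positions_grouped(sample))  # [14, 33, 48]
```

The trade-off is that the overall match now spans from Start to the last Get, so the position has to be read from group 1 rather than from the match itself.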


With Perl, I'd do:

my $test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop";
$test =~ s#(?<=Start_)((Get_)*)(Get)(?=_Stop)#$1<FOUND>$3</FOUND>#g;
print $test;

output:

Start_Get_Get_<FOUND>Get</FOUND>_Stop_Start_Get_<FOUND>Get</FOUND>_Stop_Start_<FOUND>Get</FOUND>_Stop

Adapt this to your regex flavour.
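As an illustration of adapting it, here is a rough Python equivalent of the Perl substitution. The lookbehind is fixed-width, so Python's `re` accepts it. The function name `mark_last_get` is hypothetical, and the inner group is made non-capturing, so the last Get becomes group 2 instead of group 3.

```python
import re

# Same idea as the Perl s###g above: keep the leading "Get_" run (group 1)
# and wrap the final Get (group 2) in <FOUND> tags.
PATTERN = re.compile(r"(?<=Start_)((?:Get_)*)(Get)(?=_Stop)")


def mark_last_get(text):
    """Wrap the last Get of each Start_..._Stop block in <FOUND> tags."""
    return PATTERN.sub(r"\1<FOUND>\2</FOUND>", text)


test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(mark_last_get(test))
```

Like the Perl version, this only handles a literal single underscore between words, not arbitrary separator text.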
