What is a Perl regex for finding the first non-consecutively-repeating character in a string?

2022-12-24 23:46 Q&A author:

Your task, should you choose to accept it, is to write a Perl regular expression that, for a given string, returns the first occurrence of a character that is not consecutively duplicated. In other words, a character that is both preceded AND followed by characters different from itself (or by the start/end of the string, respectively).

Example:

IN: aabbcdecc
OUT: c

Please note that "not consecutively duplicated" does not mean "anywhere in the string".
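To have something to check the regexes against, here is a plain (non-regex) Perl sketch of the definition above. It is only a reference checker, not an answer, since the question rules out non-regex solutions. The sub name first_singleton is hypothetical.

```perl
use strict;
use warnings;

# Reference check of the definition (NOT a regex, so not an answer to the
# challenge): return the first character whose left neighbour (if any) and
# right neighbour (if any) both differ from it.
# first_singleton is a hypothetical helper name, used only for illustration.
sub first_singleton {
    my ($str) = @_;
    my @c = split //, $str;    # split into individual characters
    for my $i (0 .. $#c) {
        next if $i > 0   && $c[$i - 1] eq $c[$i];   # same as the previous char
        next if $i < $#c && $c[$i + 1] eq $c[$i];   # same as the next char
        return $c[$i];
    }
    return;    # every character belongs to a run
}

for my $s (qw(aabbcdecc abc aabbcc)) {
    my $r = first_singleton($s);
    printf "%-10s => %s\n", $s, defined $r ? $r : '(none)';
}
```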

NOTE: it must be a pure regex expression. E.g., the solution that obviously comes to mind (clone the string, delete all the duplicates, and print the first remaining character) does not count, even though it solves the problem.

The question is inspired by my somewhat off-topic answer to this question: How can I find the first non-repeating character in a string using Perl?

(?:(.)\1+)*(.?)

Take the 2nd capture, $2. (It is an empty string if every character is consecutively duplicated.)

Test cases:

~:2434$ perl -e "\"abc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2435$ perl -e "\"aabbcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"

~:2436$ perl -e "\"aabbc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2437$ perl -e "\"aabcc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2438$ perl -e "\"aabcbbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
b
~:2439$ perl -e "\"aabbvbbcccccc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
v
~:2440$ perl -e "\"aabbcdecc\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
c
~:2441$ perl -e "\"aabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2442$ perl -e "\"faabbccddeef\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2443$ perl -e "\"faabbccddeefax\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
f
~:2444$ perl -e "\"xfaabbccddeefx\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2445$ perl -e "\"xabcdefghai\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
x
~:2446$ perl -e "\"cccdddeeea12345\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
a
~:2447$ perl -e "\"1234a5678a23\" =~ m/(?:(.)\1+)*(.?)/; print \$2;"
1
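The same pattern can be written with /x so each piece carries a comment. This is a minimal sketch wrapped in a hypothetical sub, first_nonrepeat_v1, and it reads $2 exactly as the one-liners above do. Because every part of the pattern is optional, the match always succeeds at position 0, so the greedy run-skipping never has to backtrack.

```perl
use strict;
use warnings;

# The first answer's pattern, (?:(.)\1+)*(.?), rewritten with /x so each
# piece can carry a comment. first_nonrepeat_v1 is a hypothetical name.
sub first_nonrepeat_v1 {
    my ($str) = @_;
    $str =~ m/
        (?: (.) \1+ )*   # greedily skip runs: a char followed by copies of itself
        (.?)             # next char, which is not part of a run, or empty at the end
    /x;
    return $2;           # the match cannot fail, because every part is optional
}

for my $s (qw(abc aabbcc aabcbbbcccccc faabbccddeefax 1234a5678a23)) {
    printf "%-16s => '%s'\n", $s, first_nonrepeat_v1($s);
}
```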

Or (this one does not match at all if every character is consecutively duplicated):

(?:^|(.)(?!\1))(.)(?!\2)
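An annotated sketch of this pattern, wrapped in a hypothetical sub named first_nonrepeat_v2. The character has to be read from $2: when the match goes through the ^ branch (a leading singleton such as the 'f' in faabbccddeef), group 1 is never set. The sub returns undef when nothing matches.

```perl
use strict;
use warnings;

# The second answer's pattern with /x comments. The wanted character is in
# group 2; group 1 is only set when the match does not start at ^.
# first_nonrepeat_v2 is a hypothetical wrapper name.
sub first_nonrepeat_v2 {
    my ($str) = @_;
    my $found = $str =~ m/
        (?: ^ | (.) (?!\1) )   # start of string, or a char that differs from the candidate
        (.) (?!\2)             # the candidate, which must differ from the char after it
    /x;
    return $found ? $2 : undef;   # undef when every character belongs to a run
}

for my $s (qw(faabbccddeef faabbccddeefax cccdddeeea12345 aabbcc)) {
    my $r = first_nonrepeat_v2($s);
    printf "%-16s => %s\n", $s, defined $r ? $r : '(no match)';
}
```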

use 5.010;
$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};
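A sketch that wraps this pattern in a sub (first_nonrepeat_v3 is a hypothetical name). Two things are worth knowing when trying it. First, [a-z] only covers letters (case-insensitively, because of /i), so a string that starts with a digit, such as 1234a5678a23, does not match at all. Second, when every character is part of a run, (?<c>[a-z]) fails at the end of the string and the engine backs off the last run, so aabbcc returns 'c'. The second sub is my own hypothetical variant, not from the thread: it uses . instead of [a-z] and puts the run-skipping inside an atomic group (?>...) so that backtracking cannot happen.

```perl
use 5.010;
use strict;
use warnings;

# The third answer's anchored pattern, wrapped in a sub. \g{-1} is a
# relative backreference to the most recently opened group, ([a-z]).
sub first_nonrepeat_v3 {
    my ($str) = @_;
    my $found = $str =~ /^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
    return $found ? $+{c} : undef;
}

# Hypothetical variant (not from the thread): '.' instead of [a-z], and an
# atomic group so the engine cannot give back characters from skipped runs.
sub first_nonrepeat_atomic {
    my ($str) = @_;
    my $found = $str =~ /^(?>(?:(.)\g{-1}+)*)(?<c>.)/;
    return $found ? $+{c} : undef;
}

for my $s (qw(aabbcdecc cccdddeeea12345 1234a5678a23 aabbcc)) {
    my $v3 = first_nonrepeat_v3($s);
    my $at = first_nonrepeat_atomic($s);
    printf "%-16s v3=%-8s atomic=%s\n", $s,
        map { defined($_) ? $_ : '(undef)' } ($v3, $at);
}
```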

I wish Perl had a regex negate flag, i.e., one that returns all the characters that do NOT match /regex/!

What you are looking for is really the regex capture complement of:

m/(.)(\1)+/
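One way to make the "complement" idea concrete, outside of a single regex: walk every match of m/(.)(\1)+/ with /g and keep the text between matches, using the @- and @+ match-offset arrays. The first kept character is the answer. This is a loop rather than a pure regex, so it only illustrates the idea; complement_of_runs is a hypothetical name.

```perl
use strict;
use warnings;

# Collect everything that m/(.)(\1)+/ does NOT match.
# complement_of_runs is a hypothetical helper name.
sub complement_of_runs {
    my ($str) = @_;
    my ($prev_end, $kept) = (0, '');
    while ($str =~ /(.)(\1)+/g) {
        # $-[0] and $+[0] are the start and end offsets of the current match
        $kept .= substr($str, $prev_end, $-[0] - $prev_end);
        $prev_end = $+[0];
    }
    $kept .= substr($str, $prev_end);   # text after the last run
    return $kept;
}

for my $s (qw(aabbcdecc faabbccddeefax aabbcc)) {
    my $rest = complement_of_runs($s);
    printf "%-16s kept '%s', first '%s'\n", $s, $rest, substr($rest, 0, 1);
}
```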

I tried all the suggestions on this page against Brian's data list (the data in his program listing). None of them works completely.

The regex:

(?:^|(.)(?!\1))(.)(?!\2)

fails to match the 'f' at the beginning of lines 2 and 3. Brian's regex does not match the 'f' at the beginning of lines 2 and 3, or any of the singletons at the end of line 5.

The regex:

$str=~/^(([a-z])\g{-1}+)*(?<c>[a-z])/i;
$char = $+{c};

does work.

The only single regex that I found is a simple one:

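#!/usr/bin/perl
use strict;
use warnings;

while ( my $line = <DATA> ) {
    chomp $line;
    print "BEFORE: $line\n";
    $line =~ s/(.)(\1)+//g;    # delete every run of a repeated character
    print "AFTER: $line\n";
    # the first remaining character is the first non-repeated one
    print "character: " . substr($line, 0, 1) . "\n\n";
}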
__END__
aabbccddeef
faabbccddeef
faabbccddeefax
xfaabbccddeefx
xabcdefghai
cccdddeeea12345
1234a5678a23
aabbcdecc
abcdefg
aabbccddeef
cccdddeeea12345

This works in the simple case of 'give the first character.' (Edit: on rereading, I see that the obvious delete-the-doubles approach is not what you were looking for.)

I'd love to hear if there is a better solution.
