
regular expression to find the longest recurring character sequence in a line

How do I write a regular expression to find the longest recurring character sequence in a line?


You can find every run of a repeated character with the regular expression /(.)\1*/ (note that this also matches single characters, i.e. runs of length one).

Finding the longest such sequence is best done using a tool other than regular expressions.
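A minimal Python sketch of that division of labour, using a hypothetical helper named `longest_run`: the regex only enumerates the runs, and the built-in `max` picks the longest one.

```python
import re


def longest_run(line: str) -> str:
    """Return the longest run of one repeated character ("" for an empty line)."""
    # (.)\1* matches each maximal run, including runs of length 1.
    runs = (m.group(0) for m in re.finditer(r"(.)\1*", line, re.DOTALL))
    # max() returns the first of several equally long runs.
    return max(runs, key=len, default="")


print(longest_run("this aaa and bbbb for ### ## ppppppp"))  # ppppppp
```

`default=""` covers an empty line, where the regex finds no runs at all.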


It’s easiest to do this in a loop:

#!/usr/bin/perl
use strict;
use warnings;

my $string = "this aaa and bbbb for ### ## ppppppp";
my $max = "";
# Each match captures a run of two or more identical characters in $1.
while ($string =~ /((.)\2+)/gs) {
    $max = $1 if length($1) > length($max);
}
print "$max\n";

You could also use reduce, but this is less efficient, because the whole list of matches is built before any comparison is made:

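#!/usr/bin/perl
use strict;
use warnings;
use List::Util "reduce";
my $string = "this aaa and bbbb for ### ## ppppppp";
# In list context the match returns both captures ($1 and $2) for every run;
# the one-character $2 values are shorter than any run, so they never win.
my $max = reduce { length($b) > length($a) ? $b : $a } "",
                    $string =~ /((.)\2+)/gs;
print "$max\n";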
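For comparison, the same fold written in Python with `functools.reduce` (the name `longest_run_reduce` is hypothetical). One difference from Perl: with two groups, `re.findall` returns `(run, char)` tuples rather than a flat list, so the sketch keeps only the first element of each.

```python
import re
from functools import reduce


def longest_run_reduce(s: str) -> str:
    # With two groups, findall yields (run, char) tuples rather than a flat list.
    runs = [run for run, _char in re.findall(r"((.)\2+)", s, re.DOTALL)]
    # Replace the accumulator only when a run is strictly longer, so ties keep the first.
    return reduce(lambda best, run: run if len(run) > len(best) else best, runs, "")


print(longest_run_reduce("this aaa and bbbb for ### ## ppppppp"))  # ppppppp
```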

If you want it in just one assignment, that’s simply:

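#!/usr/bin/perl
use strict;
use warnings;
my $string = "this aaa and bbbb for ### ## ppppppp";
# Sorting by descending length puts the longest run first; "" is the fallback.
my $max = ( sort { length($b) <=> length($a) } "", $string =~ /((.)\2+)/gs)[0];
print "$max\n";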
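A Python sketch of the one-expression version, with the hypothetical name `longest_run_sorted`. Python documents `sorted` as stable, including with `reverse=True`, so the earliest of several equally long runs ends up in front.

```python
import re


def longest_run_sorted(s: str) -> str:
    runs = [m.group(0) for m in re.finditer(r"(.)\1+", s, re.DOTALL)]
    # The leading "" is the fallback when there are no runs; sorted() is stable,
    # so equally long runs keep their original order.
    return sorted([""] + runs, key=len, reverse=True)[0]


print(longest_run_sorted("this aaa and bbbb for ### ## ppppppp"))  # ppppppp
```

Sorting costs O(n log n) in the number of runs, whereas a single pass with `max` is linear, so this form trades speed for brevity.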

All three snippets print ppppppp for that sample string.

They also return the empty string if there is no such sequence, and they return the first such sequence in the event of a tie.


You can use the following regular expression to find repeated characters:

(.)\1+

but you should use your programming language to determine which match is the longest.
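One way to let the host language decide, sketched in Python with a hypothetical `longest_match` helper: iterate over the match objects and keep the best one, which also gives you the repeated character and the offset where its run starts.

```python
import re
from typing import Optional, Tuple


def longest_match(s: str) -> Optional[Tuple[str, int, int]]:
    """Return (char, length, start) of the longest run of 2+ characters, or None."""
    best: Optional[Tuple[str, int, int]] = None
    for m in re.finditer(r"(.)\1+", s, re.DOTALL):
        length = m.end() - m.start()
        # Strict comparison keeps the earliest run when lengths tie.
        if best is None or length > best[1]:
            best = (m.group(1), length, m.start())
    return best


print(longest_match("this aaa and bbbb for ### ## ppppppp"))  # ('p', 7, 29)
```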


You don't: it's impossible to put state such as "longest so far" into a regular expression. The only thing you can do is build a regular expression and match it against the text. If it matches, take the length of the repeated run and build a longer regular expression that requires more repetitions. Keep doing this until nothing matches.
This is a silly alternative to simply writing a small parser for the text.
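To make the comparison concrete, here is a Python sketch of that repeated-regex approach (the function name is hypothetical). `(.)\1{n}` asks whether some character repeats at least n+1 times; the loop raises n until the answer is no, rescanning the whole line on every attempt.

```python
import re


def longest_run_by_growing_regex(s: str) -> int:
    """Length of the longest run, found only by repeated yes/no regex searches."""
    if not s:
        return 0
    n = 1
    # (.)\1{n} matches a character followed by n more copies of itself,
    # i.e. a run of at least n + 1; grow n until no such run exists.
    while re.search(r"(.)\1{%d}" % n, s, re.DOTALL):
        n += 1
    return n


print(longest_run_by_growing_regex("this aaa and bbbb for ### ## ppppppp"))  # 7
```

Because every step rescans the text, the cost grows with both the line length and the length of the longest run, which is why a single-pass parser is the better choice.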

In pseudo-code that parser could be:

max = 0
for(i = beginning to end, i++) {
    recurring_length = recurring(i, 1);
    if(recurring_length > max)
        max = recurring_length;
}

function recurring(i, length) {
    if(i+1 != EOF && (character at i == character at i+1))
        return recurring(i+1, length + 1);
    else return length;
}
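A runnable Python rendering of that pseudo-code, with the hypothetical name `longest_run_length`. The recursion is replaced by an inner loop, since CPython limits recursion depth (1000 frames by default), and the outer index jumps past each run instead of restarting at i+1, which keeps the scan linear.

```python
def longest_run_length(s: str) -> int:
    """Length of the longest run of one repeated character (0 for "")."""
    longest = 0
    i = 0
    while i < len(s):
        # Same job as recurring(i, 1): count how far the character at i repeats.
        length = 1
        while i + length < len(s) and s[i + length] == s[i]:
            length += 1
        longest = max(longest, length)
        # Skip past the whole run; restarting at i + 1 would rescan it.
        i += length
    return longest


print(longest_run_length("this aaa and bbbb for ### ## ppppppp"))  # 7
```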


Here's how it's done in Python (no need for regular expressions):

>>> s = 'iamastriiiiiingwaitwaaaaaaaaaaaaaatttt'
>>> lchar = ''
>>> longchar = ''
>>> longest = 0
>>> cnt = 1
>>> for i in s:
    # cnt is the length of the run ending at the current character
    if lchar == i:
        cnt += 1
    else:
        cnt = 1
    if cnt > longest:
        longest = cnt
        longchar = i
    lchar = i

>>> longchar
'a'
>>> longest
14

And if you want to store it in a string (pretty simple):

>>> string = longchar * longest

>>> string
'aaaaaaaaaaaaaa'
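The standard library can do the grouping for you. This Python sketch (hypothetical name `longest_run_groupby`) swaps the manual counter for `itertools.groupby`, which yields one group per run of equal adjacent characters.

```python
from itertools import groupby


def longest_run_groupby(s: str) -> str:
    # groupby(s) yields one (char, group) pair per run of equal adjacent characters.
    runs = ("".join(group) for _char, group in groupby(s))
    # max() keeps the first of equally long runs; default covers an empty string.
    return max(runs, key=len, default="")


print(longest_run_groupby("iamastriiiiiingwaitwaaaaaaaaaaaaaatttt"))
```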


You can try this:

#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;

my $str = 'ahhhhhhhhhhjjjjjjjiiiieeeeeeeeeeeeeeei';
my ($char, $long) = ('', 0);
# $1 holds the whole run and $2 its character; this avoids $&,
# which slows down every regex match on older Perls.
while ($str =~ /((.)\2*)/g) {
    if (length($1) > $long) {
        $long = length($1);
        $char = $2;
    }
}
say "$char : $long";

Output:

e : 15