Perl regular expression removing duplicate consecutive substrings in a string
I searched for this particular problem, but all I find is either removing duplicate lines or removing repeated strings that are separated by a delimiter.
My problem is slightly different. I have a string such as
"comp name1 comp name2 comp name2 comp name3"
where I want to remove the repeated comp name2 and return only
"comp name1 comp name2 comp name3"
They are not consecutive duplicate words, but consecutive duplicate substrings. Is there a way to solve this using regular expressions?
s/(.*)\1/$1/g
Be warned that this regex backtracks heavily: at every position it tries every possible length for (.*), so its running time grows at least quadratically with the length of the string.
This works for me (MacOS X 10.6.7, Perl 5.13.4):
use strict;
use warnings;
my $input  = "comp name1 comp name2 comp name2 comp name3";
my $output = "comp name1 comp name2 comp name3";
my $result = $input;
$result =~ s/(.*)\1/$1/g;    # keep one copy of every doubled substring
print "In: <<$input>>\n";
print "Want: <<$output>>\n";
print "Got: <<$result>>\n";
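The key point is the backreference \1 in the pattern: it must match exactly the text that (.*) just captured, so the regex only succeeds where a substring is immediately followed by a copy of itself, and the replacement $1 keeps a single copy.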
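To see what the backreference actually captures, here is a small sketch (assuming Perl 5 and the question's sample string) that runs the pattern once with .+ instead of .*, so the trivial empty match at offset 0 is skipped:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "comp name1 comp name2 comp name2 comp name3";

# (.+) captures a candidate substring; \1 must then match the very same
# text again immediately afterwards. Using .+ rather than .* skips the
# empty match, so the first real duplicate is reported.
my ($captured, $offset);
if ($input =~ /(.+)\1/) {
    $captured = $1;       # the substring that is repeated
    $offset   = $-[0];    # where the doubled run starts
    print "Captured <$captured> at offset $offset\n";
}
# prints: Captured < comp name2> at offset 10
```

The captured text includes the leading space, which is why substituting $1 for the whole match still leaves exactly one space between terms.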
To avoid collapsing doubled characters inside a term (e.g. comm1 -> com1), surround .* with word boundaries (\b) in the regular expression:
s/(\b.*\b)\1/$1/g
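A quick comparison of the two patterns, using a made-up input "comp comm1" (my own test string, not from the question) that contains a doubled letter inside a word:

```perl
use strict;
use warnings;

my $text = "comp comm1";

# Plain version: any substring followed by a copy of itself is collapsed,
# including the single doubled letter "mm" inside "comm1".
(my $plain = $text) =~ s/(.*)\1/$1/g;

# Word-boundary version: the repeated chunk must start and end on a word
# boundary, so a doubled letter in the middle of a word is left alone.
(my $bounded = $text) =~ s/(\b.*\b)\1/$1/g;

print "plain:   $plain\n";     # plain:   comp com1
print "bounded: $bounded\n";   # bounded: comp comm1
```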
I don't work with languages that support this myself, but since you are using Perl, go here and see this section:
Useful Example: Checking for Doubled Words
When editing text, doubled words such as "the the" easily creep in. Using the regex \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply type in \1 as the replacement text and click the Replace button.
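In Perl the tutorial's pattern can be used directly in a substitution. The second half of this sketch is my own adaptation, not part of the quoted tutorial: it widens the captured unit to two words, assuming every term is exactly two whitespace-separated words and that a repeated copy uses the same internal whitespace (because \1 matches the captured text literally).

```perl
use strict;
use warnings;

# The doubled-word pattern from the quoted tutorial, applied with s///g
# so that the second copy of the word is deleted.
(my $words = "this is the the test") =~ s/\b(\w+)\s+\1\b/$1/g;
print "$words\n";    # this is the test

# Assumed adaptation to the question: capture a two-word unit
# ("comp nameN") so that a repeated pair collapses to one copy.
(my $pairs = "comp name1 comp name2 comp name2 comp name3")
    =~ s/\b(\w+\s+\w+)\s+\1\b/$1/g;
print "$pairs\n";    # comp name1 comp name2 comp name3
```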
If you need something running in linear time, you could split the string and iterate through the list:
#!/usr/bin/perl
use strict;
use warnings;

my $str = "comp name1 comp name2 comp name2 comp name3";
my @elems = split(/\s+/, $str);

# The words come in pairs: an even index holds "comp" and the next odd
# index holds the name. A pair is kept only if its name differs from the
# previous pair's name, so every word is visited exactly once.
# (Deciding per pair also avoids printing a stray "comp" when the
# duplicate pair is the last one in the string.)
my $prevComp;
my @out;
foreach my $elemIdx (0 .. $#elems) {
    next if $elemIdx % 2 == 0;    # handle each pair at its odd index
    my $name = $elems[$elemIdx];
    if (!defined $prevComp || $prevComp ne $name) {
        push @out, "$elems[$elemIdx - 1] $name";
    }
    $prevComp = $name;
}
print join(" ", @out), "\n";
Dirty, perhaps, but it makes a single pass over the words, so it should run faster than the backtracking regex on long strings.
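The same single-pass idea can be written more compactly. This is a sketch under the assumption that the input is strictly a whitespace-separated sequence of two-word pairs; dedupe_pairs is a hypothetical helper name. Because the pattern has no backreference, it avoids the repeated substring comparisons that make s/(.*)\1/$1/g slow.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse consecutive duplicate "comp nameN" pairs.
# Assumes the string is a whitespace-separated sequence of two-word pairs.
sub dedupe_pairs {
    my ($str) = @_;
    # Each match grabs one non-overlapping pair, e.g. "comp name1".
    my @pairs = $str =~ /(\S+\s+\S+)/g;
    my @kept;
    for my $pair (@pairs) {
        # Keep the pair unless it repeats the one kept just before it.
        push @kept, $pair unless @kept && $kept[-1] eq $pair;
    }
    return join ' ', @kept;
}

print dedupe_pairs("comp name1 comp name2 comp name2 comp name3"), "\n";
# comp name1 comp name2 comp name3
```

Unlike the loop above, this compares whole pairs rather than just the names, so "comp name2 foo name2" would be kept as two distinct terms.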