Perl regex which grabs ALL double-letter occurrences in a line

2023-02-02 21:53 Q&A author:

Still plugging away at teaching myself Perl. I'm trying to write some code that will count the lines of a file that contain double letters and then place parentheses around those double letters.

Now what I've come up with will find the first occurrence of double letters, but not any other ones. For instance, if the line is:

Amp, James Watt, Bob Transformer, etc. These pioneers conducted many

My code will render this:

19 Amp, James Wa(tt), Bob Transformer, etc. These pioneers conducted many

The "19" is the count (of lines containing double letters) and it gets the "tt" of "Watt" but misses the "ee" in "pioneers".

Below is my code:

$file = '/path/to/file/electricity.txt';        
open(FH, $file) || die "Cannot open the file\n";        

my $counter=0;

while (<FH>) {
    chomp();
    if (/(\w)\1/) {
        $counter += 1;
        s/$&/\($&\)/g;
        print "\n\n$counter $_\n\n";
    } else {
        print "$_\n";
    }
}

close(FH);

What am I overlooking?

use strict;
use warnings;
use 5.010;
use autodie;

my $file = '/path/to/file/electricity.txt';        
open my $fh, '<', $file;        

my $counter = 0;

while (<$fh>) {
    chomp;
    if (/(\p{L})\1/) {    # test for letters only, to agree with the substitution below
        $counter++;
        s/
          (?<full>
               (?<letter>\p{L})
               \g{letter}
          )
        /($+{full})/xg;
        $_ = $counter . ' ' . $_;
    }
    say;
}

You are overlooking a few things: strict and warnings; 5.010 (or higher!) for say; autodie, so you don't have to keep typing those 'or die' checks; lexical filehandles and the three-argument form of open; a bit nitpicky, but knowing when (not) to use parens for function calls; understanding why you shouldn't use $&; and the autoincrement operator.

But on the regex part specifically: I originally claimed that $& is only set on matches (m//), not on substitutions. Actually no, ysth is right as usual. Sorry!

(I took the liberty of modifying your regex a bit. It makes use of named captures, written (?<name>...) instead of bare parens and accessed through \g{name} notation inside the regex and the %+ hash outside of it, and of Unicode-style properties such as \p{L}. There is a lot more about those in perlre and perluniprops, respectively.)
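As a rough sketch of the named-capture notation on its own, the snippet below lists every doubled letter in a line instead of rewriting it. The helper name doubled_letters and the sample string are made up for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): return every doubled
# letter found in $line, using a named capture and the %+ hash.
sub doubled_letters {
    my ($line) = @_;
    my @found;
    # In scalar context, /g resumes after the previous match on each pass,
    # so the loop visits every double in the line, not just the first.
    while ($line =~ /(?<letter>\p{L})\g{letter}/g) {
        push @found, $+{letter} x 2;
    }
    return @found;
}

my @pairs = doubled_letters('James Watt and other pioneers, est. 1100');
print "@pairs\n";    # prints "tt ee"; the digits in 1100 are not letters
```

Because \p{L} only matches letters, the "11" and "00" in the sample are skipped; with \w they would have been reported as well.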

You need to use a back reference:

#! /usr/bin/env perl

use warnings;
use strict;

my $line = "this is a doubble letter test of my scrippt";

$line =~ s/([[:alpha:]])(\1)/($1$2)/g;

print "$line\n";

And now the test.

$ ./test.pl
this is a dou(bb)le le(tt)er test of my scri(pp)t

It works!

In the replacement part of a substitution, you use $1 to stand for whatever the first set of parentheses captured. Inside the regular expression itself, you refer back to that same capture with the \1 form.

[[:alpha:]] is a POSIX character class that matches alphabetic characters. You can find out more information by typing in

$ perldoc perlre

at the command line.
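To see why the choice of character class matters, here is a small comparison, assuming a made-up sample string. The helper names mark_alpha and mark_word are hypothetical; the two patterns are the ones used in this thread.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison only.
sub mark_alpha {
    my ($s) = @_;
    $s =~ s/([[:alpha:]])\1/($1$1)/g;   # letters only
    return $s;
}

sub mark_word {
    my ($s) = @_;
    $s =~ s/(\w)\1/($1$1)/g;            # letters, digits and underscore
    return $s;
}

my $sample = 'Watt built 100 engines';
print mark_alpha($sample), "\n";   # Wa(tt) built 100 engines
print mark_word($sample), "\n";    # Wa(tt) built 1(00) engines
```

If the exercise really means double letters, [[:alpha:]] (or \p{L}) is probably the safer choice, since \w also treats "00" as a double.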

You're overcomplicating things by messing around with $&. s///g returns the number of substitutions performed when used in scalar context, so you can do it all in one shot without needing to count matches by hand or track the position of each match:

#!/usr/bin/env perl

use strict;
use warnings;

my $text = 'James Watt, a pioneer of wattage engineering';

my $doubles = $text =~ s/(\w)\1/($1$1)/g;

print "$doubles $text\n";

Output:

4 James Wa(tt), a pion(ee)r of wa(tt)age engin(ee)ring
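One detail worth knowing if you print the count for every line: when nothing matches, s///g returns an empty string (false) rather than 0. A minimal sketch, using a hypothetical helper named mark_doubles, of one way to handle that:

```perl
use strict;
use warnings;

# Hypothetical helper: mark doubles in a copy of $text and return both the
# count and the marked text.
sub mark_doubles {
    my ($text) = @_;
    # s///g returns the number of substitutions, or an empty string (false)
    # when nothing matched; "|| 0" turns that into a printable zero.
    my $count = ($text =~ s/(\w)\1/($1$1)/g) || 0;
    return ($count, $text);
}

for my $line ('James Watt, a pioneer', 'no pairs here') {
    my ($count, $marked) = mark_doubles($line);
    print "$count $marked\n";
}
# prints:
# 2 James Wa(tt), a pion(ee)r
# 0 no pairs here
```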

Edit: OP stated in comments that the exercise in question says not to use =~, so here's a non-regex-based solution, since all regex matches use =~ (implicitly or explicitly):

#!/usr/bin/env perl

use strict;
use warnings;

my $text = 'James Watt, a pioneer of wattage engineering';

my $doubles = 0;
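# Walk backwards so that inserting parentheses never shifts positions that
# still have to be examined. Note: this flags ANY repeated character
# (spaces, digits), not just letters.
for my $i (reverse 1 .. length $text) {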
    if (substr($text, $i, 1) eq substr($text, $i - 1, 1)) {
        $doubles++;
        substr($text, $i - 1, 2) = '(' . substr($text, $i - 1, 2) . ')';
    }
}

print "$doubles $text\n";
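If only letters should count, one possible variation (an assumption on my part, not something the exercise asks for) is to scan left to right and treat a character as a letter when its lowercase and uppercase forms differ. That check misses letters without case, so take it as a sketch. The name mark_doubles_no_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant: scan left to right, build a new string, and only
# treat a pair as a double when the character is a cased letter
# (lc and uc differ), so "11" or two spaces are left alone.
sub mark_doubles_no_regex {
    my ($text) = @_;
    my ($out, $count, $i) = ('', 0, 0);
    while ($i < length($text)) {
        my $c = substr($text, $i, 1);
        if ($i + 1 < length($text)
            && $c eq substr($text, $i + 1, 1)
            && lc($c) ne uc($c)) {
            $out .= "($c$c)";
            $count++;
            $i += 2;          # skip both characters of the pair
        } else {
            $out .= $c;
            $i++;
        }
    }
    return ($count, $out);
}

my ($count, $marked) = mark_doubles_no_regex('James Watt, a pioneer');
print "$count $marked\n";    # 2 James Wa(tt), a pion(ee)r
```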

The problem is that you're using $& in the second regex. At that point $& holds only the text matched by the first occurrence of a double letter, so the substitution only ever looks for that one pair.

 if (/(\w)\1/) { # first occurrence matched, so the pattern in the replacement regex will only be that particular pair of double letters
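A small sketch of the difference, with hypothetical helper names (mark_first_only, mark_all) and the sample line from the question:

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the original approach: $& is copied once,
# so the substitution only searches for that one literal pair.
sub mark_first_only {
    my ($line) = @_;
    if ($line =~ /(\w)\1/) {
        my $first = $&;                       # e.g. "tt"
        $line =~ s/\Q$first\E/($first)/g;     # in effect s/tt/(tt)/g
    }
    return $line;
}

# Hypothetical helper with the pattern inside the substitution itself,
# so every double is found by its own match.
sub mark_all {
    my ($line) = @_;
    $line =~ s/(\w)\1/($1$1)/g;
    return $line;
}

my $line = 'James Watt, Bob Transformer, etc. These pioneers conducted many';
print mark_first_only($line), "\n";   # only Wa(tt) is marked
print mark_all($line), "\n";          # Wa(tt) and pion(ee)rs are marked
```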

Try doing something like this: s/(\w)\1/\($1$1\)/g; instead of s/$&/\($&\)/g; Full code after editing:

$file = '/path/to/file/electricity.txt';        
open(FH, $file) || die "Cannot open the file\n";        

my $counter=0;

while (<FH>) {
    chomp();
    if (s/(\w)\1/\($1$1\)/g) {
        $counter++;
        print "\n\n$counter $_\n\n";
    } else {
        print "$_\n";
    }
}

close(FH);

Notice that you can use the s///g substitution as the condition of an if statement: it is true when at least one replacement occurred.
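For trying this out without the real file, the same loop can be wrapped in a sub and fed an in-memory filehandle. This is only a sketch: mark_lines and the sample text are hypothetical stand-ins, and it uses a lexical filehandle with three-argument open rather than the bareword FH.

```perl
use strict;
use warnings;

# Hypothetical wrapper: read lines from any filehandle, return the number of
# lines that contained doubles plus the formatted output lines.
sub mark_lines {
    my ($fh) = @_;
    my ($counter, @out) = (0);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ s/(\w)\1/($1$1)/g) {   # true when at least one pair was replaced
            $counter++;
            push @out, "$counter $line";
        } else {
            push @out, $line;
        }
    }
    return ($counter, @out);
}

# An in-memory filehandle stands in for electricity.txt here.
my $sample = "James Watt\nno pairs here\nThese pioneers\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($count, @lines) = mark_lines($fh);
close $fh;

print "$_\n" for @lines;
print "lines with doubles: $count\n";
# prints:
# 1 James Wa(tt)
# no pairs here
# 2 These pion(ee)rs
# lines with doubles: 2
```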
