Extracting string with regex stored in hash
I'm trying to parse out specific values from a text file, and output those to a different file.
I'm using regular expressions stored in a hash (matched up with their descriptive name) to search through a string (scalar), and then storing the discovered values in an array, which is then written out to a file.
I've got everything working, except for the searching/extracting part. (I've only just learned Perl in the past couple days, so I wouldn't be surprised if I was making some really simple mistakes.)
$inputstring = 'Lorem ipsum dolor Date: 20110131 quis semper egestas.';
%myregexhash = ( Date => '/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/' );
@foundvaluesarray=();
while ( ($thefieldname, $theregex) = each (%myregexhash))
{
    if ($inputstring =~ $theregex)
    {
        push(@foundvaluesarray, "$thefieldname: $&\n");
        $inputstring = $';
    }
}
print "@foundvaluesarray";
The array fills up with the field names ("Date:"), but not the values I'm looking for ("20110131").
Any idea what I'm doing wrong?
Make one small change:
%myregexhash = ( Date => qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/ );
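Note the use of qr//, which compiles a regex. In your version the pattern is a single-quoted string, so the slashes are part of the pattern itself and Perl looks for literal / characters in the input.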
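To see the difference in isolation, here is a minimal sketch that runs both forms against the same input. The variable names and the shortened pattern [0-9]{8} are mine, chosen only for illustration.

```perl
use strict;
use warnings;

my $input = 'Lorem ipsum dolor Date: 20110131 quis semper egestas.';

# A plain string: the slashes belong to the pattern, so Perl looks for a literal "/".
my $as_string = '/([0-9]{8})/';

# A compiled regex: the slashes are only delimiters.
my $as_regex = qr/([0-9]{8})/;

my $string_matched = ($input =~ $as_string) ? 1 : 0;
my $regex_matched  = ($input =~ $as_regex)  ? 1 : 0;

print "string pattern matched: $string_matched\n";   # 0
print "qr// pattern matched:   $regex_matched\n";    # 1
```

Dropping the slashes from the string would also work, but qr// lets Perl check the pattern when it is defined rather than when it is first used.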
You're new, so I'd recommend a few other changes.
Any non-trivial program should begin with the following front matter:
#! /usr/bin/env perl
use strict;
use warnings;
The strict pragma has nice benefits such as catching misspelled variable names at compile time and checking your use of references. The warnings pragma turns on extra warning diagnostics that can alert you to questionable cases in your code.
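As a rough illustration of the misspelled-variable case, the sketch below compiles a small snippet at run time with a string eval so the compile error can be caught and printed instead of stopping the script. The names $total and $totl are made up for the example.

```perl
use strict;
use warnings;

# The snippet inherits "use strict" from this file. $totl is a deliberate
# misspelling of $total, so compilation of the snippet fails.
my $ok    = eval q{ my $total = 1; $totl = 2; 1 };
my $error = $ok ? '' : $@;

print "strict caught: $error";
```

In an ordinary script the same mistake would stop the program before it runs, which is exactly what you want.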
Now you must predeclare your variables with my:
my $inputstring = 'Lorem ipsum dolor Date: 20110131 quis semper egestas.';
my %myregexhash = ( Date => qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/ );
my @foundvaluesarray=();
The = () is implied in an array or hash declaration, so you don't see it in idiomatic Perl.
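A quick check of that claim, using throwaway names of my own:

```perl
use strict;
use warnings;

my @list;   # starts out empty, same as: my @list = ();
my %map;    # starts out empty, same as: my %map  = ();

my $list_size = scalar @list;
my $map_size  = scalar keys %map;

print "list: $list_size, map: $map_size\n";   # list: 0, map: 0
```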
You don't want to use $& if you can help it because it slows down your entire program. The Perl documentation warns:
WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc., so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression (?: ... ) instead.) But if you never use $&, $` or $', then patterns without capturing parentheses will not be penalized. So avoid $&, $', and $` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.
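If you do need the whole matched text or the text around it, Perl 5.10 and later provide the /p modifier, which fills ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for that one match only. A minimal sketch, assuming Perl 5.10 or newer and a shortened pattern of my own:

```perl
use strict;
use warnings;

my $input = 'Lorem ipsum dolor Date: 20110131 quis semper egestas.';

my ($pre, $match, $post);
if ($input =~ /[0-9]{8}/p) {
    # These are set only because this match used /p.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "matched: $match\n";   # matched: 20110131
    print "rest:$post\n";        # rest: quis semper egestas.
}
```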
Because you surrounded your pattern with parentheses, the substring that matched is captured in $1, so grab it from there.
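Since $1 holds whatever the pattern accepts, it is worth checking the pattern itself. The month part [0-1][0-2] only admits 00, 01, 02, 10, 11, and 12, so a date in March would be skipped. The sketch below compares your pattern with a tighter one. The tighter pattern is my assumption about what you want (years 1900-2099, months 01-12, days 01-31, with no per-month day check), not something taken from your data.

```perl
use strict;
use warnings;

my $original = qr/([12][09][0-9][0-9][0-1][0-2][0-9][0-9])/;

# Assumed tightening: year 19xx or 20xx, month 01-12, day 01-31.
my $tighter = qr/((?:19|20)[0-9]{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12][0-9]|3[01]))/;

my %results;
for my $date ('20110131', '20110315') {
    $results{$date} = [
        ($date =~ $original ? 1 : 0),
        ($date =~ $tighter  ? 1 : 0),
    ];
    print "$date original=$results{$date}[0] tighter=$results{$date}[1]\n";
}
# 20110131 original=1 tighter=1
# 20110315 original=0 tighter=1
```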
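Also, the way you chopped off the front of $inputstring is more naturally expressed in Perl with s///. Be aware that s/$theregex// deletes only the matched text, whereas $inputstring = $' also discards everything before the match.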
while (my ($thefieldname, $theregex) = each (%myregexhash))
{
    if ($inputstring =~ s/$theregex//)
    {
        push(@foundvaluesarray, "$thefieldname: $1\n");
    }
}
print "@foundvaluesarray";
Output:
Date: 20110131
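With more than one entry in the hash, two details start to matter: each returns pairs in no particular order, and you may not want to modify the input at all. Here is one possible sketch that iterates over sorted keys and takes the capture from a list-context match. The input string, the Invoice field, and both patterns are hypothetical, invented for the example.

```perl
use strict;
use warnings;

# Hypothetical input and field names, for illustration only.
my $inputstring = 'Lorem ipsum Date: 20110131 dolor Invoice: 4711 sit amet.';
my %regex_for = (
    Date    => qr/Date:\s*([0-9]{8})/,
    Invoice => qr/Invoice:\s*([0-9]+)/,
);

my @found;
for my $field (sort keys %regex_for) {
    # A match in list context returns its captures, or an empty list on
    # failure, so $value is only ever set from a successful match.
    if (my ($value) = $inputstring =~ $regex_for{$field}) {
        push @found, "$field: $value\n";
    }
}
print @found;
# Date: 20110131
# Invoice: 4711
```

Anchoring each pattern on its label (Date:, Invoice:) keeps the fields independent, so the order of the searches no longer affects the result.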