Elegantly Parsing Rigid Data in Perl
I'm working with a large dataset that basically boils down to something like this:
my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR
Based on the example above, let's assume that the following constraints are in effect:
- There will always be 5 lines of data.
- Data in each line is enclosed in a single tag such as <foo>...</foo>.
- Data will contain no nested tags.
- All lines use the same tag (e.g. foo) to enclose their data.
Ultimately, taking the above as the data source, I'd like to end up with something akin to this:
my %values = (
one => '111',
two => '222',
three => '333',
four => '',
five => '555'
);
This is my attempt:
my @vals = $input =~ m!<foo>(.*?)</foo>!ig;
if (scalar @vals != 5) {
# panic
}
my %values = (
one => shift @vals,
two => shift @vals,
three => shift @vals,
four => shift @vals,
five => shift @vals
);
This works as I want, but it looks ugly and is not very flexible. Unfortunately, this is the best I can do for now since I'm new to Perl.
So, given the above constraints, what's a more elegant way to do this?
Merging two arrays into a hash:
my @keys = qw/one two three/;
my @values = qw/alpha beta gamma/;
my %hash;
@hash{@keys} = @values;
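Applied to the data in the question, the same slice might look like the sketch below. It assumes the <foo> tag and the five key names from the question. The count check stands in for the "panic" branch, and using die there is only one possible way to handle it.

```perl
use strict;
use warnings;

# Sample input with mixed line endings (LF, CR+LF, CR), as the question allows.
my $input = "<foo>111</foo>\n<foo>222</foo>\r\n<foo>333</foo>\r<foo></foo>\n<foo>555</foo>\n";

my @keys = qw(one two three four five);

# A list-context match with /g returns every captured group,
# including the empty string from <foo></foo>.
my @vals = $input =~ m{<foo>(.*?)</foo>}g;

# Guard against malformed input before building the hash.
die sprintf("expected %d values, got %d\n", scalar @keys, scalar @vals)
    unless @vals == @keys;

my %values;
@values{@keys} = @vals;    # hash slice: keys and values are paired by position

print "$_ => '$values{$_}'\n" for @keys;
```

Changing the key names or the number of fields then only means editing @keys.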
First, take another look at:
my %values = (
one => '111',
two => '222',
three => '333',
four => '',
five => '555'
);
This data structure associates an integer with a piece of data. But there is already a built-in data structure that serves the same purpose: the array.
So, use arrays. Instead of writing $values{ one }, you would write $values[ 0 ], and the mapping between integers and data values would be transparent.
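A minimal sketch of the array version, assuming the same <foo> input as in the question. Indices start at 0, so the fourth value lives at $values[3].

```perl
use strict;
use warnings;

my $input = "<foo>111</foo>\n<foo>222</foo>\n<foo>333</foo>\n<foo></foo>\n<foo>555</foo>\n";

# The list-context match already yields the values in document order.
my @values = $input =~ m{<foo>(.*?)</foo>}g;

die "expected 5 values, got " . scalar(@values) . "\n" unless @values == 5;

# Walk the array by index; no key names are needed.
for my $i (0 .. $#values) {
    printf "%d: '%s'\n", $i, $values[$i];
}
```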
If the keys are something other than integers, you can do:
use strict; use warnings;
my @keys = qw(a b c d e);
my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR
my %values;
# hash slice
@values{ @keys } = $input =~ m{ <foo> (.*?) </foo>}gix;
use YAML;
print Dump \%values;
Output:
---
a: 111
b: 222
c: 333
d: ''
e: 555
Oh, something like this give or take?
use Number::Spell;
my @lines = grep { /\S/ } split /\r\n|\r|\n/, $input; # drop blank lines, keep <foo></foo>
s|</?foo>||g for @lines;                              # strip the tags
my $i = 0;
my %hash = map { spell_number(++$i) => $_ } @lines;   # keys start at "one"
Hmm, I can make this better.
use Number::Spell;
my $i = 0;
# skip blank lines first, then strip the tags, so an empty <foo></foo> is kept
my %hash = map { /\S/ ? do { s|</?foo>||g; ( spell_number(++$i) => $_ ) } : () }
    split /\r\n|\r|\n/, $input;
Edit: whoops, I had @lines instead of $input in the second snippet. Use caution: I have only typed this code out and have not written a unit test for it.