How can I correctly calculate the lengths of fields in a CSV document using Perl?
I have a dataset and would like to process it with a simple while
loop in a Perl script.
Here is a small extract from the dataset:
"number","code","country","gamma","X1","X2","X3","X4","X5","X6"
1,"DZA","Algeria","0.01",7.44,47.3,0.46,0,0,0.13
2,"AGO","Angola","0.00",6.79,"NULL",0.21,1,0,0.28
3,"BEN","Benin","-0.01",7.02,38.9,0.27,1,0,0.05
4,"BWA","Botswana","0.06",6.28,45.7,0.42,1,0,0.07
5,"HVO","Burkina Faso","0.00",6.15,36.3,0.08,1,0,0.05
6,"BDI","Burundi","0.00",6.38,41.8,0.18,1,0,0
The script should count the length of every comma-separated field and store the largest length seen for each column in an array.
However, the saving doesn't work properly. Here is a part of the code:
@maxl = map length, @terms;
while(<INFILE>) {
    $_ =~ s/[\"\n]//g ;
    @terms = split/$sep/, $_;
    @lengths = map length, @terms;
    for($k = 0, $k <= $#terms, $k++) {
        if($lengths[$k] > $maxl[$k]) {
            $maxl[$k] = $lenghts[$k];
        }
    }
    print "@lengths\n";
}
Here @maxl is initialised by an earlier part of the code, which uses the second line of the dataset.
When I print the values of @maxl, I get:
1 3 7 4 4 4 4 1 1 5
Inside the while loop I used another print statement to see the other values, and I get:
1 3 6 4 4 4 4 1 1 4
1 3 5 5 4 4 4 1 1 4
1 3 8 4 4 4 4 1 1 4
1 3 12 4 4 4 4 1 1 4
1 3 7 4 4 4 4 1 1 1
1 3 8 4 4 4 4 1 1 4
1 3 10 4 4 4 4 1 1 4
1 3 16 5 4 4 4 1 1 4
2 3 4 5 3 4 4 1 1 4
2 3 7 4 4 4 4 1 1 4
2 3 5 4 4 4 4 1 1 4
2 3 5 4 4 4 4 1 1 4
2 3 8 4 4 4 4 1 1 4
2 3 5 4 4 4 1 1 1 4
The fourth column, for example, obviously has values greater than 3. The while loop was supposed to find the greatest values and store them in @maxl.
What went wrong?
In the for loop, the commas are wrong; the three parts of a C-style loop must be separated by semicolons:
for($k = 0, $k <= $#terms, $k++)
However, after cleaning that up there still seems to be a problem...
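As a minimal sketch of the corrected loop: with commas, Perl reads the parenthesised part as a list for a foreach-style loop, so the body does not step through the indexes as intended. The version below uses semicolons and spells @lengths consistently. The three sample rows are copied from the question's extract with the quotes already removed; the `!defined` check is an assumption added here so that @maxl does not need to be pre-filled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows from the question's data extract (quotes already stripped).
my @rows = (
    'number,code,country,gamma,X1,X2,X3,X4,X5,X6',
    '1,DZA,Algeria,0.01,7.44,47.3,0.46,0,0,0.13',
    '5,HVO,Burkina Faso,0.00,6.15,36.3,0.08,1,0,0.05',
);

my @maxl;
for my $row (@rows) {
    my @terms   = split /,/, $row;
    my @lengths = map { length } @terms;

    # C-style loop: the three parts are separated by semicolons, not commas.
    for (my $k = 0; $k <= $#terms; $k++) {
        # Store the length if this column has no maximum yet or a longer field appears.
        if (!defined $maxl[$k] || $lengths[$k] > $maxl[$k]) {
            $maxl[$k] = $lengths[$k];
        }
    }
}

print "@maxl\n";    # prints: 6 4 12 5 4 4 4 2 2 4
```

Note that a plain split on commas only works here because none of the sample fields contains a comma inside quotes.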
There's a typo here:
$maxl[$k] = $lenghts[$k];
for starters (which 'use strict' would have caught).
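To illustrate that claim, here is a small sketch that compiles a snippet containing the misspelled name under strict. The string eval is used only so that the demonstration itself keeps running and can print the error; in a real script the compilation would simply stop with that message. The snippet contents and variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tiny snippet with the same misspelling as in the question:
# @lengths is declared, but @lenghts is what gets read.
my $snippet = 'use strict; my @lengths = (1, 3, 7); my $x = $lenghts[0]; 1;';

# Compile and run the snippet; on a compile error eval returns undef and sets $@.
my $ok    = eval $snippet;
my $error = $@;

if ($ok) {
    print "snippet compiled without complaint\n";
}
else {
    print "strict rejected the snippet: $error";
}
```

Without strict, the misspelled array would silently be treated as a new, empty global array, so the assignment would store undef instead of a length.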
Consider using Text::CSV for more reliable parsing of comma-separated data (it can also handle other separators):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new();
my @max_lengths;

# INFILE is assumed to be opened already, as in the question's code.
while ( my $line = <INFILE> ) {
    die "Unable to parse '$line'" unless $csv->parse($line);

    # Length of every field on the current line.
    my @column_lengths = map { length } $csv->fields();

    # Keep the largest length seen so far for each column.
    for my $i ( 0 .. $#column_lengths ) {
        if ( $column_lengths[$i] > ($max_lengths[$i] || 0) ) {
            $max_lengths[$i] = $column_lengths[$i];
        }
    }
}

print "MAX LENGTHS OF EACH FIELD: @max_lengths\n";