
Parse comma delimited file based on a start and end date

I am new to Perl, so please accept my apologies if my question is trivial. I have a very large file with data that looks like the following:

Date, Time, Data1, Data2, Data3  
1/4/1999,9:31:00 AM,blah, blah, blah  
1/4/1999,9:32:00 AM,blah, blah, blah  
1/4/1999,9:33:00 AM,blah, blah, blah  

I have a file named 'cities.txt' which has a list of cities located on different rows with a comma at the end of the row.

i.e.

Boston,  
Atlanta,  
Seattle,  

Each city has its own file in that same directory, named according to the convention 'Boston 1 Minute Moisture Data.txt'. I want to first read the 'cities.txt' file and, for each city that appears in it, find the associated moisture data file, extract all the rows between two dates inclusive (a START and an END date), and SAVE them to another file. The date is located in the first column.

I have read through comments made in the following post but I am still very confused.

How do I efficiently parse a CSV file in Perl?

I wrote a simple script using some examples online. Firstly, I just wanted to see if I was using the module correctly. So all I wanted to do was to get the parser to parse the fields and calculate the sum of a specific column.

#!/usr/bin/perl  
use strict;  
use warnings;  

use Text::CSV_XS;  
my $csv = Text::CSV_XS->new();  

my $file = 'Boston 1 Minute Moisture Data.csv';  

my $sum = 0;  
open(my $data, '<', $file) or die "Could not open '$file'\n";  
while (my $line = <$data>) {  
    chomp $line;  

    if ($csv->parse($line)) {  
        my @columns = $csv->fields();  

        $sum += $columns[3];  
    } else {  
        warn "Line could not be parsed: $line\n";  
    }  
}  
print "$sum\n";  

The result I get is "Line could not be parsed: $line\n". For some reason the parser isn't parsing the fields. Any ideas?

I also tried the following code:

#!/usr/bin/perl  
use strict;  
use warnings;  
use Text::CSV;  

my $file = 'Boston 1 Minute Moisture Data.csv';  
my $csv = Text::CSV->new();  

open (CSV, "<", $file) or die $!;  

while (<CSV>) {  
    if ($csv->parse($_)) {  
        my @columns = $csv->fields();  
        #print "@columns\n";  
        print fields[1];  
    } else {  
        my $err = $csv->error_input;  
        print "Failed to parse line: $err";  
    }  
}  
close CSV;  

I get the following result for every line in the file:

print() on unopened filehandle fields at test2.pl line 16, line 9326.


The overall structure of a solution to your problem appears to be:

  • open cities.txt
  • for each line read from cities.txt
    • open the "$city 1 Minute Moisture Data.txt" file
    • for each line from the moisture file
      • if the line's date falls within range
        • add the line to the save file

You have not specified whether there is a separate save file per city.
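If you do want one save file per city, the innermost step can write to a filehandle instead of standard output. The following is only a minimal sketch under some assumptions: the helper names date_key and copy_rows_in_range are made up for illustration, dates are taken to be day/month/year, and the date column is assumed to be unquoted (as in the sample data), so a plain split is enough to pick it out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn "D/M/YYYY" into a sortable integer such as 19990401.
# Assumes day/month/year order; returns undef for text that is not a date.
sub date_key {
    my ($str) = @_;
    return unless defined $str && $str =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$};
    return ($3 * 100 + $2) * 100 + $1;
}

# Copy rows whose first column lies in [$start, $end] from $in to $out.
# Returns the number of rows copied. The header row is skipped because
# "Date" does not match the date pattern.
sub copy_rows_in_range {
    my ($in, $out, $start, $end) = @_;
    my $copied = 0;
    while (my $line = <$in>) {
        my ($first) = split /,/, $line, 2;    # date column, assumed unquoted
        my $key = date_key($first);
        next unless defined $key;
        next if $key < $start || $key > $end;
        print {$out} $line;
        $copied++;
    }
    return $copied;
}

# Small demonstration on in-memory data, writing to standard output.
my $data = "Date, Time, Data1\n1/4/1999,9:31:00 AM,blah\n1/5/2000,9:32:00 AM,blah\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
copy_rows_in_range($in, \*STDOUT, date_key('1/3/1999'), date_key('30/4/1999'));
```

For real use you would open the city's moisture file for reading and a per-city output file for writing (for example "$city filtered.txt", a name invented here), then pass both handles to copy_rows_in_range.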

Your trial solutions are correctly using the Text::CSV module - that is good. You also need some way of parsing the date values - both the input values (the start and end dates) and the scanned values (from the moisture data). I'd probably use the POSIX::strptime module, but you could use any of a myriad other date and time manipulation modules.
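As one example of the other date modules, Time::Piece ships with Perl and has its own strptime. This is a sketch rather than a recommendation; parse_date is a hypothetical helper name, and the format string assumes day/month/year order.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Piece;

# Parse a date string with the given format.
# Time::Piece->strptime dies if the text does not match the format.
sub parse_date {
    my ($str, $format) = @_;
    return Time::Piece->strptime($str, $format);
}

my $format = '%d/%m/%Y';    # use '%m/%d/%Y' for month-first dates
my $start  = parse_date('1/3/1999',  $format);
my $end    = parse_date('30/4/1999', $format);
my $row    = parse_date('1/4/1999',  $format);

# Time::Piece objects can be compared directly with <, <=, >, >=.
if ($row >= $start && $row <= $end) {
    print $row->ymd, " is in range\n";
}
```

Because strptime dies on unparseable input, wrapping the call in eval gives you a place to report a bad date together with the offending line.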

It isn't great Perl - but the code below seems to work when run as:

$ perl scan.pl 1/3/1999 30/4/1999
Boston,1/4/1999,9:31:00 AM,blah, blah, blah  
Boston,1/4/1999,9:32:00 AM,blah, blah, blah  
Boston,1/4/1999,9:33:00 AM,blah, blah, blah
Atlanta,1/4/1999,9:31:00 AM,blah, blah, blah  
Atlanta,1/4/1999,9:32:00 AM,blah, blah, blah  
Atlanta,1/4/1999,9:33:00 AM,blah, blah, blah
Seattle,1/4/1999,9:31:00 AM,blah, blah, blah  
Seattle,1/4/1999,9:32:00 AM,blah, blah, blah  
Seattle,1/4/1999,9:33:00 AM,blah, blah, blah
$ perl scan.pl 1/3/2000 30/4/2000
$

(This uses the cities data from the question and a copy of the example data for each city. I'm assuming UK-style dates in the order day, month, year. If you are working with American-style dates, with the month first, you will need to adjust the format string in get_date() and the dates on the command line. If you play with invalid dates, you will get errors; the error handling in get_date() is non-existent.)

#!/usr/bin/env perl
use strict;
use warnings;
use POSIX::strptime;
use Text::CSV;

my $cities = "cities.txt";

die "Usage: $0 start-date end-date\n" if scalar(@ARGV) != 2;

my $start = get_date($ARGV[0]);
my $end   = get_date($ARGV[1]);

{
    open my $cfh, "<", $cities or die "Failed to open $cities ($!)";
    while (<$cfh>)
    {
        chomp;
        my $city = $_;
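        $city =~ s/\s*,.*//;         # drop the trailing comma and anything after it
        $city =~ s/^\s*//;           # drop leading whitespace
        next unless length $city;    # skip blank lines in cities.txt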
        my $moisture = "$city 1 Minute Moisture Data.txt";
        open my $mfh, "<", $moisture or die "Failed to open $moisture ($!)";
        process_file($mfh, $moisture, $city);
    }
}

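sub get_date
{
    # Convert a day/month/year string into a sortable integer such as 19990401.
    my($str) = @_;
    # strptime returns struct tm style fields: month is 0-based, year counts from 1900.
    my ($mday, $mon, $year) = ( POSIX::strptime($str, '%d/%m/%Y') )[3,4,5];
    return (($year + 1900) * 100 + ($mon + 1)) * 100 + $mday;
}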

sub process_file
{
    my($fh, $file, $city) = @_;
    my $csv = Text::CSV->new() or die "Failed to create Text::CSV object";
    my $line = <$fh>;
    die "Unexpected EOF in $file" unless defined $line;
    while ($line = <$fh>)
    {
        chomp $line;
        die "Failed to parse line <<$line>>" unless $csv->parse($line);
        my @columns = $csv->fields();
        die "Insufficient columns in <<$line>>" if scalar(@columns) < 1;
        my $date = get_date($columns[0]);
        print "$city,$line\n" if ($date >= $start && $date <= $end);
    }
}
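
On the two messages reported in the question: "print() on unopened filehandle fields" appears because `print fields[1];` is parsed as printing to a filehandle named `fields`; the parsed values are in `@columns`, so the second field is `$columns[1]`. For the parse failures in the first script, the parser's own diagnostic says more than the raw line does. I can't see the real file, so the cause is a guess, but stray carriage returns or non-ASCII bytes are common culprits, and the `binary => 1` option tolerates them. A sketch, with second_column as a made-up helper name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows bytes outside printable ASCII inside fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# Return the second column of one CSV line, or die with the parser's diagnostic.
sub second_column {
    my ($parser, $line) = @_;
    $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
    $parser->parse($line)
        or die "Failed to parse <<$line>>: " . $parser->error_diag() . "\n";
    my @columns = $parser->fields();
    return $columns[1];      # not "fields[1]"
}

# In-memory sample standing in for one line of a moisture file.
print second_column($csv, "1/4/1999,9:31:00 AM,blah\r\n"), "\n";
```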
