Parse comma delimited file based on a start and end date

2023-02-18 05:38 问答作者：

I am new to perl so please accept my apologies if my question is trivial. I have a very large file with data that looks like the following:

Date, Time, Data1, Data2, Data3  
1/4/1999,9:31:00 AM,blah, blah, blah  
1/4/1999,9:32:00 AM,blah, blah, blah  
1/4/1999,9:33:00 AM,blah, blah, blah

I have a file named 'cities.txt' which has a list of cities located on different rows with a comma at the end of the row.

i.e.

Boston,  
Atlanta,  
Seattle,

Each city has its own file in that same directory that has the following naming convention 'Boston 1 Minute Moisture Data.txt'. I want to first read the 'cities.txt' file and for each city that appears in that file find the associated moisture data file and extract all the data (r开发者_如何学Pythonows) between and including TWO sets of dates (a START and an END date) and SAVE that to another file. The date is located in the first column.

I have read through comments made in the following post but I am still very confused.

How do I efficiently parse a CSV file in Perl?

I wrote a simple script using some examples online. Firstly, I just wanted to see if I was using the module correctly. So all I wanted to do was to get the parser to parse the fields and calculate the sum of a specific column.

#!/usr/bin/perl  
use strict;  
use warnings;  

use Text::CSV_XS;  
my $csv = Text::CSV_XS->new();  

my $file = 'Boston 1 Minute Moisture Data.csv';  

my $sum = 0;  
open(my $data, '<', $file) or die "Could not open '$file'\n";  
while (my $line = <$data>) {  
    chomp $line;  

    if ($csv->parse($line)) {  
        my @columns = $csv->fields();  

        $sum += $columns[3];  
    } else {  
        warn "Line could not be parsed: $line\n";  
    }  
}  
print "$sum\n";

The result is get is "Line could not be parsed: $line\n". For some reason the parser isn't parsing the fields. Any ideas?

I also tried the following code:

#!/usr/bin/perl  
use strict;  
use warnings;  
use Text::CSV;  

my $file = 'Boston 1 Minute Moisture Data.csv';  
my $csv = Text::CSV->new();  

open (CSV, "<", $file) or die $!;  

while (<CSV>) {  
    if ($csv->parse($_)) {  
        my @columns = $csv->fields();  
        #print "@columns\n";  
        print fields[1];  
        } else {  
        my $err = $csv->error_input;  
        print "Failed to parse line: $err";  
    }  
}  
close CSV;

I get the following result for every line in the file:

print() on unopened filehandle fields at test2.pl line 16, line 9326.

The overall structure of a solution to your problem appears to be:

open cities.txt
for each line read from cities.txt
- open the "$city 1 Minute Moisture Data.txt" file
- for each line from the moisture file
  - if the line's date falls within range
  - add the line to the save file

You have not specified whether there is a separate save file per city.

Your trial solutions are correctly using the Text::CSV module - that is good. You also need some way of parsing the date values - both the input values (the start and end dates) and the scanned values (from the moisture data). I'd probably use the POSIX::strptime module, but you could use any of a myriad other date and time manipulation modules.

It isn't great Perl - but the code below seems to work when run as:

$ perl scan.pl 1/3/1999 30/4/1999
Boston,1/4/1999,9:31:00 AM,blah, blah, blah  
Boston,1/4/1999,9:32:00 AM,blah, blah, blah  
Boston,1/4/1999,9:33:00 AM,blah, blah, blah
Atlanta,1/4/1999,9:31:00 AM,blah, blah, blah  
Atlanta,1/4/1999,9:32:00 AM,blah, blah, blah  
Atlanta,1/4/1999,9:33:00 AM,blah, blah, blah
Seattle,1/4/1999,9:31:00 AM,blah, blah, blah  
Seattle,1/4/1999,9:32:00 AM,blah, blah, blah  
Seattle,1/4/1999,9:33:00 AM,blah, blah, blah
$ perl scan.pl 1/3/2000 30/4/2000
$

(Given the cities data from the question, and a copy of the example data for each city. I'm assuming normal (eg UK) style dates with the sequence day, month, year. If you are working with American style dates, you have adjustments to make. If you play with invalid dates, you will get errors; the error handling in get_date() is non-existent.)

#!/usr/bin/env perl
use strict;
use warnings;
use POSIX::strptime;
use Text::CSV;

my $cities = "cities.txt";

die "Usage: $0 start-date end-date\n" if scalar(@ARGV) != 2;

my $start = get_date($ARGV[0]);
my $end   = get_date($ARGV[1]);

{
    open my $cfh, "<", $cities or die "Failed to open $cities ($!)";
    while (<$cfh>)
    {
        chomp;
        my $city = $_;
        $city =~ s/\s*,.*//;
        $city =~ s/^\s*//;
        my $moisture = "$city 1 Minute Moisture Data.txt";
        open my $mfh, "<", $moisture or die "Failed to open $moisture ($!)";
        process_file($mfh, $moisture, $city);
    }
}

sub get_date
{
    my($str) = @_;
    my ($mday, $mon, $year) = ( POSIX::strptime($str, '%d/%m/%Y') )[3,4,5];
    return (($year + 1900) * 100 + ($mon + 1)) * 100 + $mday;
}

sub process_file
{
    my($fh, $file, $city) = @_;
    my $csv = Text::CSV->new() or die "Failed to create Text::CSV object";
    my $line = <$fh>;
    die "Unexpected EOF in $file" unless defined $line;
    while ($line = <$fh>)
    {
        chomp $line;
        die "Failed to parse line <<$line>>" unless $csv->parse($line);
        my @columns = $csv->fields();
        die "Insufficient columns in <<$line>>" if scalar(@columns) < 1;
        my $date = get_date($columns[0]);
        print "$city,$line\n" if ($date >= $start && $date <= $end);
    }
}

继续阅读：csv parsing perl text

Parse comma delimited file based on a start and end date

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？