Looping through files with Perl
Okay, I have two files. One holds data that is updated every 10 minutes; the other holds data that was previously used. I want to take each line from the new file, loop through every line of the second file, and see whether it matches one. If it does, I don't want to use it; if there is no match, then I want to add it to a string. With what I have so far, the check never seems to find a match even though there is one. Here is my code and a sample of the data from both files. CHECKHAIL and USEDHAIL are the two files.
while(my $toBeChecked = <CHECKHAIL>){
    my $found = 0;
    seek USEDHAIL, 0, 0 or die "$0: seek: $!";
    while(my $hailCheck = <USEDHAIL>){
        if( $toBeChecked == $hailCheck){
            $found += 1;
        }
    }
    print USEDHAIL $toBeChecked;
    if ($found == 0){
        $toEmail .= $toBeChecked;
    }
}
print $toEmail;
return;
}
CHECKHAIL sample data
2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)
2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
USEDHAIL sample data
2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)
2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
It never has an opportunity to succeed because of
while(<USEDHAIL>){
    my $hailCheck = $_;
    if( $toBeChecked eq $hailCheck){
        $found += 1;
    }else{
        return; ### XXX
    }
}
On the first mismatch, the sub returns to its caller. You may have meant next instead, but for conciseness, you should remove the whole else clause. Remove the other else { return; } (corresponding to when $found is true) for the same reason.
Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then, for each line of CHECKHAIL, probe the %used hash to see whether it's been processed.
With those lines removed, I get
$ ./prog.pl
2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:
seek USEDHAIL, 0, 0 or die "$0: seek: $!";
while(<USEDHAIL>){
...
This produces
$ ./prog.pl
2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
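Putting the two fixes together (no early return, plus a rewind before each inner scan), the nested-loop version could look roughly like the sketch below. The in-memory filehandles and the shortened sample lines are stand-ins for the real files, and the last is an optional shortcut that is not in the original code.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the CHECKHAIL and USEDHAIL files.
my $check_data = "2226 175 2 NE LAWRENCE DEADWOOD SD\n"
               . "2350 200 DANIELS E FLAXVILLE MT\n";
my $used_data  = "2226 175 2 NE LAWRENCE DEADWOOD SD\n";

open my $check, '<', \$check_data or die "$0: open: $!";
open my $used,  '<', \$used_data  or die "$0: open: $!";

my $toEmail = '';
while (my $toBeChecked = <$check>) {
    my $found = 0;
    # Rewind so the inner loop scans the used records from the top each time.
    seek $used, 0, 0 or die "$0: seek: $!";
    while (my $hailCheck = <$used>) {
        if ($toBeChecked eq $hailCheck) {   # string comparison of whole lines
            $found = 1;
            last;                           # a match is enough; stop scanning
        }
    }
    $toEmail .= $toBeChecked unless $found;
}
print $toEmail;   # prints "2350 200 DANIELS E FLAXVILLE MT"
```

This is still quadratic in the number of lines, so it only makes sense for small files.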
For an example of a better way to do it, consider
#! /usr/bin/perl
use warnings;
use strict;

sub read_used_hail {
    my($path) = @_;
    my %used;
    open my $fh, "<", $path or die "$0: open $path: $!";
    local $" = " "; # " fix Stack Overflow highlighting
    while (<$fh>) {
        chomp;
        my @f = split " ", $_, 10;
        next unless @f;
        ++$used{"@f"};
    }
    wantarray ? %used : \%used;
}

my %used = read_used_hail "used-hail";
open my $check, "<", "check-hail" or die "$0: open: $!";
while (<$check>) {
    chomp;
    my @f = split " ", $_, 10;
    next if !@f || $used{join " " => @f};
    print $_, "\n";
}
Sample run:
$ ./prog.pl
2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
Why wouldn't you just create a hash for the first (used) file?
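use strict;
use warnings;
my %fromUsedFile;
open USEDFILE, '<', '/the/data/file/that/is/10minutesold'
    or die "$0: open: $!";
# Each line of the used file (including its newline) becomes a hash key.
$fromUsedFile{$_}++ while <USEDFILE>;
close USEDFILE;
# CHECKHAIL is assumed to be open for reading already, as in the question.
my $toBeEmailed = '';
while (my $toBeChecked = <CHECKHAIL>) {
    if (defined $fromUsedFile{$toBeChecked}) {
        # ... line is in both the new and old file
    } else {
        # ... line is only in the new file
        $toBeEmailed .= $toBeChecked;
    }
}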
Using $_ within an inner loop can cause problems. Try naming your lines first like so:
while(my $toBeChecked = <CHECKHAIL>){
my $found = 0;
while( my $hailCheck = <USEDHAIL>){
Also, Perl treats numeric comparison and string comparison differently. $found is a number, but you're testing it with the string operator eq instead of the numeric operator ==:
if ($found eq 0){
Change to:
if ($found == 0){
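The same distinction matters in the other direction when whole lines are compared, as in the question's inner loop. The sketch below, using two shortened sample lines as assumed inputs, shows that == converts each string to its leading number before comparing, so two different records that start with the same digits look equal, while eq compares the text itself.

```perl
use strict;
use warnings;

# Two different records that happen to start with the same number.
my $line_a = "2350 200 DANIELS E FLAXVILLE MT";
my $line_b = "2350 175 5 N DANIELS RICHLAND MT";

# eq compares the strings character by character.
my $same_text = ($line_a eq $line_b) ? 1 : 0;

# == converts both operands to numbers first; each string numifies to 2350
# (its leading digits), so the two lines compare as equal.
my $same_number;
{
    no warnings 'numeric';   # silence the "isn't numeric" warnings for the demo
    $same_number = ($line_a == $line_b) ? 1 : 0;
}

print "eq: $same_text, ==: $same_number\n";   # prints "eq: 0, ==: 1"
```

With warnings enabled and the no warnings line removed, Perl reports that the arguments are not numeric, which is a useful hint that the wrong operator is in use.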
This line sticks out for me:
if ($found eq 0){
Since $found is a boolean, perform boolean tests on it:
if (not $found) {
It also looks like your logic is a bit reversed: in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Do you perhaps intend to say next; to move on to the next iteration of the innermost loop, instead?
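To illustrate the difference between return and next in nested loops, here is a minimal sketch. The unseen helper, its arguments, and the NEW label are hypothetical names chosen for the example, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the items of @$new that are absent from @$old.
sub unseen {
    my ($new, $old) = @_;
    my @result;
    NEW: for my $candidate (@$new) {
        for my $seen (@$old) {
            # A labeled next abandons the inner loop and moves on to the
            # next candidate; a bare return here would leave the whole sub.
            next NEW if $candidate eq $seen;
        }
        push @result, $candidate;   # reached only when nothing matched
    }
    return @result;
}

my @fresh = unseen([qw(a b c d)], [qw(b d)]);
print "@fresh\n";   # prints "a c"
```

A plain next inside the inner loop would only skip to the next $seen value, and last would stop the inner loop while still falling through to the push, which is why the label is used here.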