Looping through files with perl

2023-01-09 12:19 问答作者：

Okay I have 2 files. One file is data that is updated every 10 minutes while the second is data that was previously used. What I am trying to do is take one line from the new file and loop through each line of the second file and see if it matches one. If it does I dont want to use it, but if there is no match than I want to add it to a string. In what I have done so far it seems that the check does not ever find a match even though there is one. Here is what I have and a sample of the data I have been using from both files. CHECKHAIL and USEDHAIL are the two files

while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        seek USEDHAIL, 0, 0 or die "$0: seek: $!";
        while(my $hailCheck = <USEDHAIL>){
            if( $toBeChecked == $hailCheck){
                $found += 1;
            }
        }
        print USEDHAIL $toBeChecked;
        if ($found == 0){
            $toEmail .= $toBeChecked;
        }
    }
    print $toEmail;
    return;
}

CHECKHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIEL开发者_运维百科S            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

USEDHAIL sample data

2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

It never has an opportunity to succeed because of

while(<USEDHAIL>){
    my $hailCheck = $_;
    if( $toBeChecked eq $hailCheck){
        $found += 1;
    }else{
        return;  ### XXX
    }
}

On the first mismatch, the sub returns to its caller. You may have meant next instead, but for conciseness, you should remove the whole else clause. Remove the other else { return; } (corresponding to when $found is true) for the same reason.

Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then for each line of CHECKHAIL probe the %used hash to see whether it's been processed.

With those lines removed, I get

$ ./prog.pl 

2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)

2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)

2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:

seek USEDHAIL, 0, 0 or die "$0: seek: $!";
while(<USEDHAIL>){
...

This produces

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

For an example of a better way to do it, consider

#! /usr/bin/perl

use warnings;
use strict;

sub read_used_hail {
  my($path) = @_;

  my %used;

  open my $fh, "<", $path or die "$0: open $path: $!";

  local $" = " ";  # " fix Stack Overflow highlighting
  while (<$fh>) {
    chomp;
    my @f = split " ", $_, 10;
    next unless @f;
    ++$used{"@f"};
  }

  wantarray ? %used : \%used;
}

my %used = read_used_hail "used-hail";
open my $check, "<", "check-hail" or die "$0: open: $!";

while (<$check>) {
  chomp;
  my @f = split " ", $_, 10;
  next if !@f || $used{join " " => @f};
  print $_, "\n";
}

Sample run:

$ ./prog.pl 
2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

Why wouldn't you just create a hash for the first (used) file?

use strict; 
use warnings;
my %fromUsedFile;
open USEDFILE, '<', '/the/data/file/that/is/10minutesold';
$fromUsedFile{$_}++  while <USEDFILE>;
close USEDFILE;

while ($toBeChecked = <CHECKHAIL>) {
    if (defined $fromUsedFile{$toBeChecked}) {
        # ... line is in both the new and old file
    } else {
        # ... line is only in the new file
        $toBeEmailed .= $toBeChecked;
    }
}

Using $_ within an inner loop can cause problems. Try naming your lines first like so:

while(my $toBeChecked = <CHECKHAIL>){
    my $found = 0;
    while( my $hailCheck = <USEDHAIL>){

Also perl sees numeric comparison and string comparison differently. You're using string comparison instead of numeric comparison:

 if ($found eq 0){

Change to:

 if ($found == 0){

This line sticks out for me:

if ($found eq 0){

Since $found is a boolean, perform boolean tests on it:

if (not $found) {

It also looks like your logic is a bit reversed -- in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Do you perhaps intend to say next; to skip out of the innermost loop, instead?

继续阅读：file loops perl

Looping through files with perl

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？