Perl-script to read and print lines from multiple txt files?
We have 300+ txt files, which are basically copies of an email; each txt file has the following format:
To: blabla@hotmail.com
Subject: blabla
From: bla1@hotmail.com
Message: Hello World!
The platform I am running the script on is Windows, and everything is local (including the Perl installation). The aim is to write a script which crawls through each file (all located within the same directory) and prints out a list of each unique email address in the From field. The concept is very easy.
Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:
#!/usr/local/bin/perl
use strict;
use warnings;

open my $fh, '<', 'emails/email_id_1.txt' or die "Cannot open file: $!";
while (<$fh>) {
    chomp;
    print "$_\n";
}
close $fh;
So now I need to be able to read and print line 3 of this file, but perform this not just once, but for all of the files. I've looked into the File::Find module; could this be of any use?
What platform? If Linux then it's simple:
foreach my $f (@ARGV) {
    # Do stuff
}
and then call with:
perl mything.pl *.txt
On Windows you'll need to expand the wildcards yourself, since cmd.exe doesn't expand them (unlike Unix shells):

@ARGV = map glob, @ARGV;

foreach my $f (@ARGV) {
    # Do stuff
}
Extracting the third line is then just a case of reading each line in turn and counting, so you know when you've reached line 3 and should print it.
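As a sketch of that counting approach (my own illustration, not part of the original answer), Perl's built-in line-number variable $. can do the counting, as long as it is reset at the end of each file:

#!/usr/local/bin/perl
use strict;
use warnings;

@ARGV = map glob, @ARGV;     # expand *.txt for cmd.exe

while (<>) {
    print if $. == 3;        # $. holds the current line number; line 3 is the From: line
    close ARGV if eof;       # closing the ARGV handle resets $. for the next file
}

Called the same way: perl mything.pl *.txt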
The glob() builtin can give you a list of files in a directory:
chdir $dir or die $!;
my @files = glob('*');
You can use Tie::File to access the 3rd line of a file:
use Tie::File;
for (@files) {
    tie my @lines, 'Tie::File', $_ or die $!;
    print $lines[2], "\n";    # line 3 is index 2 (zero-based)
}
Perl one-liner, Windows version:
perl -wE "@ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"
It will check all the lines, but only print if it finds a valid From: tag.
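Since the goal is a list of unique addresses, a variant of the same one-liner (my adaptation, not part of the original answer) could collect them in a hash and print each address once:

perl -wE "@ARGV = glob '*.txt'; while (<>) { $seen{$1}++ if /^From:\s*(.*)/ } say for sort keys %seen"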
Are you using a Unix-style shell? You can do this in the shell without even using Perl.
grep "^From:" ./* | sort | uniq -c"
The breakdown is as follows:
- grep will grab every line that starts with "From:", and send it to...
- sort, which will alpha sort those lines, then...
- uniq, which will filter out dupe lines. The "-c" part will count the occurrences.
Your output would look like:
3 From: dave@example.com
5 From: foo@bar.example.com
etc...
Possible issues: I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.
You could enhance that grep step in a few ways, or replace it with a Perl script that has less-broad functionality than your proposed all-in-one script.
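For instance (a sketch of the replacement idea, not code from the answer), a Perl one-liner could fold case and count in a single pass, treating From:/FROM: and mixed-case addresses as the same entry, which the plain grep | sort | uniq pipeline would count separately:

perl -lne '$count{lc $1}++ if /^From:\s*(.*)/i; END { print "$count{$_}\t$_" for sort keys %count }' ./*.txt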
Please comment if anything isn't clear.
Here's my solution (I hope this isn't homework).
It checks all files in the current directory whose names end in ".txt", case-insensitively (e.g., it will find "foo.TXT", which is probably what you want under Windows). It also allows for variations in line terminators (at least CR-LF and LF), matches the From: prefix case-insensitively, and allows arbitrary whitespace after the colon.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $DIR, '.' or die "opendir .: $!\n";
my @files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar @files, " files\n";

my %seen = ();
foreach my $file (@files) {
    open my $FILE, '<', $file or die "$file: $!\n";
    while (<$FILE>) {
        # non-greedy capture so a trailing CR is not swallowed into $1
        if (/^From:\s*(.*?)\r?$/i) {
            $seen{$1} = 1;
        }
    }
    close $FILE;
}

foreach my $addr (sort keys %seen) {
    print "$addr\n";
}
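Assuming the script is saved as unique_from.pl (a name chosen here for illustration) in the directory containing the .txt files, it can be run with:

perl unique_from.pl > addresses.txt

which writes the sorted list of unique addresses to addresses.txt.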