Perl-script to read and print lines from multiple txt files?
We have 300+ txt files, which are basically copies of an email; each txt file has the following format:
To: blabla@hotmail.com
Subject: blabla
From: bla1@hotmail.com
Message: Hello World!
The platform I am running the script on is Windows, and everything is local (including the Perl installation). The aim is to write a script which crawls through each file (all located within the same directory) and prints out a list of each unique email address in the From field. The concept is very easy.
Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:
#!/usr/local/bin/perl
use strict;
use warnings;

open my $fh, '<', 'emails/email_id_1.txt' or die "Cannot open file: $!";
while (<$fh>) {
    chomp;
    print "$_\n";
}
close $fh;
So now I need to be able to read and print line 3 of this file, but perform this not just once, but for all of the files. I've looked into the File::Find module; could this be of any use?
What platform? If Linux then it's simple:
foreach my $f (@ARGV) {
    # Do stuff
}
and then call with:
perl mything.pl *.txt
On Windows you'll need to expand the wildcards yourself, since cmd.exe doesn't expand them (unlike Unix shells):

@ARGV = map glob, @ARGV;

foreach my $f (@ARGV) {
    # Do stuff
}
Extracting the third line is then just a case of reading each line in turn and counting, so you know when you've reached line 3 and should print it.
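As a sketch of that counting approach (my own illustration, not part of the original answer), Perl's built-in line-number variable $. can do the counting, as long as it is reset at the end of each file:

#!/usr/local/bin/perl
use strict;
use warnings;

@ARGV = map glob, @ARGV;     # expand *.txt for cmd.exe

while (<>) {
    print if $. == 3;        # $. holds the current line number; line 3 is the From: line
    close ARGV if eof;       # closing the ARGV handle resets $. for the next file
}

Called the same way: perl mything.pl *.txt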
The glob() builtin can give you a list of files in a directory:
chdir $dir or die $!;
my @files = glob('*');
You can use Tie::File to access the 3rd line of a file:
use Tie::File;
for (@files) {
    tie my @lines, 'Tie::File', $_ or die $!;
    print $lines[2], "\n";    # line 3 is index 2 (zero-based)
}
Perl one-liner, Windows version:
perl -wE "@ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"
It will check all the lines, but only print if it finds a valid From: tag.
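Since the goal is a list of unique addresses, a variant of the same one-liner (my adaptation, not part of the original answer) could collect them in a hash and print each address once:

perl -wE "@ARGV = glob '*.txt'; while (<>) { $seen{$1}++ if /^From:\s*(.*)/ } say for sort keys %seen"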
Are you using a Unix-style shell? You can do this in the shell without even using Perl.
grep "^From:" ./* | sort | uniq -c"
The breakdown is as follows:
- grep will grab every line that starts with "From:", and send it to...
- sort, which will alpha sort those lines, then...
- uniq, which will filter out dupe lines. The "-c" part will count the occurrences.
Your output would look like:
3 From: dave@example.com
5 From: foo@bar.example.com
etc...
Possible issues: I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.
You could enhance that grep step in a few ways, or replace it with a Perl script that has less-broad functionality than your proposed all-in-one script.
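For instance (a sketch of the replacement idea, not code from the answer), a Perl one-liner could fold case and count in a single pass, treating From:/FROM: and mixed-case addresses as the same entry, which the plain grep | sort | uniq pipeline would count separately:

perl -lne '$count{lc $1}++ if /^From:\s*(.*)/i; END { print "$count{$_}\t$_" for sort keys %count }' ./*.txt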
Please comment if anything isn't clear.
Here's my solution (I hope this isn't homework).
It checks all files in the current directory whose names end in ".txt", case-insensitively (e.g., it will find "foo.TXT", which is probably what you want under Windows). It also allows for variations in line terminators (at least CR-LF and LF), matches the From: prefix case-insensitively, and allows arbitrary whitespace after the colon.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $DIR, '.' or die "opendir .: $!\n";
my @files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar @files, " files\n";

my %seen = ();
foreach my $file (@files) {
    open my $FILE, '<', $file or die "$file: $!\n";
    while (<$FILE>) {
        # non-greedy capture so a trailing CR is not swallowed into $1
        if (/^From:\s*(.*?)\r?$/i) {
            $seen{$1} = 1;
        }
    }
    close $FILE;
}

foreach my $addr (sort keys %seen) {
    print "$addr\n";
}
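Assuming the script is saved as unique_from.pl (a name chosen here for illustration) in the directory containing the .txt files, it can be run with:

perl unique_from.pl > addresses.txt

which writes the sorted list of unique addresses to addresses.txt.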