How can I download IMAP mail attachments over SSL and save them locally using Perl?
I need suggestions on how can I download attachments from my IMAP mails which have attachments and current date in subject line i.e. YYYYMMD开发者_C百科D
format and save the attachments to a local path.
I went through the Perl module Mail::IMAPClient and am able to connect to the IMAP mail server, but need help on other tasks. One more thing to note is that my IMAP sever requires SSL auth.
Also the attachments could be gz, tar or tar.gz files.
A simple program that does what you want is below.
#! /usr/bin/perl
use warnings;
use strict;
The minimum version for Email::MIME
is for when walk_parts
was introduced.
use Email::MIME 1.901;
use IO::Socket::SSL;
use Mail::IMAPClient;
use POSIX qw/ strftime /;
use Term::ReadKey;
You don't want to hardcode your password in your program, do you?
sub read_password {
local $| = 1;
print "Enter password: ";
ReadMode "noecho";
my $password = <STDIN>;
ReadMode "restore";
die "$0: unexpected end of input"
unless defined $password;
print "\n";
chomp $password;
$password;
}
Connect using SSL. We ought to be able to be able to do this with a simple Ssl
parameter to the constructor, but some vendors have chosen to break it in their packages.
my $pw = read_password;
my $imap = Mail::IMAPClient->new(
#Debug => 1,
User => "you\@domain.com",
Password => $pw,
Uid => 1,
Peek => 1, # don't set \Seen flag
Socket => IO::Socket::SSL->new(
Proto => 'tcp',
PeerAddr => 'imap.domain.com',
PeerPort => 993,
),
);
die "$0: connect: $@" if defined $@;
If you want a folder other than the inbox, change it.
$imap->select("INBOX")
or die "$0: select INBOX: ", $imap->LastError, "\n";
Using IMAP search, we look for all messages whose subjects contain today's date in YYYYMMDD format. The date can be anywhere in the subject, so, for example, a subject of "foo bar baz 20100316" would match today.
my $today = strftime "%Y%m%d", localtime $^T;
my @messages = $imap->search(SUBJECT => $today);
die "$0: search: $@" if defined $@;
For each such message, write its attachments to files in the current directory. We write the outermost layer of attachments and do not dig for nested attachments. A part with a name parameter in its content type (as in image/jpeg; name="foo.jpg"
) is assumed to be an attachment, and we ignore all other parts. A saved attachment's name is the following components separated by -
: today's date, its IMAP message ID, a one-based index of its position in the message, and its name.
foreach my $id (@messages) {
die "$0: funky ID ($id)" unless $id =~ /\A\d+\z/;
my $str = $imap->message_string($id)
or die "$0: message_string: $@";
my $n = 1;
Email::MIME->new($str)->walk_parts(sub {
my($part) = @_;
return unless ($part->content_type =~ /\bname=([^"]+)/
or $part->content_type =~ /\bname="([^"]+)"/); # " grr...
my $name = "./$today-$id-" . $n++ . "-$1";
print "$0: writing $name...\n";
open my $fh, ">", $name
or die "$0: open $name: $!";
print $fh $part->content_type =~ m!^text/!
? $part->body_str
: $part->body
or die "$0: print $name: $!";
close $fh
or warn "$0: close $name: $!";
});
}
If you want to stick with Mail::IMAPClient, you can tell it to use SSL.
Alternatively, Net::IMAP::Simple::SSL could also help you with that. The interface is the same as the one provided by Net::IMAP::Simple.
Once you have the message, Parsing emails with attachments shows how to extract attachments. I haven't tried it, but my hunch is that using Email::MIME::walk_parts can be used to significantly simplify the script shown in that PerlMonks article.
I have changed a little my approach to download attachments from @Greg, since it was shown unreliable to download SAP XML attachments. They do not follow the Content-Type: application/pdf; name=XXXXX
standard so, it gave me a lot of problems. Example:
Content-ID: <payload-xxxxxxxxxxxxx@sap.com>
Content-Disposition: attachment;
filename="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml"
Content-Type: application/xml
Content-Descripton: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.xml
The rest of the program remains almost the same. The difference is that i´m now using MIME::Parser
to retrieve all the message, and i throw away all that is body and image related. I also removed the Peek => 1
since i wanted to mark the messages as read after they got downloaded(and only navigate on unread messages). Log::Logger
helped to create a centralized log:
--- Snippet 1 --- Libs
#! /usr/bin/perl
use warnings;
use strict;
use Mail::IMAPClient; #IMAP connection
use Log::Logger; #Logging facility
use MIME::Parser; #Mime "slicer"
use DateTime; #Date
use File::Copy; #File manipulation
use File::Path qw( mkpath );
--- Snippet 2 --- Log initialization
$log_script = new Log::Logger;
$log_script->open_append("/var/log/downloader.log");
my $dt = DateTime->now;
$dt->set_time_zone('America/Sao_Paulo');
$hour = (join ' ', $dt->ymd, $dt->hms);
--- Snippet 3 --- Mail downloader
$imap->select($remote_dir) or ($log_script->log("$hour: Account $account, Dir $remote_dir. Check if this folder exists") and next);
# Select unseen messages only
my @mails = ($imap->unseen);
foreach my $id (@mails) {
my $subject = $imap->subject($id);
my $str = $imap->message_string($id) or ($log_script->log("$hour: Account $account, Email \<$subject\> with problems. Crawling through next email") and next);
my $parser = MIME::Parser->new();
$parser->output_dir( $temp_dir );
$parser->parse_data( $str );
opendir(DIR, $temp_dir);
foreach $file (readdir(DIR)) {
next unless (-f "$temp_dir/$file");
if ("$file" =~ /^msg/i){ # ignores body
$body .= "$file ";
unlink "$temp_dir/$file";
} elsif (("$file" =~ /jpg$/i) # ignores signature images
or ("$file" =~ /gif$/i)
or ("$file" =~ /png$/i)) {
$body .= "$file ";
unlink "$temp_dir/$file";
} else { # move attachments to destination dir
$log_script->log("$hour: Account: $account, File $file, Email \<$subject\>, saved $local_dir");
move "$temp_dir/$file", "$local_dir";
};
};
$log_script->log("$hour: Files from email \<$subject\> ignored as they are body related stuff: $body") if $body;
I prefer the Mail::IMAPClient approach outlined by Greg, but it is essential to binmode() the output filehandle, namely to prevent Windows from assuming 0x0A bytes to be linefeeds and replacing them by CRLFs and so invalidating binary files. I'm sorry to disguise this as an answer, comments would be appropriate, but I don't own any reputation by now.
精彩评论