Parsing mailbox (mbox or mbx) files in PHP
I need to parse mbox or email files using PHP. That is, I would pass in a .mbox or .eml file that contains several emails and parse it into its constituents, e.g. From, To, Bcc, etc.
Is there any library that does this, or any code showing how to do it in PHP?
Thanks
There is a PEAR class http://pear.php.net/package/Mail_Mbox for that.
That said, it's not difficult to split a .mbox file manually. The individual mails are simply separated by a line matching /^From\s/ (which may never appear in the mail body), followed by a block of Headers: lines. Most mail applications also store a length field in there. Still, it's easier to use a ready-made script that handles all the variations.
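For illustration, the manual split described above might look roughly like this. It is only a minimal sketch: `split_mbox` is a hypothetical helper name, and it assumes that every message starts with a `From ` separator line and that body lines beginning with `From ` have already been escaped by whatever wrote the file.

```php
<?php
// Hypothetical helper: split raw mbox text into messages.
// Assumes body lines starting with "From " were escaped by the writer.
function split_mbox(string $raw): array
{
    // Normalise CRLF line endings so the patterns below stay simple.
    $raw = str_replace("\r\n", "\n", $raw);

    // Split right before every line that starts with "From" plus whitespace.
    $parts = preg_split('/^(?=From\s)/m', $raw, -1, PREG_SPLIT_NO_EMPTY);

    $messages = [];
    foreach ($parts as $part) {
        // Skip anything that does not begin with a separator line.
        if (!preg_match('/^From\s/', $part)) {
            continue;
        }
        // Drop the "From ..." separator line itself.
        $newline = strpos($part, "\n");
        if ($newline === false) {
            continue;
        }
        $message = substr($part, $newline + 1);

        // The header block ends at the first blank line.
        $pieces = explode("\n\n", $message, 2);
        $messages[] = [
            'headers' => $pieces[0],
            'body'    => $pieces[1] ?? '',
        ];
    }
    return $messages;
}
```

Calling `split_mbox(file_get_contents('emails.mbox'))` would then return one array per message, each with a raw `headers` string and a `body` string. It ignores the length field and other variations, which is where a ready-made library earns its keep.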
The PEAR class above works for getting individual messages out of MBOX, but if you want to then also parse the message into its constituent elements like "From Address", "Attachments", etc, then I would recommend mime_parser.php
In fact mime_parser.php can handle extracting the messages from a MBOX also, so depending on your needs, you might not need the PEAR class.
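If all you need are the simple header fields (From, To, Bcc and so on) rather than attachments, a hand-rolled header reader can be enough. The sketch below is not the mime_parser.php or Mail_Mbox API; `parse_headers` is a hypothetical helper that assumes one raw message as input and does not decode MIME-encoded words or parse address lists.

```php
<?php
// Hypothetical helper, independent of mime_parser.php and Mail_Mbox.
// Returns lowercased header names mapped to a list of raw values.
function parse_headers(string $message): array
{
    $message = str_replace("\r\n", "\n", $message);

    // Headers end at the first blank line.
    $headerBlock = explode("\n\n", $message, 2)[0];

    // Unfold continuation lines (lines that start with a space or tab).
    $headerBlock = preg_replace('/\n[ \t]+/', ' ', $headerBlock);

    $headers = [];
    foreach (explode("\n", $headerBlock) as $line) {
        // A header name has no whitespace and is followed by a colon;
        // this also skips an mbox "From ..." separator line.
        if (!preg_match('/^([^\s:]+):[ \t]*(.*)$/', $line, $m)) {
            continue;
        }
        // A header may repeat (e.g. Received), so collect values in a list.
        $headers[strtolower($m[1])][] = trim($m[2]);
    }
    return $headers;
}
```

For example, `parse_headers($message)['to'][0]` would give the raw To value of a message. For attachments, encoded subjects, or multipart bodies, a real MIME parser such as the one recommended above is the safer choice.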
Here is the PEAR module Mail_Mbox for parsing mbox data:
https://pear.php.net/manual/en/package.mail.mail-mbox.php
If you need something quicker for a small job, such as extracting all the email addresses you collected in Gmail by grouping messages under a label and exporting them with Google Takeout, in order to import the list into, say, Mailchimp:
<?php
// Tested with Google Mail > Account > Privacy > data exporter (with label)
// https://takeout.google.com/settings/takeout
$raw = file_get_contents('emails.mbox');
if ($raw === false) {
    exit("Could not read emails.mbox\n");
}

// Collect the value of every Reply-To header
preg_match_all('/^Reply-To:\s(.*)$/im', $raw, $matches);

// Trim stray whitespace (e.g. "\r" from CRLF line endings) and lowercase,
// then drop duplicates
$emails = array_unique(array_map('strtolower', array_map('trim', $matches[1])));

$filtered_out = '';
// CSV field example (tested with Mailchimp)
$filtered_in = 'Email Address' . "\n";

foreach ($emails as $email) {
    // Filter out invalid emails; exporters occasionally make mistakes ;)
    // for example xxxxxxxxx@gmail.comx.xxxxxxxxxx.org
    if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
        $filtered_in .= $email . "\n";
    } else {
        $filtered_out .= $email . "\n";
    }
}

header('Content-Type: text/plain');
// Save to file
// file_put_contents('emails.csv', $filtered_in);
echo $filtered_in;
?>
Hope this helps!