How to automatically convert email attachment filename to UTF-8 (using Mail_mimeDecode)
I'm using Mail_mimeDecode to extract attachments from incoming emails. Everything was working well for a while, until I started receiving attachments with filenames encoded in KOI8, with a section header like this:
Content-Disposition: attachment; filename="=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?="
mimeDecode does a perfectly reasonable thing and returns the filename in KOI8:
$attachmentNameInKOI8 = $part->d_parameters['filename'];
The problem is that I need it in UTF-8. In this specific example, I can run the following to do the conversion:
$attachmentNameInUTF8 = iconv('KOI8-R', 'UTF-8', $attachmentNameInKOI8);
But without trying to parse the message manually, I don't know when the name is in KOI8 and when it's not. I'm also worried that some other encoding will come through soon, so I need a way to handle anything that might come my way.
I had read that mb_detect_encoding is not reliable, and in fact I could not get it to detect the string as KOI8.
Is there a way to tell mimeDecode to do the translation for me? I looked at the source code of mimeDecode.php:_decodeHeader() and I can see that it parses the charset but then does nothing with it, which seems a wasted opportunity.
UPDATE: To be clear, this is only a problem with headers and not with bodies, because mimeDecode exposes the charset of the body, so it's very easy to run iconv yourself like this:
$bodyutf = iconv($textpart->ctype_parameters['charset'], 'UTF-8', $textpart->body);
Adding a line to _decodeHeader() before the replace seems to do the trick:
$text = iconv($charset, 'UTF-8', $text);
$input = str_replace($encoded, $text, $input);
Seems weird that they didn't build some such option into the original class, doesn't it?
NOTE: I've since noticed that Subject lines and other headers can also be encoded the same way as filenames (RFC2047). It appears that adding the iconv line into _decodeHeader addresses all these cases.
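To make the idea concrete, here is a minimal standalone sketch of what that patched decoding amounts to. It is not mimeDecode's actual code: decodeHeaderToUtf8() is a hypothetical helper name, the regex is a simplified take on RFC 2047 encoded-words, and it assumes the iconv extension is installed.

```php
<?php
// Hypothetical helper, not part of Mail_mimeDecode: decodes RFC 2047
// encoded-words in a header value and converts each one to UTF-8.
function decodeHeaderToUtf8(string $input): string
{
    $word = '=\?[^?]+\?[qb]\?[^?]*\?=';

    // RFC 2047: whitespace between two adjacent encoded-words is dropped.
    $input = preg_replace('/(' . $word . ')\s+(?=' . $word . ')/i', '$1', $input) ?? $input;

    $decoded = preg_replace_callback(
        '/=\?([^?]+)\?([qb])\?([^?]*)\?=/i',
        function (array $m): string {
            [$encoded, $charset, $encoding, $text] = $m;

            if (strtolower($encoding) === 'b') {
                $bytes = base64_decode($text);
            } else {
                // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
            }

            // The extra step: convert from the declared charset to UTF-8.
            $utf8 = @iconv($charset, 'UTF-8', $bytes);

            // If the charset is unknown to iconv, leave the encoded-word untouched.
            return $utf8 === false ? $encoded : $utf8;
        },
        $input
    );

    return $decoded ?? $input;
}

echo decodeHeaderToUtf8('=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?='), "\n";
```

Plain header values with no encoded-words pass through unchanged, so the same call can be applied to filenames, Subject lines and other headers alike.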
Weird that such a feature wasn't already built into mimeDecode--this can't be a rare problem.
EDIT: I now understand that the point of mimeDecode's decode_headers=false option is to give you the raw values so that you can decode them yourself. That seems like a waste: there is little point in having mimeDecode decode your headers at all if you can't trust it to return a string in a known charset. (It would make more sense for it to accept a target charset as a parameter, with null meaning no decoding. I have a feeling they're unlikely to change it for little me.) So the upshot is that you need to do your own decoding. Unfortunately, it's not as simple as a straight call to imap_utf8() or imap_mime_header_decode(). You could either take the _decodeHeader() function from mimeDecode and modify it, or use something like this:
http://www.php.net/manual/en/function.imap-mime-header-decode.php#71762
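Another possibility, if the iconv extension is available, may be PHP's built-in iconv_mime_decode(), which takes the target charset as its third argument. This is only a sketch of that alternative and not something mimeDecode does itself; the variable names are made up for illustration, and the raw value is assumed to come from a message parsed with header decoding turned off.

```php
<?php
// Raw header value as it appears in the message, still RFC 2047 encoded.
$rawFilename = '=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?=';

// ICONV_MIME_DECODE_CONTINUE_ON_ERROR makes the decoder tolerate malformed
// encoded-words instead of giving up on the whole value.
$filenameUtf8 = iconv_mime_decode(
    $rawFilename,
    ICONV_MIME_DECODE_CONTINUE_ON_ERROR,
    'UTF-8'
);

// Fall back to the raw value if decoding failed outright.
if ($filenameUtf8 === false) {
    $filenameUtf8 = $rawFilename;
}

echo $filenameUtf8, "\n";
```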
EDIT #2: Unbelievably, the mimeDecode guys already incorporated my suggestion into their latest svn:
https://pear.php.net/bugs/bug.php?id=18876
On that version, you can now set decode_headers='UTF-8' and mimeDecode will do all the work for you. Wow!