How to automatically convert email attachment filename to UTF-8 (using Mail_mimeDecode)

I'm using Mail_mimeDecode to extract attachments from incoming emails. Everything was working well for a while, until I started receiving attachments with filenames encoded in KOI8, with a section header like this:

Content-Disposition: attachment; filename="=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?="

mimeDecode does a perfectly reasonable thing and returns the filename in KOI8:

$attachmentNameInKOI8 = $part->d_parameters['filename'];

The problem is that I need it in UTF-8. In this specific example, I can run the following to do the conversion:

$attachmentNameInUTF8 = iconv('KOI8-R', 'UTF-8', $attachmentNameInKOI8);

But without trying to parse the message manually, I don't know when the name is in KOI8 and when it's not. I'm also worried that some other encoding will come through soon, so I need a way to handle anything that might come my way.

I have read that mb_detect_encoding is not reliable, and in fact I could not get it to detect the string as KOI8.
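To illustrate why detection is guesswork for single-byte charsets: the same bytes are valid in more than one of them, so only the declared charset tells you which reading was meant. A small sketch using the first three bytes of the filename above (the variable names are mine, not from mimeDecode):

```php
<?php
// First three bytes of the decoded filename from the header above.
$bytes = "\xF0\xD2\xC9";

// Both conversions succeed; nothing in the bytes themselves says which is right.
$asKoi8   = iconv('KOI8-R', 'UTF-8', $bytes);
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);

echo $asKoi8, "\n";   // При
echo $asLatin1, "\n"; // ðÒÉ
```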

Is there a way to tell mimeDecode to do the translation for me? I looked at the source code of mimeDecode.php:_decodeHeader() and I can see that it parses the charset but then does nothing with it, which seems a wasted opportunity.

UPDATE: To be clear, this is only a problem with headers and not with bodies, because mimeDecode exposes the charset of the body, so it's very easy to run iconv yourself like this:

$bodyutf = iconv($textpart->ctype_parameters['charset'], 'UTF-8', $textpart->body);
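In practice the charset parameter can be missing or unknown to iconv, so a slightly more defensive version of the body conversion may be worth having. This is only a sketch: bodyToUtf8() is a hypothetical helper name, and falling back to the untouched body when conversion fails is my assumption about what you would want.

```php
<?php
// Hypothetical helper: convert a decoded body to UTF-8 using its declared charset.
function bodyToUtf8(string $body, ?string $charset): string
{
    // RFC 2045 says a text part without a charset parameter is US-ASCII.
    $charset = $charset ?: 'US-ASCII';

    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // already in the target charset
    }

    // iconv() returns false for an unknown charset or invalid input.
    $converted = @iconv($charset, 'UTF-8', $body);

    return $converted === false ? $body : $converted;
}
```

With mimeDecode you would call it as bodyToUtf8($textpart->body, $textpart->ctype_parameters['charset'] ?? null).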


Adding a line to _decodeHeader() before the replace seems to do the trick:

// Convert the decoded bytes from the charset declared in the encoded-word to UTF-8.
$text = iconv($charset, 'UTF-8', $text);
$input = str_replace($encoded, $text, $input);

Seems weird that they didn't build some such option into the original class, doesn't it?

NOTE: I've since noticed that Subject lines and other headers can also be encoded the same way as filenames (RFC 2047). It appears that adding the iconv line to _decodeHeader() addresses all these cases.

Weird that such a feature wasn't already built into mimeDecode--this can't be a rare problem.

EDIT: I now understand that the point of mimeDecode's decode_headers=false option is to give you the raw values so you can decode them yourself. That seems a waste: there is no point in ever having mimeDecode decode your headers if you can't trust it to return a string in an expected charset. It would make more sense for it to accept a target charset as a parameter, with null meaning no decoding, though I have a feeling they're unlikely to change it for little me. So the point is that you need to do your own decoding. Unfortunately it's not as simple as a straight call to imap_utf8() or imap_mime_header_decode(). You could either take the _decodeHeader() function from mimeDecode and modify it, or use something like this:

http://www.php.net/manual/en/function.imap-mime-header-decode.php#71762
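For reference, here is a minimal sketch of doing the decoding yourself on a raw header value. It is not the code from the linked comment or from mimeDecode; decodeMimeHeaderToUtf8() is a hypothetical name, and it only handles plain RFC 2047 encoded-words (B and Q), leaving anything it cannot convert as it was.

```php
<?php
// Hypothetical helper: decode RFC 2047 encoded-words in a raw header value to UTF-8.
function decodeMimeHeaderToUtf8(string $header): string
{
    // Whitespace between two adjacent encoded-words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/',
        function (array $m): string {
            [$whole, $charset, $encoding, $payload] = $m;

            if (strtoupper($encoding) === 'B') {
                $bytes = base64_decode($payload);
            } else {
                // Q encoding is quoted-printable with "_" standing for a space.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $payload));
            }

            // Convert from the declared charset; keep the raw token on failure.
            $utf8 = @iconv($charset, 'UTF-8', $bytes);

            return $utf8 === false ? $whole : $utf8;
        },
        $header
    );
}

echo decodeMimeHeaderToUtf8('=?KOI8-R?B?8NLJzM/Wxc7JxSAudHh0?='), "\n"; // Приложение .txt
```

To use it with mimeDecode, turn decode_headers off and pass the raw filename or Subject value through the function.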

EDIT #2: Unbelievably, the mimeDecode guys already incorporated my suggestion into their latest svn:

https://pear.php.net/bugs/bug.php?id=18876

On that version, you can now set decode_headers='UTF-8' and mimeDecode will do all the work for you. Wow!
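As a rough sketch, the parameters would look something like this. It assumes a Mail_mimeDecode build that includes the change from the bug report above, and $rawMessage is a placeholder for the message source:

```php
<?php
// Parameters for Mail_mimeDecode::decode(); the 'UTF-8' value for
// decode_headers assumes a version that includes the change linked above.
$params = [
    'include_bodies' => true,
    'decode_bodies'  => true,
    'decode_headers' => 'UTF-8',
];

// With the PEAR package installed and $rawMessage holding the message source:
// require_once 'Mail/mimeDecode.php';
// $decoder   = new Mail_mimeDecode($rawMessage);
// $structure = $decoder->decode($params);
```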
