Decoding word-encoded Content-Disposition header file name in Objective-C
I am trying to retrieve a file name that can't be represented in ASCII from the Content-Disposition header.
This file name is word-encoded. Below is the encoded file name:
=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=
How do I get the decoded file name (which is actually "ësté é úm nômé açentúadõ.xlsx")?
PS: I am looking for an Objective-C implementation.
You would probably want a MIME handling framework for this, but I searched online and came up with nothing, so I'm just showing the algorithm here. It's not the best example, since I'm making a big assumption: that the string is always UTF-8 and Q-encoded.
Q-encoding is like URL-encoding (percent-encoding), which Foundation's NSString already has support for decoding. The main practical differences when decoding (there are bigger differences when encoding) are that escapes start with = instead of %, and that an underscore stands for a space.
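As a rough illustration of how close Q-encoding is to percent-encoding, here is a minimal sketch that decodes a Q-encoded payload straight to bytes, without going through percent escapes. This is a different technique from the percent-escape trick used in the answer's own code. The function names HexValue and QDecodedData are made up for this example, and it assumes the payload is only the text between the lead-in and the terminating "?=".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the numeric
// value of a hex digit, or -1 if the character is not a hex digit.
static int HexValue(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: decodes the payload of a Q-encoded word into raw bytes.
// The caller then interprets those bytes using the charset named in the lead-in.
static NSData *QDecodedData(NSString *payload) {
    NSData *input = [payload dataUsingEncoding:NSASCIIStringEncoding];
    if (input == nil) {
        return nil; // A Q-encoded payload should be pure ASCII
    }
    const unsigned char *bytes = [input bytes];
    NSUInteger length = [input length];
    NSMutableData *output = [NSMutableData dataWithCapacity:length];
    for (NSUInteger i = 0; i < length; i++) {
        unsigned char c = bytes[i];
        if (c == '_') {
            c = ' '; // "_" stands for a space
        } else if (c == '=' && i + 2 < length) {
            int hi = HexValue(bytes[i + 1]);
            int lo = HexValue(bytes[i + 2]);
            if (hi >= 0 && lo >= 0) {
                c = (unsigned char)(hi * 16 + lo); // "=XX" is one byte, like "%XX"
                i += 2;
            }
            // Otherwise the "=" is kept as a literal character.
        }
        [output appendBytes:&c length:1];
    }
    return output;
}
```

Passing the payload =C3=ABst=C3=A9 to QDecodedData and creating an NSString from the returned bytes with NSUTF8StringEncoding should give "ësté". Working on bytes rather than strings keeps the charset decision separate from the unescaping step.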
Then there's the lead-in and lead-out. Each encoded block has the format =?charset-name?encoding-type?encoded-text?=. You should really read the charset name and use that encoding, and you should really read the encoding type, since it may be "Q" or "B" (Base64).
This example only works for Q-encoding (a variant of quoted-printable). However, you should be able to modify it easily to handle other charsets and Base64 encoding.
#import <Foundation/Foundation.h>

int main(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSString *encodedString = @"=?UTF-8?Q?=C3=ABst=C3=A9_=C3=A9_=C3=BAm_n=C3=B4m=C3=A9?= =?UTF-8?Q?_a=C3=A7ent=C3=BAad=C3=B5.xlsx?=";
    NSScanner *scanner = [NSScanner scannerWithString:encodedString];
    NSString *buf = nil;
    NSMutableString *decodedString = [[NSMutableString alloc] init];

    // Each pass consumes any literal text, then one "=?UTF-8?Q?" lead-in.
    // NSScanner skips whitespace by default, which drops the space between adjacent encoded words.
    while ([scanner scanString:@"=?UTF-8?Q?" intoString:NULL]
           || ([scanner scanUpToString:@"=?UTF-8?Q?" intoString:&buf] && [scanner scanString:@"=?UTF-8?Q?" intoString:NULL])) {
        if (buf != nil) {
            [decodedString appendString:buf]; // Literal text before the encoded word
        }
        buf = nil;

        NSString *encodedRange = nil;
        if (![scanner scanUpToString:@"?=" intoString:&encodedRange]) {
            break; // Invalid encoding
        }
        [scanner scanString:@"?=" intoString:NULL]; // Skip the terminating "?="

        // Decode the encoded portion (naively using UTF-8 and assuming it really is Q-encoded).
        // First escape literal % signs, so the text can be turned into a URL-encoded string, which NSString can decode
        encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"%" withString:@"=25"];
        // Turn this into a URL-encoded string
        encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"=" withString:@"%"];
        // Replace the underscores with spaces
        encodedRange = [encodedRange stringByReplacingOccurrencesOfString:@"_" withString:@" "];

        NSString *decodedRange = [encodedRange stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        if (decodedRange == nil) {
            break; // The escapes did not decode to valid UTF-8; appending nil would throw
        }
        [decodedString appendString:decodedRange];
    }
    if (buf != nil) {
        [decodedString appendString:buf]; // Literal text after the last encoded word
    }

    NSLog(@"Decoded string = %@", decodedString);
    [decodedString release];
    [pool drain];
    return 0;
}
This outputs:
chrisbook-pro:~ chris$ ./qp-decode
2010-12-01 18:54:42.903 qp-decode[9643:903] Decoded string = ësté é úm nômé açentúadõ.xlsx
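To go beyond the UTF-8 and Q-encoding assumption, the charset name and encoding type can be read from each encoded word. The following is only a sketch of that idea, not a complete MIME decoder. It assumes ARC, a single well-formed encoded word as input, and OS X 10.9 or iOS 7 or later for NSData's Base64 initializer. The function name DecodeEncodedWord is made up for this example. The Q branch reuses the percent-escape trick from the code above, which relies on stringByReplacingPercentEscapesUsingEncoding:, a method that is deprecated in newer SDKs.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): decodes one encoded word
// of the form =?charset?encoding?payload?= and returns nil if it is malformed
// or uses a charset or encoding type this sketch does not recognise.
static NSString *DecodeEncodedWord(NSString *word) {
    if ([word length] < 4 || ![word hasPrefix:@"=?"] || ![word hasSuffix:@"?="]) {
        return nil;
    }
    // Strip the "=?" lead-in and "?=" lead-out, then split on "?".
    NSString *inner = [word substringWithRange:NSMakeRange(2, [word length] - 4)];
    NSArray *parts = [inner componentsSeparatedByString:@"?"];
    if ([parts count] != 3) {
        return nil;
    }
    NSString *charset = [parts objectAtIndex:0];
    NSString *type = [[parts objectAtIndex:1] uppercaseString];
    NSString *payload = [parts objectAtIndex:2];

    // Map the IANA charset name (for example "UTF-8") to an NSStringEncoding.
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)charset);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return nil;
    }
    NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(cfEncoding);

    if ([type isEqualToString:@"B"]) {
        // B-encoding is plain Base64 over the bytes of the text.
        NSData *data = [[NSData alloc] initWithBase64EncodedString:payload options:0];
        if (data == nil) {
            return nil;
        }
        return [[NSString alloc] initWithData:data encoding:encoding];
    }
    if ([type isEqualToString:@"Q"]) {
        // Same percent-escape trick as in the code above, with the parsed charset.
        NSString *escaped = [payload stringByReplacingOccurrencesOfString:@"%" withString:@"=25"];
        escaped = [escaped stringByReplacingOccurrencesOfString:@"=" withString:@"%"];
        escaped = [escaped stringByReplacingOccurrencesOfString:@"_" withString:@" "];
        return [escaped stringByReplacingPercentEscapesUsingEncoding:encoding];
    }
    return nil; // Unknown encoding type
}
```

A caller would still need to split the header value into encoded words and literal text, and drop the whitespace between adjacent encoded words, much as the scanner loop above does. For the file name in the question, decoding the two encoded words separately and concatenating the results should give the same string as the output shown above.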
I wrote up an easier method that worked for me, using a trick involving NSString percent escapes:
https://stackoverflow.com/a/10888548/285694
I recently implemented an NSString category that decodes MIME encoded-words with either Q-encoding or B-encoding.
The code is available on GitHub and is briefly explained in this answer.