Encoding text stream from PDF to UCS-2 in Objective-C
I'm using CGPDFStringGetBytePtr to get a const char * called str from a CGPDFStringRef popped from a PDF stream. I want to convert str to its UCS-2 representation (an expression like ...\303...) using iconv, but I don't know how str was encoded. How do I decide this? Or, what was the likely encoding (given that I'm streaming a PDF on a Mac)? I may be missing the wood for the trees.
* EDIT #1.
CFStringRef aStringRef = CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;
const char * bytes = [aString cStringUsingEncoding:NSUTF8StringEncoding];
bytes = [SKTextEncoding convertText:bytes
toEncoding:"UCS-2"
fromEncoding:"UTF-8"];
NSLog(@"%s", bytes);
* EDIT #2. String and bytes before any conversion, i.e. result of:
CGPDFStringRef aCGStringRef = NULL;
CGPDFObjectGetValue(anObjectRef,
kCGPDFObjectTypeString,
&aCGStringRef);
CFStringRef aStringRef =
CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;
const char * bytes = [aString
cStringUsingEncoding:NSUTF8StringEncoding];
NSLog(@"string: %@____bytes: %s", aString, bytes);
2011-05-25 16:08:00.966 Test[1813:207] string: Æ____bytes: √Ü
2011-05-25 16:08:00.967 Test[1813:207] string: Ï____bytes: √è
2011-05-25 16:08:00.967 Test[1813:207] string: ®__bytes: ¬Æ
2011-05-25 16:08:00.968 Test[1813:207] string: fl____bytes: fl
2011-05-25 16:08:00.968 Test[1813:207] string: ³__bytes: ¬≥
2011-05-25 16:08:00.969 Test[1813:207] string: ã____bytes: √£
2011-05-25 16:08:00.969 Test[1813:207] string: ï____bytes: √Ø
2011-05-25 16:08:00.970 Test[1813:207] string: ³__bytes: ¬≥
2011-05-25 16:08:00.970 Test[1813:207] string: µ____bytes: ¬µ
2011-05-25 16:08:00.971 Test[1813:207] string: Â____bytes: √Ç
2011-05-25 16:08:00.971 Test[1813:207] string: Ü____bytes: √ú
Instead of using CGPDFStringGetBytePtr(), use CGPDFStringCopyTextString(). The latter function returns a CFString object (owned by the caller) that, because of toll-free bridging, can be used as an NSString object.
Since it is an NSString object, you can send it -cStringUsingEncoding: to get a const char * pointer to the string's representation in a given encoding, or -getCString:maxLength:encoding: to copy that representation into a buffer you supply. For instance, you could get a C string in UTF-8 encoding and then use libiconv to convert it to UCS-2:
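CGPDFStringRef pdfString = …;
// CGPDFStringCopyTextString() follows the Copy rule, so the caller must release the result
NSString *str = (NSString *)CGPDFStringCopyTextString(pdfString);
const char *bytes = [str cStringUsingEncoding:NSUTF8StringEncoding];
// use libiconv to convert the string in 'bytes' from UTF-8 to UCS-2
// ('bytes' is only valid while 'str' is alive, so finish with it before releasing 'str')
[str release];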
Alternatively, you could use the Core Foundation functions for strings. I personally prefer to use their Foundation counterpart classes, though.