Encoding text stream from PDF to UCS-2 in Objective-C

I'm using CGPDFStringGetBytePtr to get a const char * called str from a CGPDFStringRef popped from a PDF stream. I want to convert str to a UCS-2 representation (an expression like ...\303...) using iconv, but I don't know how str was encoded. How do I determine this? Or, what is the likely encoding (given that I'm streaming a PDF on a Mac)? I may be missing the wood for the trees.

* EDIT #1.

CFStringRef aStringRef = CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;

const char * bytes = [aString cStringUsingEncoding:NSUTF8StringEncoding];
bytes = [SKTextEncoding convertText:bytes
                         toEncoding:"UCS-2"
                       fromEncoding:"UTF-8"];

NSLog(@"%s", bytes);

* EDIT #2. String and bytes before any conversion, i.e. the result of:

CGPDFStringRef aCGStringRef = NULL;
CGPDFObjectGetValue(anObjectRef,
                    kCGPDFObjectTypeString,
                    &aCGStringRef);
CFStringRef aStringRef = CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;
const char * bytes = [aString cStringUsingEncoding:NSUTF8StringEncoding];
NSLog(@"string: %@____bytes: %s", aString, bytes);

2011-05-25 16:08:00.966 Test[1813:207] string: Æ____bytes: √Ü

2011-05-25 16:08:00.967 Test[1813:207] string: Ï____bytes: √è

2011-05-25 16:08:00.967 Test[1813:207] string: ®__bytes: ¬Æ

2011-05-25 16:08:00.968 Test[1813:207] string: fl____bytes: fl

2011-05-25 16:08:00.968 Test[1813:207] string: ³__bytes: ¬≥

2011-05-25 16:08:00.969 Test[1813:207] string: ã____bytes: √£

2011-05-25 16:08:00.969 Test[1813:207] string: ï____bytes: √Ø

2011-05-25 16:08:00.970 Test[1813:207] string: ³__bytes: ¬≥

2011-05-25 16:08:00.970 Test[1813:207] string: µ____bytes: ¬µ

2011-05-25 16:08:00.971 Test[1813:207] string: Â____bytes: √Ç

2011-05-25 16:08:00.971 Test[1813:207] string: Ü____bytes: √ú


Instead of using CGPDFStringGetBytePtr(), use CGPDFStringCopyTextString(). The latter function returns a CFString object (owned by the caller) that, because of toll-free bridging, can be used as an NSString object.

Because the result is an NSString object, you can send it -cStringUsingEncoding: to get a const char * pointer to the string representation in a given encoding, or -getCString:maxLength:encoding: to copy that representation into a buffer you supply. For instance, you could get a C string in UTF-8 encoding and then use libiconv to convert it to UCS-2:

CGPDFStringRef pdfString = …;
NSString *str = (NSString *)CGPDFStringCopyTextString(pdfString);
const char *bytes = [str cStringUsingEncoding:NSUTF8StringEncoding];
// use libiconv to convert the string in 'bytes' from UTF-8 to UCS-2
[str release];

Alternatively, you could use the Core Foundation functions for strings. I personally prefer to use their Foundation counterpart classes, though.
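
The libiconv step left as a comment in the snippet above could look roughly like the following plain C, which also compiles inside an Objective-C source file. This is only a sketch: utf8_to_ucs2be is a hypothetical helper name, not part of any Apple or iconv API. It assumes the iconv implementation accepts the encoding name "UCS-2BE" (used instead of plain "UCS-2" so the byte order is explicit) and declares the input argument of iconv() as char **, as glibc does.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to big-endian
 * UCS-2. On success it returns a malloc'd buffer (the caller frees it) and
 * stores the buffer length in bytes in *out_len. It returns NULL if the
 * converter cannot be opened, memory runs out, or the input cannot be
 * converted (for example, invalid UTF-8). */
unsigned char *utf8_to_ucs2be(const char *utf8, size_t *out_len)
{
    assert(utf8 != NULL && out_len != NULL);

    iconv_t cd = iconv_open("UCS-2BE", "UTF-8"); /* (to, from) */
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(utf8);
    /* Each UCS-2 character takes 2 bytes and each UTF-8 character takes at
     * least 1 byte, so 2 * in_left is enough. The extra 2 bytes only keep
     * the allocation non-empty for an empty input string. */
    size_t cap = 2 * in_left + 2;
    size_t out_left = cap;
    unsigned char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8; /* iconv() does not modify the input bytes */
    char *out_ptr = (char *)out;
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }

    *out_len = cap - out_left; /* number of bytes actually written */
    return out;
}
```

With the bytes pointer from the snippet above, a call would be utf8_to_ucs2be(bytes, &length), made before str is released. For the first string in the question's log, Æ (UTF-8 bytes C3 86), the result is the two bytes 00 C6. The output is raw binary rather than a C string, so it should be inspected byte by byte instead of being passed to NSLog with %s.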
