Encoding text stream from PDF to UCS-2 in Objective-C

Posted 2023-03-08 09:28

I'm using CGPDFStringGetBytePtr to get a const char * called str from a CGPDFStringRef popped from a PDF stream. I want to convert str to its UCS-2 representation (an expression like ...\303...) using iconv, but I don't know how str was encoded. How do I determine this? Or, what is the likely encoding (given that I'm reading the PDF on a Mac)? I may be missing the wood for the trees.

* EDIT #1.

CFStringRef aStringRef = CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;

const char * bytes = [aString cStringUsingEncoding:NSUTF8StringEncoding];
bytes = [SKTextEncoding convertText:bytes
                         toEncoding:"UCS-2"
                       fromEncoding:"UTF-8"];

NSLog(@"%s", bytes);

* EDIT #2. String and bytes before any conversion, i.e. result of:

CGPDFStringRef aCGStringRef = NULL;
CGPDFObjectGetValue(anObjectRef,
                    kCGPDFObjectTypeString,
                    &aCGStringRef);
CFStringRef aStringRef = CGPDFStringCopyTextString(aCGStringRef);
NSString * aString = (NSString *) aStringRef;
const char * bytes = [aString cStringUsingEncoding:NSUTF8StringEncoding];
NSLog(@"string: %@____bytes: %s", aString, bytes);

2011-05-25 16:08:00.966 Test[1813:207] string: Æ____bytes: √Ü

2011-05-25 16:08:00.967 Test[1813:207] string: Ï____bytes: √è

2011-05-25 16:08:00.967 Test[1813:207] string: ®__bytes: ¬Æ

2011-05-25 16:08:00.968 Test[1813:207] string: ﬂ____bytes: Ô¨Ç

2011-05-25 16:08:00.968 Test[1813:207] string: ³__bytes: ¬≥

2011-05-25 16:08:00.969 Test[1813:207] string: ã____bytes: √£

2011-05-25 16:08:00.969 Test[1813:207] string: ï____bytes: √Ø

2011-05-25 16:08:00.970 Test[1813:207] string: ³__bytes: ¬≥

2011-05-25 16:08:00.970 Test[1813:207] string: µ____bytes: ¬µ

2011-05-25 16:08:00.971 Test[1813:207] string: Â____bytes: √Ç

2011-05-25 16:08:00.971 Test[1813:207] string: Ü____bytes: √ú

Instead of using CGPDFStringGetBytePtr(), use CGPDFStringCopyTextString(). The latter function returns a CFString object (owned by the caller) that, because of toll-free bridging, can be used as an NSString object.

Because the result is an NSString, you can send it -cStringUsingEncoding: to get a const char * pointer to the string in a given encoding, or -getCString:maxLength:encoding: to copy that representation into a buffer you supply. For instance, you could get a C string in UTF-8 encoding and then use libiconv to convert it to UCS-2:

CGPDFStringRef pdfString = …;
NSString *str = (NSString *)CGPDFStringCopyTextString(pdfString);
const char *bytes = [str cStringUsingEncoding:NSUTF8StringEncoding];
// use libiconv to convert the string in 'bytes' from UTF-8 to UCS-2
[str release];

Alternatively, you could use the Core Foundation functions for strings. I personally prefer to use their Foundation counterpart classes, though.
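
The libiconv step left as a comment in the code above could look roughly like the following. This is only a sketch in plain C, which can be called unchanged from Objective-C. The helper name utf8_to_ucs2be is hypothetical, and it assumes big-endian output ("UCS-2BE") so that the byte order does not depend on the iconv implementation. On macOS you would typically need to link with -liconv.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to
 * big-endian UCS-2. Returns a malloc'd buffer that the caller must free
 * and stores its length in bytes in *out_len. Returns NULL on failure,
 * for example when the input contains a character outside the Basic
 * Multilingual Plane, which UCS-2 cannot represent. */
unsigned char *utf8_to_ucs2be(const char *utf8, size_t *out_len)
{
    iconv_t cd = iconv_open("UCS-2BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(utf8);
    /* Every UTF-8 byte produces at most one 2-byte UCS-2 unit. */
    size_t out_size = 2 * in_left + 2;
    unsigned char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;   /* iconv takes a non-const char ** */
    char *out_ptr = (char *)out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }

    *out_len = out_size - out_left;
    return out;
}
```

Passing the bytes pointer from the snippet above as the first argument would give you the UCS-2 bytes. Copy or convert the string before releasing str, since the pointer returned by -cStringUsingEncoding: is only valid while the NSString is alive.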
