How to get text from Identity-H encoded strings in a PDF
I managed to extract text from a PDF using callbacks for the TJ and Tj operators, but some text is still missing: the parts that are Identity-H encoded. How can I convert them to text (an NSString)?
Identity-H encoding implies a Type0 font (also known as a CID-keyed font), so you have to consult the font's embedded ToUnicode mapping. The strings you get from TJ, Tj, ' (single quote) and " (double quote), the four text-showing operators, are not Unicode. They are sequences of character IDs, two bytes each under Identity-H, that have little meaning outside the current font.
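To make the lookup concrete, here is a minimal sketch in Python (chosen only for brevity; the same logic ports to Objective-C). It assumes you have already obtained the decompressed ToUnicode stream of the current font as text and the raw bytes of the string operand. The function names and the sample CMap are made up for illustration, and only the common `bfchar` and `bfrange` sections with even-length hex values are handled.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"

def _utf16(hex_digits):
    """Decode a hex value from the CMap as UTF-16BE text."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")

def parse_to_unicode(cmap_text):
    """Build a {character code: unicode string} table from a ToUnicode CMap."""
    mapping = {}
    # bfchar sections: one source code mapped to one destination string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = _utf16(dst)
    # bfrange sections: <lo> <hi> followed by a start value or an array.
    entry = HEX + r"\s*" + HEX + r"\s*(\[[^\]]*\]|<[0-9A-Fa-f]+>)"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(entry, block):
            lo, hi = int(lo, 16), int(hi, 16)
            if dst.startswith("["):
                # Array form: one destination per code in the range.
                targets = re.findall(HEX, dst)
                for offset, target in enumerate(targets[: hi - lo + 1]):
                    mapping[lo + offset] = _utf16(target)
            else:
                # Single start value: the last character increments per code.
                start = _utf16(dst.strip("<>"))
                for offset in range(hi - lo + 1):
                    mapping[lo + offset] = start[:-1] + chr(ord(start[-1]) + offset)
    return mapping

def decode_identity_h(raw, mapping):
    """Split the operand into 2-byte big-endian codes and look each one up."""
    chars = []
    for i in range(0, len(raw) - 1, 2):
        code = int.from_bytes(raw[i:i + 2], "big")
        chars.append(mapping.get(code, "\ufffd"))  # U+FFFD marks unmapped codes
    return "".join(chars)

# Hypothetical ToUnicode stream, for illustration only.
SAMPLE_CMAP = """
1 beginbfchar
<0003> <0020>
endbfchar
2 beginbfrange
<0024> <0026> <0041>
<0030> <0031> [<0061> <0062>]
endbfrange
"""

table = parse_to_unicode(SAMPLE_CMAP)
print(decode_identity_h(bytes.fromhex("00240025000300300031"), table))  # AB ab
```

In a Quartz operator callback the raw bytes would come from CGPDFStringGetBytePtr and CGPDFStringGetLength rather than from CGPDFStringCopyTextString, and the table has to be rebuilt (or looked up from a cache) each time a Tf operator selects a different font.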
The PDF specification document is very clear, but quite a demanding read.