How to get text from Identity-H encoded fonts in a PDF

I have succeeded in getting text from a PDF using callbacks for the TJ and Tj operators, but some text is still missing: the parts that are Identity-H encoded. How can I convert those to text (an NSString)?


Identity-H encoding implies a Type0 font (also known as a CID-keyed font), so you have to consult the embedded ToUnicode mapping. The bytes you get from the four text-showing operators (Tj, TJ, ' and ") are not Unicode, but rather arbitrary 2-byte character IDs that have little meaning outside the current font.

The PDF specification document is very clear, but quite a demanding read.
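
To make the lookup concrete, here is a minimal sketch of the decoding step, written in Python only to show the logic (it ports directly to Objective-C). It assumes you have already pulled the font's /ToUnicode stream out of the font dictionary and decompressed it into text, and that the CMap uses only the common `bfchar` and `bfrange` sections. The function names `parse_to_unicode` and `decode_identity_h` and the sample CMap are made up for illustration.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def _to_text(hex_digits):
    # Destination values in a ToUnicode CMap are UTF-16BE code units.
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_to_unicode(cmap_text):
    """Build a {character ID: str} table from a decompressed ToUnicode CMap."""
    mapping = {}

    # bfchar sections: one <source> <destination> pair per entry.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = _to_text(dst)

    # bfrange sections: <lo> <hi> <dst>  or  <lo> <hi> [<dst1> <dst2> ...]
    entry = HEX + r"\s*" + HEX + r"\s*(?:" + HEX + r"|\[(.*?)\])"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst, arr in re.findall(entry, block, re.S):
            lo, hi = int(lo, 16), int(hi, 16)
            if dst:
                # Single destination: it is incremented once per code in the range.
                base, width = int(dst, 16), len(dst)
                for i in range(hi - lo + 1):
                    mapping[lo + i] = _to_text(format(base + i, "0%dX" % width))
            else:
                # Array form: one explicit destination per code in the range.
                for i, d in enumerate(re.findall(HEX, arr)):
                    mapping[lo + i] = _to_text(d)
    return mapping


def decode_identity_h(raw, mapping):
    """Decode string bytes from Tj/TJ: under Identity-H each glyph is a 2-byte big-endian ID."""
    chars = []
    for i in range(0, len(raw) - 1, 2):
        cid = (raw[i] << 8) | raw[i + 1]
        # Unmapped IDs become U+FFFD so gaps stay visible.
        chars.append(mapping.get(cid, "\ufffd"))
    return "".join(chars)


# Hypothetical CMap, shaped like what a subsetted font typically embeds.
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<0024> <0048>
endbfchar
1 beginbfrange
<0044> <0046> <0061>
endbfrange
endcmap
"""

table = parse_to_unicode(SAMPLE_CMAP)
print(decode_identity_h(bytes.fromhex("00240044000300450046"), table))
```

In a Quartz scanner callback the same idea would apply per font: build the table once when the Tf operator selects the font, then run every string passed to the text-showing operators through it. If a font has no ToUnicode entry at all, this approach has nothing to look up and the text cannot be recovered this way.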
