CGPDFScanner, Identity-H and decompression

My instance of CGPDFScanner is scanning a test PDF file.

At a given point, the current font dictionary has an Encoding value of Identity-H and a FontDescriptor dictionary with the key FontFile2. The value of this key is a stream whose dictionary has the key Filter, and the value of that key is FlateDecode.

I'm unsure how to interpret and use this (to, say, convert the text in the next Tj block to Unicode). For example, do I just zlib-decompress the bytes in the next Tj block? (There is no ToUnicode key here.)

I'd thought all the decompression was carried out by the instance of CGPDFScanner.


If the font uses Identity-H encoding and does not have a ToUnicode entry, the text cannot be extracted. The operand of the Tj operator is a sequence of glyph indexes, each two bytes long, and this sequence cannot be converted to text in the absence of the ToUnicode entry.
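To make the point concrete, here is a minimal Python sketch (not Core Graphics code) of what the Tj bytes look like under Identity-H. It assumes the string operand has already been obtained as raw bytes; the function names, the sample bytes, and the lookup table are all hypothetical and only illustrate that the codes are numbers with no text meaning until a ToUnicode-style mapping is supplied.

```python
from typing import Dict, List, Optional


def split_identity_h_codes(operand: bytes) -> List[int]:
    """Split a Tj string operand into 2-byte big-endian codes (Identity-H)."""
    if len(operand) % 2 != 0:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(operand[i:i + 2], "big")
            for i in range(0, len(operand), 2)]


def codes_to_text(codes: List[int],
                  to_unicode: Optional[Dict[int, str]]) -> Optional[str]:
    """Map codes to text; without a mapping there is nothing to decode with."""
    if to_unicode is None:
        return None
    # Codes missing from the mapping become U+FFFD (replacement character).
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical Tj operand: three 2-byte codes.
sample_operand = bytes.fromhex("002b00480059")
codes = split_identity_h_codes(sample_operand)
print(codes)

# No ToUnicode entry: the codes stay opaque glyph indexes.
print(codes_to_text(codes, None))

# Hypothetical mapping of the kind a ToUnicode CMap would provide.
hypothetical_map = {0x2B: "H", 0x48: "e", 0x59: "y"}
print(codes_to_text(codes, hypothetical_map))
```

The same three codes could stand for entirely different characters in another font, which is why the mapping has to come from the file itself.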

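The FontFile2 entry stores the embedded TrueType font program; it plays no role in extracting text from the PDF file. The FlateDecode filter applies to that font stream only, not to the Tj operand, so zlib-decompressing the bytes in the Tj block will not help.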
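For reference, FlateDecode is the zlib/deflate format, so the filter describes how the font stream's bytes are stored and nothing more. The Python sketch below uses made-up stand-in bytes instead of a real font program and assumes the stream has no predictor parameters. In Core Graphics, the stream's bytes would normally be obtained with CGPDFStreamCopyData rather than decompressed by hand.

```python
import zlib

# Stand-in for an embedded font program; a real FontFile2 stream holds TrueType data.
stand_in_font_bytes = b"\x00\x01\x00\x00" + b"hypothetical font program bytes" * 4

# What a writer does when it stores the stream with /Filter /FlateDecode.
compressed_stream = zlib.compress(stand_in_font_bytes)


def decode_flate_stream(raw: bytes) -> bytes:
    """Undo /FlateDecode for a stream stored without predictor parameters."""
    return zlib.decompress(raw)


decoded = decode_flate_stream(compressed_stream)
print(decoded == stand_in_font_bytes)
```

The result is a font file, not text, which is why this step does not bring the Tj operand any closer to Unicode.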
