CGPDFScanner and Adobe-Japan1
I'm using CGPDFScanner to extract text from a PDF. At the time my TJ operator callback is called, the current font has a CIDSystemInfo->Registry value of "Adobe" and a CIDSystemInfo->Ordering value of "Japan1", i.e. the character collection "Adobe-Japan1".
How do I use this fact to convert all the text I've found with the Tj operator to Unicode?
I'm sure I'm not seeing the wood for the trees here.
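Before any CMap lookup, the bytes of a Tj or TJ string operand have to be split into CIDs. The sketch below assumes the font's /Encoding is Identity-H, where each character code is two big-endian bytes whose value is the CID itself. `identity_h_to_cids` is a hypothetical helper; fonts that use another CMap (such as 90ms-RKSJ-H) have variable-length codes and would need that CMap parsed instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the raw bytes of a Tj/TJ string operand into CIDs.
   ASSUMPTION: the font's /Encoding is Identity-H, so every character code is
   two bytes, big-endian, and the code value equals the CID.
   In a CGPDFScanner callback, `bytes` and `len` would come from
   CGPDFStringGetBytePtr() and CGPDFStringGetLength().
   Returns the number of CIDs written to `cids`. */
size_t identity_h_to_cids(const unsigned char *bytes, size_t len,
                          uint16_t *cids, size_t max_cids)
{
    size_t n = 0;
    /* Step two bytes at a time; a trailing odd byte cannot form a full
       code and is ignored. Stop early if the output buffer is full. */
    for (size_t i = 0; i + 1 < len && n < max_cids; i += 2) {
        cids[n++] = (uint16_t)(((unsigned)bytes[i] << 8) | bytes[i + 1]);
    }
    return n;
}
```

For example, the bytes `00 22 02 83` decode to CIDs 34 and 643.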
You can use Adobe's CMap files to map Adobe-Japan1 CIDs to Unicode. Also check the font's CIDSystemInfo->Supplement value to pick the correct file.
http://opensource.adobe.com/wiki/display/cmap/Downloads
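A minimal sketch of that lookup, assuming the CID-to-Unicode CMap from Adobe's package (for Adobe-Japan1 this appears to be the Adobe-Japan1-UCS2 file; verify the name and layout against your download) stores its mappings as bfchar entries `<cid> <unicode>` and bfrange entries `<lo> <hi> <unicode>` with 4-digit hex (UCS-2) destinations. `cmap_lookup` is a hypothetical helper, not part of CoreGraphics.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up the Unicode value for `cid` in the text of a
   CID-to-Unicode CMap.
   ASSUMPTION: mappings appear as bfchar lines "<cid> <unicode>" and bfrange
   lines "<lo> <hi> <unicode>". Array destinations "[...]" and surrogate
   pairs are not handled.
   Returns 1 and stores the code point in *out on success, 0 otherwise. */
int cmap_lookup(const char *cmap, unsigned cid, uint32_t *out)
{
    enum { NONE, BFCHAR, BFRANGE } section = NONE;
    const char *line = cmap;

    while (*line) {
        /* Copy the current line into a NUL-terminated buffer. */
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[128];
        if (len >= sizeof buf)
            len = sizeof buf - 1; /* overly long lines are truncated */
        memcpy(buf, line, len);
        buf[len] = '\0';

        unsigned lo, hi, dst;
        if (strstr(buf, "beginbfchar")) {
            section = BFCHAR;
        } else if (strstr(buf, "beginbfrange")) {
            section = BFRANGE;
        } else if (strstr(buf, "endbfchar") || strstr(buf, "endbfrange")) {
            section = NONE;
        } else if (section == BFCHAR &&
                   sscanf(buf, " <%x> <%x>", &lo, &dst) == 2 && lo == cid) {
            *out = dst;
            return 1;
        } else if (section == BFRANGE &&
                   sscanf(buf, " <%x> <%x> <%x>", &lo, &hi, &dst) == 3 &&
                   cid >= lo && cid <= hi) {
            /* Consecutive CIDs in a range map to consecutive code points. */
            *out = dst + (cid - lo);
            return 1;
        }

        if (!end)
            break;
        line = end + 1;
    }
    return 0;
}
```

Rescanning the whole file for every character keeps the sketch short but is slow; for real extraction you would parse the CMap once into a table indexed by CID and then convert each CID from the Tj/TJ strings with a single array access.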