How to extract text from an Apache FOP-generated PDF in C#?
I have a problem in my C# project. I want to extract text from Apache FOP-generated PDF files programmatically, without any third-party application. I tried many libraries, such as PDFBox, IKVM, PDF2Text, iTextSharp, and PDFsharp, but all of them failed. When I extract a FOP-generated PDF to a text file, I get lots of square symbols and other garbled characters.
My question is: how can I extract text from a FOP-generated PDF file in C#? Is there any library (written in C#) that can do that?
Thanks.
Fonts that use the Identity-H encoding write glyph indexes, not character codes, directly into the page content to display the text. Such a font needs a ToUnicode entry in its font dictionary (inside the PDF file) to support text extraction; without one, an extractor has no way to map the glyph indexes back to characters, which is why you get squares and garbage. Check the Apache FOP configuration to see whether it has a setting for including a ToUnicode entry in the font dictionary, or for otherwise making the fonts extraction-friendly.
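To see whether a given file is affected, a rough diagnostic along these lines may help. It is only a sketch: it assumes the font dictionaries are stored as plain text in the file rather than inside compressed object streams, it can be fooled by coincidental byte sequences in binary streams, and the class and method names (`ToUnicodeCheck`, `Count`) are made up for illustration. It uses only the .NET base class library and counts how many times an Identity-H encoding and a ToUnicode entry appear.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any PDF library.
public static class ToUnicodeCheck
{
    // Returns how often "/Encoding /Identity-H" and "/ToUnicode" occur in the raw bytes.
    // Heuristic only: entries hidden in compressed object streams are not seen.
    public static (int IdentityH, int ToUnicode) Count(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes without errors.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        int identityH = Regex.Matches(raw, @"/Encoding\s*/Identity-H").Count;
        int toUnicode = Regex.Matches(raw, @"/ToUnicode\b").Count;
        return (identityH, toUnicode);
    }

    // True when at least one Identity-H font appears to have no ToUnicode entry.
    public static bool LikelyNotExtractable(byte[] pdfBytes)
    {
        var counts = Count(pdfBytes);
        return counts.IdentityH > 0 && counts.ToUnicode < counts.IdentityH;
    }
}
```

Calling `ToUnicodeCheck.LikelyNotExtractable(System.IO.File.ReadAllBytes(path))` on the FOP output gives a quick hint: if it returns true, no extraction library is likely to recover the text, and the fix has to happen on the FOP side.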