Choosing a cross-platform library for PDF rendering and analysis app (preferably using C#)
I am planning to write an app that can open and display PDF documents, and perform OCR on vector graphic elements within the PDFs. The user must be able to select regions of the document and I need to draw real-time annotations on the document. I don't need to alter or save the documen开发者_如何学运维t itself.
I have plenty of experience with C# and WPF; I have written a similar application already that does the above on XPS/XAML documents rather than PDF. However that app only runs on Windows and PDF documents must be converted to XPS first.
I have done quite a bit of research and there are many, many options available, none of which seem an obvious choice. There are many libraries that can open PDFs or create PDFs, but most don't seem to give you access to individual vector graphic elements in a format that lets you draw/manipulate them on the screen (similar to what I could do with WPF graphic elements extracted from XPS documents).
I am familiar with .Net and C# (including .Net 2 GDI+ graphics) and I am very keen to stick to what I know. I am also using EmguCV for image recognition which can be compiled in Mono or .Net. As such I am looking at Silverlight (running standalone) or Mono options, both of which should run on PC and Mac.
Performance (for both graphics and number crunching) is a strong consideration, though I am just as interested in getting this up and running quickly.
Does anyone have any experience with opening PDFs, extracting vector graphic elements (perhaps as SVG) and rendering them in a Mono app? Can individual elements be rendered to bitmap?
Alternatively, does anyone have experience with opening PDFs in Silverlight and converting them to XPS or XAML at runtime? I know that WPF and Silverlight graphics libraries are not 1:1, but I'm not sure how this affects XPS contents (generally composed of Canvas, Glyphs and StreamGeometry objects).
Thank you for any advice, tips or links you have to share.
look at this http://silverpdf.codeplex.com/
it's client side pdf reading library. actually right now it can only read files, but you could play with it and make your own "display" functionality.
You might want to examine the internals of your PDFs so you understand what they actually contain better - you might be very surprised! For example, text can often be scanned pages or images and vecotr graphics do not exist as neat little packages. We wrote a whole load of general articles about what is inside a PDF and analysis tools at http://www.jpedal.org/PDFblog which are not specific to any tool or language.
精彩评论