Converting PDF into workable text using C# [closed]

2022-12-10 17:25

Closed. This question needs to be more focused. It is not currently accepting answers.



Closed 5 years ago.


Is there a library with a class that can extract the text from a PDF file in C#/.NET? I've tried a few, but the documentation is terrible, so I haven't been able to get any of them off the ground. A class for extracting images would also be a plus. Any suggestions? Thanks in advance.

Also, I need to be able to integrate it into an existing application.

Have you tried PDFKit.NET? It has reasonable docs and some good examples. It is designed for a server environment, so it is a little expensive.

EDIT: Here is an open-source library on SourceForge called iTextSharp. It is free for open-source projects. I haven't used it, but it looks promising. Here is a tutorial for it that has lots of code examples.
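For reference, page-by-page text extraction with iTextSharp can look roughly like the sketch below. It assumes the iTextSharp 5.x package is referenced (it relies on the `PdfReader`, `PdfTextExtractor` and `LocationTextExtractionStrategy` classes from that version); the `PdfTextHelper` class and `ExtractAllText` method are hypothetical names made up for this example, not part of the library.

```csharp
// Minimal sketch; assumes the iTextSharp 5.x library is referenced.
// PdfTextHelper / ExtractAllText are hypothetical names, not iTextSharp API.
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextHelper
{
    // Returns the plain text of every page, in page order.
    public static string ExtractAllText(string path)
    {
        var reader = new PdfReader(path);
        try
        {
            var sb = new StringBuilder();
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // LocationTextExtractionStrategy orders text chunks by their
                // position on the page; rich layout (e.g. tables) may not survive.
                var strategy = new LocationTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
            return sb.ToString();
        }
        finally
        {
            // Release the underlying file handle.
            reader.Close();
        }
    }
}
```

A fresh strategy object is created for each page because a strategy accumulates all the text it is fed; reusing one would mix text from earlier pages into later results.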

There are a couple of ways you can go here; a lot of it will depend on whether you want to retain the formatting (i.e., paragraphs and other layout elements) of the original PDF.

If you're considering commercial solutions, we offer two products that might meet your requirements. One is EasyPDF SDK, which has single-shot ExtractText() and ExtractText2() calls that pull the text out of your PDFs as plain text.

Note that the output from these calls is pretty simplistic and you will lose a lot of the original layout elements. They're nice for simple text extraction but might not be great if your PDF contains tabular data.

If you're dealing with tables, a nicer alternative could be to pull the text out as rich text instead. We have a tool called EasyConverter SDK, geared for business documents, which does just that with a single function call.

With EasyConverter SDK, the layout of your original PDF will be retained.

Both support C#, so feel free to check out the eval versions at www.pdfonline.com if you're interested. I do work for the vendor, so take this suggestion as a mother loving her own child :-) I've been browsing stackoverflow.com for code snippets for a long time but have only recently started posting, so if you have any questions about either API, just let me know and I can help. Cheers!

The Docotic.Pdf library can extract text and images from PDF files.

You can extract text from the whole document or from selected pages only. The library can extract plain text and also text chunks with their coordinates.

You can extract images from PDFs (as JPEG and TIFF files).

Here are a couple of samples for your task:

Extract text from PDFs
Extract images from a PDF

Disclaimer: I work for Bit Miracle, vendor of the library.

We've used Snowbound Software at work for image conversion. It apparently supports text extraction too; however, it's not free.
