Extracting text from PDF document - C# [duplicate]

2022-12-20 13:37

This question already has answers here: Extracting text from PDFs in C# [closed] (6 answers) Closed 3 years ago.

Is there a reliable way to extract text from a PDF? The first thought that comes to mind is that a PDF may have multiple columns, and the extraction mechanism needs to know the logical structure somehow. I understand that some PDF docs are "tagged", but I'd need to support pretty much any PDF document.

Any third party components to the rescue here?

Please see: Extracting text from PDFs in C#

Some PDFs are scans, so OCR would be required (not easy, to say the least).

Some PDFs are compressed; others (more rarely) store their content uncompressed.

The PDF file format itself is well-documented, but extracting the right "structure" from anything but a simple one-column document is a tall order. Internally, a PDF looks roughly like HTML would if every line of text were placed in its own DIV with absolute positioning.
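To make the "absolute positioning" point concrete, here is a minimal sketch of what an extractor has to do once it has the positioned text runs. It assumes some PDF parser has already handed you fragments with coordinates; `TextFragment`, `ReadingOrder`, and `yTolerance` are hypothetical names made up for this illustration, not part of any real library. It also assumes the usual PDF convention that Y grows upward from the bottom of the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one run of text plus the position a PDF parser reported for it.
public sealed class TextFragment
{
    public string Text { get; }
    public double X { get; }
    public double Y { get; }

    public TextFragment(string text, double x, double y)
    {
        Text = text;
        X = x;
        Y = y;
    }
}

public static class ReadingOrder
{
    // Groups fragments into lines by similar Y, orders the lines top to bottom
    // (assumption: larger Y is higher on the page), and orders the fragments
    // inside each line left to right by X.
    public static List<string> ReconstructLines(
        IEnumerable<TextFragment> fragments, double yTolerance = 2.0)
    {
        var lines = new List<List<TextFragment>>();

        foreach (var fragment in fragments.OrderByDescending(f => f.Y))
        {
            var current = lines.Count > 0 ? lines[lines.Count - 1] : null;

            // Same line if the fragment sits within yTolerance of the line's first fragment.
            if (current != null && Math.Abs(current[0].Y - fragment.Y) <= yTolerance)
            {
                current.Add(fragment);
            }
            else
            {
                lines.Add(new List<TextFragment> { fragment });
            }
        }

        return lines
            .Select(line => string.Join(" ", line.OrderBy(f => f.X).Select(f => f.Text)))
            .ToList();
    }
}
```

Calling `ReadingOrder.ReconstructLines(fragments)` on a one-column page gives the lines back in reading order. On a two-column page the same logic glues the left and right column together whenever their baselines line up, which is exactly the structure problem described above; handling that would need some form of column detection on top of this.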
