
How to extract structured information from a PDF file in Java

I need to extract a table from a PDF file. I know the data is not stored in a table format inside the PDF, but I want to read student results from it in Java. Please help if anyone knows how. Thanks.


You should use a PDF parser for that. Check out this list of open source PDF libraries for Java.


Some PDF files contain structured text (http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/). If they do not, it is down to the heuristics of the parser to guess the structure and add it.

The PDFBox developers have done a lot of work on tables, but it will never be perfect.
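To give a rough idea of what such a heuristic can look like, here is a minimal sketch in plain Java. It assumes you have already pulled the page text out as lines with a PDF library (for example PDFBox's PDFTextStripper) and that columns end up separated by two or more spaces or by a tab. Both are assumptions about your file, not something every PDF guarantees. The class name ResultTableSketch, the method parseRows, and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ResultTableSketch {
    // Assumption: a column boundary is a run of 2+ whitespace characters or a single tab.
    private static final Pattern COLUMN_GAP = Pattern.compile("\\s{2,}|\\t");

    // Keeps only the lines that split into exactly expectedColumns cells.
    // Titles, page footers and blank lines usually fail this check and are dropped.
    static List<String[]> parseRows(List<String> lines, int expectedColumns) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cells = COLUMN_GAP.split(trimmed);
            if (cells.length == expectedColumns) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample input standing in for text extracted from a result sheet.
        List<String> lines = Arrays.asList(
                "Student Results",
                "Roll No   Name         Marks",
                "101       Asha Rao     78",
                "",
                "102       Ben Carter   64",
                "Page 1");
        for (String[] row : parseRows(lines, 3)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

This only works when the extracted text keeps visible gaps between columns. Empty cells, wrapped cell text, or names containing double spaces will break it, which is exactly why this kind of guessing will never be perfect.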

