How to extract structured information from a PDF file in Java
I need to extract a table from a PDF file. I know it is not stored in a table format, but I want to read student results from the PDF in Java. Please help if anyone knows. Thanks.
You should use a PDF parser for that. Check out this list of open source PDF libraries for Java.
Some PDF files contain structured content (http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/). If they do not, the parser has to rely on heuristics to guess the structure and add it.
The PDFBox developers have done a lot of work on tables, but table extraction will never be perfect.
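To give an idea of what such a heuristic can look like, here is a minimal sketch. It assumes the text has already been extracted from the PDF by a parser (for example with PDFBox's PDFTextStripper) and that the columns come out separated by runs of two or more spaces, which will not hold for every file. The class name `TableRowSplitter`, the method `parseRows`, and the sample result lines are made up for illustration; they are not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TableRowSplitter {

    // Heuristic (an assumption, not a rule of the PDF format):
    // a run of two or more whitespace characters marks a column boundary,
    // while a single space is treated as part of the cell text.
    static List<List<String>> parseRows(String extractedText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : extractedText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between rows
            }
            rows.add(Arrays.asList(trimmed.split("\\s{2,}")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical text, laid out the way an extractor might return it.
        String sample = String.join("\n",
                "Roll No   Name           Marks",
                "101       Asha Rao       78",
                "102       Vikram Singh   91");

        for (List<String> row : parseRows(sample)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the extractor puts only a single space between columns, or cells wrap onto several lines, this split will produce the wrong cells, and you would need position-based information from the parser instead.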