Open Source Java Text Parsers
Is there a single Java text parser that can parse Microsoft Office documents, OpenOffice documents, and PDFs? Or do I need to use something like Apache POI for Word documents and other libraries for OpenOffice and PDFs? If so, what are the best options for OpenOffice and PDFs?
Apache Tika:
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
Not sure whether this qualifies as "single" for your purposes.
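To make the "detects" part concrete: a single front end has to work out what kind of file it has before it can hand it to the right parser library. The sketch below is not Tika's implementation. It is a JDK-only illustration of sniffing the leading bytes, and the `FormatSniffer` class and its `Family` enum are names made up for this example. With Tika itself on the classpath, the `org.apache.tika.Tika` facade offers `detect(...)` and `parseToString(...)` for the same job, so you would not normally write this by hand.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses a coarse document family from a file's first bytes. */
public class FormatSniffer {

    /** Families that can be told apart from the leading bytes alone. */
    public enum Family { PDF, OLE2, ZIP, UNKNOWN }

    // PDF files begin with the ASCII marker "%PDF-".
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // OLE2 compound file signature, used by legacy .doc/.xls/.ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header; both OOXML (.docx) and OpenDocument (.odt) are ZIP packages,
    // so telling those two apart requires looking inside the archive.
    private static final byte[] ZIP = { 'P', 'K', 3, 4 };

    /** Returns the family whose signature matches the start of {@code head}. */
    public static Family sniff(byte[] head) {
        if (startsWith(head, PDF)) return Family.PDF;
        if (startsWith(head, OLE2)) return Family.OLE2;
        if (startsWith(head, ZIP)) return Family.ZIP;
        return Family.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Passing the first few bytes of a file to `FormatSniffer.sniff` is enough to pick a parser family. A real toolkit goes further and also uses file names, container contents, and metadata.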
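If the task is reading PDF documents, iText is your best bet. For Microsoft Office documents, POI would be my solution. Note that POI only handles the Microsoft formats; OpenOffice (LibreOffice) documents are stored in the OpenDocument format, which needs a separate library such as the ODF Toolkit, or Tika as suggested in the other answer.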
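For the OpenOffice side, it may help to know that an OpenDocument file (.odt and friends) is a ZIP package whose body lives in an entry named `content.xml`. The following is only a rough sketch using the JDK alone (Java 9+ for `readAllBytes`); `OdfTextDump` is a made-up name, and the result is a crude text dump without paragraph breaks, not a substitute for a real ODF library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper: dumps the raw text of an OpenDocument package. */
public class OdfTextDump {

    /** Reads the package from {@code odf} and returns the text found in content.xml. */
    public static String extractText(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // Refuse DOCTYPE declarations as a precaution against XXE in untrusted files.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                // Concatenates every text node; paragraphs run together without separators.
                return doc.getDocumentElement().getTextContent();
            }
        }
        throw new IOException("No content.xml entry found; not an OpenDocument package?");
    }
}
```

Calling `OdfTextDump.extractText(new FileInputStream("report.odt"))` (the file name is just an example) returns the concatenated text. Anything beyond a quick dump, such as tables, styles, or metadata, is where a dedicated library earns its keep.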