Any multi-format document reading library for Python or C? [closed]
Is there a good document parsing library in C or Python? I am trying to extract strings from documents: PDF, Word (doc/docx), Excel (xls/xlsx), PowerPoint (PPT), ODF, and also Mac formats.
Please recommend solutions that also work in a Linux/Unix environment.
http://wiki.services.openoffice.org/wiki/PyUNO_bridge
You can use the OpenOffice Python API (PyUNO) to open and parse documents. It supports every document type that OpenOffice itself can open.
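A rough sketch of what that could look like, not a tested recipe. It assumes an office instance is already listening on localhost port 2002 (the port is my choice for illustration), and that the script runs under a Python interpreter that can import the `uno` module shipped with the office suite. The helper names `to_file_url` and `extract_text_via_uno` are made up for this example. Only text documents expose `getText()` like this; spreadsheets and presentations need their own traversal.

```python
from pathlib import Path


def to_file_url(path):
    """Turn a local path into the file:// URL form that UNO expects."""
    return Path(path).absolute().as_uri()


def extract_text_via_uno(path, host="localhost", port=2002):
    """Load a text document through a running office instance and return its text.

    Assumption: the office was started beforehand with something like
        soffice --headless --accept="socket,host=localhost,port=2002;urp;"
    """
    # Imported here so the module can be loaded without the office suite installed.
    import uno

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx
    )
    # Connect to the listening office process.
    ctx = resolver.resolve(
        "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext" % (host, port)
    )
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx
    )
    # Load the document with default options (empty argument tuple).
    doc = desktop.loadComponentFromURL(to_file_url(path), "_blank", 0, ())
    try:
        return doc.getText().getString()
    finally:
        doc.close(True)
```

Calling `extract_text_via_uno("report.doc")` should return the body text as one string, provided the office process is reachable and the file opens as a text document.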
For everyone looking for this: I found Tika to be the most complete document parsing library. It is not C but Java, and it is fast enough when run inside Nailgun.
tika.apache.org
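If you want to call Tika from Python, one simple option is to shell out to the Tika app jar. This is only a sketch: the jar location (`tika-app.jar` in the current directory) and the helper names are assumptions for illustration, and it needs a `java` binary on the PATH. It starts a fresh JVM for every call, which is exactly the startup cost that running inside Nailgun avoids.

```python
import subprocess


def build_tika_command(doc_path, jar_path="tika-app.jar"):
    """Build the command line that asks the Tika app for plain text output."""
    # --text tells the Tika app to print the extracted plain text to stdout.
    return ["java", "-jar", str(jar_path), "--text", str(doc_path)]


def extract_text(doc_path, jar_path="tika-app.jar"):
    """Run the Tika app on one document and return the extracted text."""
    result = subprocess.run(
        build_tika_command(doc_path, jar_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tika exits with an error
    )
    return result.stdout
```

Usage would be something like `print(extract_text("slides.ppt", "/opt/tika/tika-app.jar"))`, where the jar path is wherever you downloaded it.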