
Any multi-format document reading library for Python or C? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 9 years ago.

Is there a good document parsing library in C or Python? I am trying to extract strings from documents: PDF, Word (doc/docx), Excel (xls/xlsx), PowerPoint (ppt), ODF, and also Mac formats.

Please recommend solutions that also work in a Linux/Unix environment.


http://wiki.services.openoffice.org/wiki/PyUNO_bridge

You can use the OpenOffice Python API (PyUNO) to open and parse documents. It supports all of the document types that OpenOffice itself supports.
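A minimal sketch of what that could look like, under several assumptions: an office process is already running and listening on a socket (for example started with `soffice --headless --accept="socket,host=localhost,port=2002;urp;"`), the script runs with a Python that can import the `uno` module shipped with the office suite, and the document is a text-type document (a spreadsheet or presentation would need different access code). The function names and the port number are illustrative choices, not part of the answer above.

```python
from pathlib import Path


def uno_connection_url(host="localhost", port=2002):
    """Build the UNO URL for an office process listening on a socket.

    Host and port are assumptions; they must match the --accept option
    the office process was started with.
    """
    return f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext"


def extract_text(path, host="localhost", port=2002):
    """Load a text document through a running office process and return its text."""
    # Imported lazily: the uno module ships with the office suite and is
    # usually not available in a plain Python installation.
    import uno
    from com.sun.star.beans import PropertyValue

    # Connect to the already running office process.
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(uno_connection_url(host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    # Load the document without opening a window.
    hidden = PropertyValue()
    hidden.Name = "Hidden"
    hidden.Value = True
    doc = desktop.loadComponentFromURL(
        Path(path).resolve().as_uri(), "_blank", 0, (hidden,))
    try:
        # getText() is available on text documents only.
        return doc.getText().getString()
    finally:
        doc.close(True)
```

Calling `extract_text("report.docx")` (a hypothetical file name) would then return the document body as one string, provided the office process can open that format.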


For everyone looking for this: I found Tika to be the most complete document parsing library. It is not C but Java, and it is fast enough (when it runs inside Nailgun).

tika.apache.org
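Since the question asks about Python, one way to reach Tika from a Python script is to call the Tika command-line app as a subprocess. The sketch below assumes Java is on the `PATH` and that the Tika app jar has been downloaded; the jar path `tika-app.jar` and the function names are hypothetical. Note that each call starts a fresh JVM, which is the startup cost that running inside Nailgun avoids.

```python
import subprocess

# Hypothetical location of the downloaded Tika app jar; adjust as needed.
TIKA_JAR = "tika-app.jar"


def tika_text_command(path, jar=TIKA_JAR):
    """Build the command line that asks the Tika app for plain-text output."""
    return ["java", "-jar", jar, "--text", "--encoding=UTF-8", str(path)]


def extract_text_with_tika(path, jar=TIKA_JAR):
    """Run the Tika app on one file and return the extracted text."""
    result = subprocess.run(
        tika_text_command(path, jar),
        capture_output=True,
        check=True,  # raise CalledProcessError if Tika exits with an error
    )
    return result.stdout.decode("utf-8", errors="replace")
```

The same function works for any format Tika can detect, so the caller does not need to branch on the file extension.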

