
How to get a word count of a Word document in Python?

I am trying to get the word counts of .doc, .docx, .odt and .pdf files. This is pretty simple for .txt files, but how can I go about doing a word count on the mentioned types?

I'm using Python with Django on Ubuntu, and I want to count the words in a document when a user uploads it through the system.


First, you need to read the text out of your .doc, .docx, .odt and .pdf files.

Second, count the words (<2.7 version).
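Once the text is out of the file, the counting step itself can be very small. Here is a minimal sketch, assuming a "word" is simply any run of non-whitespace characters; the function name `count_words` is made up for illustration.

```python
def count_words(text):
    """Return the number of whitespace-separated tokens in text."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    return len(text.split())


if __name__ == "__main__":
    print(count_words("This is pretty simple\nfor .txt files"))
```

Punctuation stays attached to the neighbouring word under this definition, so "word," and "word" are both counted as one token each, which is usually what a word count wants.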


Given that you can do this for .txt files I'll assume that you know how to count the words, and that you just need to know how to read the various file types. Take a look at these libraries:

PDF: pypdf

doc/docx: this question, python-docx

odt: examples here
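If you would rather not add a dependency for .docx and .odt, both formats are zip archives with the body text stored as XML, so the standard library can get you a rough count. The following is only a sketch under that assumption, not how python-docx or any ODF library does it. The helper names (`docx_text`, `odt_text`, `count_words_in_file`) are made up for illustration. It does not handle .pdf or legacy .doc, which still need one of the libraries above, and it may miscount documents with footnotes, text boxes, or ODF space elements.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML (.docx) and OpenDocument text (.odt).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A single word can be split across several w:t runs,
        # so runs inside one paragraph are joined without spaces.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def odt_text(path):
    """Return the body text of an .odt file, one line per paragraph or heading."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    paragraphs = []
    for el in root.iter():
        if el.tag in (TEXT_NS + "p", TEXT_NS + "h"):
            paragraphs.append("".join(el.itertext()))
    return "\n".join(paragraphs)


def count_words_in_file(path):
    """Count whitespace-separated words in a .docx or .odt file."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".docx":
        text = docx_text(path)
    elif ext == ".odt":
        text = odt_text(path)
    else:
        raise ValueError("unsupported file type: " + ext)
    return len(text.split())
```

In a Django view you would call something like `count_words_in_file` on the path of the saved upload. Dispatching on the file extension is the simplest choice here, but an extension supplied by a user is not proof of the real file type, so expect `zipfile.BadZipFile` for mislabeled uploads.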
