How to get a word count of a Word document in Python?
I am trying to get the word counts of .doc, .docx, .odt and .pdf files. This is pretty simple for .txt files, but how can I do a word count on the other types?
I'm using Python and Django on Ubuntu, and I want to count the words in a document when a user uploads it through the system.
First, you need to read the text out of your .doc, .docx, .odt and .pdf files.
Second, count the words (<2.7 version).
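For the counting step, here is a minimal sketch. It assumes that "a word" simply means a whitespace-separated token, which is probably what you already do for .txt files. The function names `count_words` and `word_frequencies` are made up for illustration.

```python
from collections import Counter  # available from Python 2.7 onwards


def count_words(text):
    """Count whitespace-separated tokens in a string."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    return len(text.split())


def word_frequencies(text):
    """Return a Counter mapping each lower-cased token to its count."""
    return Counter(word.lower() for word in text.split())
```

Punctuation stays attached to tokens here ("end." and "end" are different keys in the Counter), so strip it first if you need per-word frequencies rather than just a total.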
Given that you can do this for .txt files, I'll assume that you know how to count the words and just need to know how to read the various file types. Take a look at these libraries:
PDF: pypdf
doc/docx: this question, python-docx
odt: examples here
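To give a rough idea of the reading step, here is a sketch rather than a definitive implementation. For PDF it uses pypdf (`PdfReader`, `pages`, `extract_text()`). For .docx and .odt it does not use the libraries above; instead it relies on the fact that both formats are zip archives with the body text in an XML file (`word/document.xml` and `content.xml` respectively), so the standard library is enough. The function names are made up for illustration, legacy binary .doc files are not handled, and text in headers, footnotes or text boxes may be missed or counted twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces of the body text elements in each format
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def docx_text(path_or_file):
    """Return the body text of a .docx, one line per paragraph."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def odt_text(path_or_file):
    """Return the body text of an .odt, one line per paragraph/heading."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = (TEXT_NS + "p", TEXT_NS + "h")
    # itertext() also picks up text inside nested <text:span> elements.
    return "\n".join("".join(elem.itertext())
                     for elem in root.iter() if elem.tag in wanted)


def pdf_text(path_or_file):
    """Return the extracted text of a PDF (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily; third-party package
    reader = PdfReader(path_or_file)
    # extract_text() can yield nothing for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words_in_text(text):
    """Count whitespace-separated tokens."""
    return len(text.split())
```

Usage would look like `count_words_in_text(docx_text("report.docx"))`, where `report.docx` is a placeholder name. `zipfile.ZipFile` also accepts a seekable file-like object, so passing an uploaded file directly should work without saving it to disk first.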