How to count words, characters, or sentences in an uploaded file (PDF, DOC, XLS, CSV, etc.)
How can I count the words in an uploaded file in PDF, DOC, XLS, CSV, or a similar format, using either PHP, Zend Framework, or a CLI-based Java trigger?
Here's a third-party app that does it: http://www.globalrendering.com/download.html. You could create a simple wrapper for it. As for wc, it's not accurate for those file types, because it counts tokens in the raw file contents rather than in the extracted text. See http://ubuntuforums.org/showthread.php?t=566407
First of all, have a look at Apache Tika, which is written in Java, is free (Apache licensed), and can convert all the formats you mentioned to text. After that, counting words should be trivial.
You could also use Linux command-line utilities to convert the files to text, and write a simple wrapper around them.
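Once the file has been converted to plain text (by Tika or anything else), the counting step could look roughly like the sketch below. The function name `text_stats` and the regular expressions are my own assumptions, not part of any library. The sentence split is naive and will miscount abbreviations such as "e.g.", so treat it as an approximation.

```python
import re


def text_stats(text: str) -> dict:
    """Return rough character, word, and sentence counts for plain text.

    Hypothetical helper for illustration; the rules below are simple
    heuristics, not a linguistic tokenizer.
    """
    # A word is a run of word characters, optionally joined by an
    # apostrophe or hyphen (so "it's" and "third-party" count once).
    words = re.findall(r"\w+(?:['’-]\w+)*", text)

    # A sentence ends at one or more of . ! ? followed by whitespace
    # or the end of the text. Empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]

    return {
        "characters": len(text),
        "characters_no_spaces": len(re.sub(r"\s", "", text)),
        "words": len(words),
        "sentences": len(sentences),
    }


if __name__ == "__main__":
    print(text_stats("Hello world. It's a test! Is it?"))
```

From PHP you could call a script like this through the shell, or port the same two regular expressions to `preg_match_all` and `preg_split`.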
(I cannot link to these for lack of reputation. Use your Google-fu.)
- PDF: pdftotext (part of xpdf). See also question #221359 on SuperUser.
- DOC(X): abiword, catdoc, antiword, docxtotxt ... See also question #165978 on SuperUser.
- XLS (and pretty much everything else, but needs OpenOffice): unoconv
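A wrapper around the utilities listed above might be sketched as follows. The names `CONVERTERS`, `build_command`, and `extract_text` are hypothetical, and the exact command lines are assumptions you should check against the versions installed on your server. In particular, I am assuming that `pdftotext FILE -` and `antiword FILE` write text to standard output, and that your unoconv build supports `-f csv --stdout` for spreadsheets.

```python
import subprocess
from pathlib import Path

# Map a file extension to a function that builds the converter command.
# These command lines are assumptions; verify them on your own system.
CONVERTERS = {
    ".pdf": lambda p: ["pdftotext", p, "-"],          # "-" = write to stdout
    ".doc": lambda p: ["antiword", p],                # prints text to stdout
    ".xls": lambda p: ["unoconv", "-f", "csv", "--stdout", p],
}


def build_command(path: str) -> list:
    """Return the argument list for converting `path` to text."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter configured for {suffix!r}")
    return CONVERTERS[suffix](str(path))


def extract_text(path: str) -> str:
    """Run the matching converter and return its output as text."""
    if Path(path).suffix.lower() == ".csv":
        # CSV is already plain text, so no external tool is needed.
        return Path(path).read_text(errors="replace")
    # Passing a list (no shell) avoids quoting problems with odd filenames.
    result = subprocess.run(build_command(path), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Since the files are uploaded by users, it is probably safer to store them under a server-generated name before passing the path to any external tool. The returned text can then be handed to whatever word counter you prefer.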