How to convert documents from .doc to text

I have been pondering writing this question for quite some time.

I work for a small-sized news corporation in Vietnam.

The server I use for documents runs the latest version of Ubuntu (with PHP/Apache, obviously), which means that, as far as I know, formats such as .doc and .docx cannot be opened natively.

However, when reporters upload documents, half the time they do so in some sort of Microsoft format. This means my Linux machine cannot open them and pick out keywords, which is extremely frustrating, because tools like pdf2txt.py do not work on these files.

Is there a way to get around this problem without inconveniencing the reporters too much? I understand that since I am running a Linux server, I may have to run some sort of third-party application to do the work for me, which could work in the short run but could pose some security risks.

Summary: How can I have a Linux server automatically convert any format such as .doc and .docx to PDF for further manipulation?


For old-school .doc files, take a look at catdoc and wv.

For an all-around solution that can convert anything OpenOffice can open into anything OpenOffice can save, there is unoconv.
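
As a rough sketch of how the two tools could be wired into an upload handler, something like the following might work. It assumes catdoc and unoconv are installed and on the PATH; the function names `build_command` and `convert`, the 60-second timeout, and the rule "use catdoc only for .doc to text" are my own choices for illustration, not anything either tool prescribes.

```python
import subprocess
from pathlib import Path


def build_command(path, target="pdf"):
    """Return the argument list for converting `path`; runs nothing.

    Hypothetical helper: legacy .doc files that only need plain text
    go to catdoc, everything else goes to unoconv.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".doc" and target == "txt":
        # catdoc prints the extracted text to standard output.
        return ["catdoc", str(path)]
    # unoconv -f <format> writes the converted file next to the input.
    return ["unoconv", "-f", target, str(path)]


def convert(path, target="pdf", timeout=60):
    """Run the converter and return whatever it printed to stdout.

    For catdoc that is the document text; for unoconv it is usually
    empty, because the result is written to disk instead.
    """
    cmd = build_command(path, target)
    # Passing a list (no shell=True) keeps odd characters in uploaded
    # file names from being interpreted by a shell.
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

Calling `convert("story.docx")` should then leave a `story.pdf` beside the upload, and `convert("story.doc", target="txt")` should return the text directly. Because the arguments are passed as a list and a timeout is set, a malformed or hostile upload is less likely to run arbitrary shell commands or hang the server, though running the converter under an unprivileged user would still be sensible.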
