Are there any libraries/projects that convert generic document types to HTML?
Are there any projects out there building converters from different file types to HTML or text? The document formats are the most common ones: PDF, DOC(X), XLS(X), PPT(X), PS, etc. I am already aware of some Unix utilities like pdftotext, and I know of Apache's Tika and POI projects. Is there anything that has a generic interface? Something like MultiMarkdown.
As you said, the philosophy of UNIX-like systems is to use small utilities/filters for this (latex2html, t2html, txt2html, pdftohtml, etc.). You could create your own interface using shell scripting, Perl, Python, etc., and use those filters as callbacks.
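As a rough illustration of the "filters as callbacks" idea, here is a minimal Python sketch. The names FILTERS, build_command and convert are made up for this example, and it assumes pdftotext and txt2html are installed and on your PATH; check the exact arguments against the versions you have.

```python
import subprocess
from pathlib import Path

# Hypothetical registry: file extension -> callback that builds the
# command line for an external filter writing its result to stdout.
FILTERS = {
    # "-" as the output name asks pdftotext to write to stdout.
    ".pdf": lambda p: ["pdftotext", str(p), "-"],
    # Assumption: txt2html prints the generated HTML to stdout by default.
    ".txt": lambda p: ["txt2html", str(p)],
}


def build_command(path):
    """Return the argv list for the filter registered for this file type."""
    p = Path(path)
    try:
        make_argv = FILTERS[p.suffix.lower()]
    except KeyError:
        raise ValueError(f"no filter registered for {p.suffix!r}") from None
    return make_argv(p)


def convert(path):
    """Run the external filter and return whatever it printed to stdout."""
    result = subprocess.run(
        build_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Supporting another format is then just one more entry in FILTERS, and callers only ever see convert(path), which is about as generic as the interface gets with this approach.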