How to do post-formatting for DOC/DOCX conversion to HTML?
I am currently using OpenOffice (command-line) and JODConverter to convert Word documents (both .doc and .docx) to HTML for a web application I'm hosting. It works great except for one problem: the margins in the HTML files are not formatted properly. Even worse, the margins are inconsistent across operating systems (MacOS & Windows) and browsers.
Is there another tool out there that does the post-formatting (I think it involves re-writing the CSS of the converted HTML document), much like Google Docs?
I'm not trying to be another Google Docs, I just want to imitate their post-formatting process (more specifically, the margin width formatting) only, so I can have users upload and store HTML docs on my own service. I need it to be an automated process independent of any third party sites (I'm aware that Google has an API, called googlecl, but it requires authentication, and you become dependent on their servers and services; not to mention you have a quota).
If anyone knows of any method other than the OpenOffice route, I'm open to suggestions.
It seems your best bet would be to add a feature to JODConverter that allows you to insert your own CSS during the export. Something like the following for all pages:
body {
    margin: 50px !important;
}
Either persuade the maintainer of JODConverter, or grab the code and hack it together yourself. Best of luck.
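If patching JODConverter is not an option, the same CSS override could be applied as a separate step after the conversion finishes. The following is only a rough sketch in Python, not anything JODConverter provides: the names inject_css and postprocess are made up for illustration, and it assumes the converted file is plain HTML that normally contains a closing </head> tag.

```python
import re
from pathlib import Path

# The 50px margin rule is the one suggested above; adjust as needed.
MARGIN_CSS = "body { margin: 50px !important; }"


def inject_css(html: str, css: str = MARGIN_CSS) -> str:
    """Return html with a <style> block placed just before </head>.

    Hypothetical helper, not part of JODConverter or OpenOffice.
    """
    style = f'<style type="text/css">{css}</style>\n'
    # Case-insensitive, since the converter may emit upper-case tags.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match:
        return html[:match.start()] + style + html[match.start():]
    # No closing </head> found: prepend so the rule is still present.
    return style + html


def postprocess(path: str, encoding: str = "utf-8") -> None:
    """Rewrite one converted HTML file in place (encoding is an assumption)."""
    p = Path(path)
    p.write_text(inject_css(p.read_text(encoding=encoding)), encoding=encoding)
```

Calling postprocess("converted.html") on each file right after the conversion would add the rule without touching the converter itself. Because the style block is inserted at the end of the head and uses !important, it should take precedence over the margin rules the converter writes, although inline styles marked !important in the output would still win.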