Converting HTML to Word in .NET [closed]
I need to create a Word file from HTML content (in an ASP.NET server application) but couldn't find a robust way of doing it, so I decided to start a discussion here to see what the possible options are.
Aspose has a .NET component for this, but its price is too high for our budget, so it is not an option.
We expect the conversion to preserve tables, images, and links, to keep invisible elements hidden, and so on.
There is a similar discussion here, but the solutions provided all revolve around Office Interop, which is not recommended for server applications.
Any ideas? Basically, how do components like Aspose work?
Has the hard work already been done? There seems to be a project on CodePlex.
Blog post describing an HTML to docx converter
Project on CodePlex
I would suggest writing code against the Open XML API: you can navigate the HTML DOM and programmatically add the matching elements to the Word document. It's no simple task though, since you are interpreting markup and attempting to convert it.
Link for Open XML: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124
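To give a feel for what "interpreting markup" involves, here is a rough sketch. It is not the Open XML SDK: it uses Python's standard `html.parser` and plain strings purely to keep the example short, and it only understands `<p>`, `<b>`/`<strong>` and `<i>`/`<em>`. The class and function names are made up for illustration. It produces the paragraph markup that would sit inside `<w:body>` of a docx's `word/document.xml`.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class HtmlToWordML(HTMLParser):
    """Hypothetical converter: maps a tiny HTML subset onto <w:p>/<w:r> markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # finished <w:p> elements
        self.runs = []        # runs collected for the current paragraph
        self.bold = 0         # nesting depth of <b>/<strong>
        self.italic = 0       # nesting depth of <i>/<em>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()
        elif tag in ("b", "strong"):
            self.bold += 1
        elif tag in ("i", "em"):
            self.italic += 1

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
        elif tag in ("b", "strong"):
            self.bold = max(0, self.bold - 1)
        elif tag in ("i", "em"):
            self.italic = max(0, self.italic - 1)

    def handle_data(self, data):
        # Collapse whitespace roughly the way a browser would.
        text = re.sub(r"\s+", " ", data)
        if text == " " and not self.runs:
            return  # ignore whitespace between block elements
        props = ("<w:b/>" if self.bold else "") + ("<w:i/>" if self.italic else "")
        rpr = "<w:rPr>" + props + "</w:rPr>" if props else ""
        self.runs.append(
            "<w:r>" + rpr + '<w:t xml:space="preserve">' + escape(text) + "</w:t></w:r>"
        )

    def flush(self):
        """Close the current paragraph, if it has any content."""
        if self.runs:
            self.paragraphs.append("<w:p>" + "".join(self.runs) + "</w:p>")
            self.runs = []


def html_to_body_xml(html):
    """Return the <w:p> elements for the given HTML fragment as one string."""
    parser = HtmlToWordML()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "".join(parser.paragraphs)
```

Tables, images, links and CSS-hidden elements each need their own mapping on top of this, which is where most of the effort goes.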
It's probably worth checking out Microsoft's own XSLT Inference tool, which can generate WordML from XML input.
If the source of the document is HTML, XHTML or XML and you have some flexibility over it, this could easily get the job done.
http://msdn.microsoft.com/en-us/library/aa212886%28v=office.11%29.aspx
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3412
I've used it in the past to generate Word documents from within an ASP.NET app, which obtained its source XML data from SQL stored procedures.
The tool can be a bit temperamental, but with a little sanitising of the XSLT that it generates it could just work.
If docx is acceptable, you can create a Word document, save it as docx, reverse engineer the XML, and generate your own XML/docx. I did this with Excel/xlsx and it worked perfectly. To speed things up we built the XML as text by joining strings (the markup before our data, our data, the markup after our data).
RTF is not a formal standard as far as I know, but it is widespread. Create an RTF document and return it as a Word document; Word opens RTF without problems.
Create an HTML document and return it as a Word document.
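A rough sketch of the "before, data, after" string approach, for illustration only. It is written in Python with the standard `zipfile` module to stay short; a docx is a ZIP package of XML parts, so the same steps should carry over to any ZIP library. The three parts below are, as far as I know, about the smallest package Word will open. `build_docx` and the constant names are made up for this example.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Fixed package parts, copied in spirit from an unzipped docx.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)
# "Before our data" and "after our data" for word/document.xml.
DOC_BEFORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w='
    '"http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>'
)
DOC_AFTER = '</w:body></w:document>'


def paragraph_xml(text):
    """One plain paragraph; the text is XML-escaped before being joined in."""
    return '<w:p><w:r><w:t xml:space="preserve">' + escape(text) + '</w:t></w:r></w:p>'


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx holding the given paragraphs."""
    body = ''.join(paragraph_xml(p) for p in paragraphs)
    document_xml = DOC_BEFORE + body + DOC_AFTER
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as package:
        package.writestr('[Content_Types].xml', CONTENT_TYPES)
        package.writestr('_rels/.rels', RELS)
        package.writestr('word/document.xml', document_xml)
    return buf.getvalue()
```

Usage would be something like `open('out.docx', 'wb').write(build_docx(['First line', 'Second line']))`. Escaping the data is the one thing plain string joining makes easy to forget.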
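For the RTF route, a minimal sketch of what generating the file by hand can look like. It is in Python only for brevity, the function names are invented, and the header is a bare-bones one (single Arial font, 12 pt) rather than anything Word itself would write. The main trap is escaping: backslashes and braces are RTF syntax, and non-ASCII characters go out as `\uN?` escapes.

```python
def rtf_escape(text):
    """Escape plain text for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in '\\{}':
            out.append('\\' + ch)          # RTF syntax characters
        elif ch == '\n':
            out.append('\\line ')          # line break inside a paragraph
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # '?' is the fallback character for readers without Unicode.
            raw = ch.encode('utf-16-le')
            for i in range(0, len(raw), 2):
                unit = int.from_bytes(raw[i:i + 2], 'little')
                if unit > 32767:
                    unit -= 65536
                out.append('\\u%d?' % unit)
    return ''.join(out)


def build_rtf(paragraphs):
    """Return a minimal RTF document (as an ASCII-only string)."""
    header = '{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 '
    body = ''.join(rtf_escape(p) + '\\par\n' for p in paragraphs)
    return header + body + '}'
```

The result is pure ASCII, so `build_rtf([...]).encode('ascii')` can be sent as the file content. Tables and images are possible in RTF too, but the markup gets verbose quickly.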
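The HTML option is mostly a matter of response headers. A small sketch, written as a Python WSGI app just to show the idea in a few lines; in ASP.NET the same two headers would be set on the response instead. `SAMPLE_HTML`, `wrap_html`, `word_download_app` and the file name `report.doc` are placeholders invented for this example.

```python
# Placeholder content; in practice this is the HTML you already have.
SAMPLE_HTML = '<h1>Report</h1><table><tr><td>Cell</td></tr></table>'


def wrap_html(fragment):
    """Wrap an HTML fragment in a full document with an explicit charset."""
    return (
        '<html><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        '</head><body>' + fragment + '</body></html>'
    )


def word_download_app(environ, start_response):
    """WSGI app that serves the HTML as a download Word will open."""
    body = wrap_html(SAMPLE_HTML).encode('utf-8')
    headers = [
        # Tell the browser this is a Word document...
        ('Content-Type', 'application/msword'),
        # ...and give it a .doc name so it is handed to Word.
        ('Content-Disposition', 'attachment; filename="report.doc"'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]
```

The file is still HTML inside, so how well tables, images and hidden elements survive depends entirely on Word's HTML import. Images also have to be reachable by URL or embedded, since nothing is packaged with the file.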
HTH