Copying HTML to Word via VBA and clipboard loses special characters
I would like to paste some HTML-formatted data into Word via VBA. The HTML is obtained from MSXML by transforming an XML document with a given XSL stylesheet, and I want to put the resulting HTML into Word while preserving the HTML formatting. As far as I can tell, the only way to get HTML data into Word is to put it on the clipboard. I'm using these functions for that:
http://support.microsoft.com/kb/274326 I then use PasteSpecial to put it into Word. In general it works, but the special characters (in my case Polish diacritic letters) come out completely malformed.
According to http://msdn.microsoft.com/en-us/library/ms649015%28v=vs.85%29.aspx the HTML clipboard format uses UTF-8 encoding, and my XML also uses UTF-8, so in theory everything should be fine, but it isn't.
I also tried to find a function for converting ASCII to UTF-8 (in case my string is ASCII for some reason) and vice versa (in case the clipboard uses ASCII despite what MSDN says), but without success. For example, StrConv(html, vbUnicode) in VBA mangled all the HTML tags and still didn't render the Polish characters correctly.
I get the HTML data like this:
Dim xslt As New MSXML2.DOMDocument
xslt.Load (xsltfile)
Dim xmlDoc As New MSXML2.DOMDocument
xmlDoc.load(xmlfile)
html = xmlDoc.transformNode(xslt)
and then paste it into Word (using the functions linked above):
PutHTMLClipboard html, "", ""
where.Paste
ClearClipboard
EDIT: The text returned by xmlDoc.transformNode is probably ASCII. Does anyone know a better function to convert ASCII to UTF-8? The built-in StrConv(html, vbUnicode) does not work well...
EDIT: After some research I'm sure: the string I get from transformNode is ASCII (as in the CF_TEXT clipboard format), and CF_HTML needs UTF-8 encoding. How can I convert this string to UTF-8? The built-in StrConv(string, vbUnicode) does not work...
You can use:
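Sub OpenHtml()
    'officevb.com
    Dim wd As Word.Application
    Dim doc As Word.Document
    Set wd = Application
    ' Open the page directly in Word instead of going through the clipboard
    Set doc = wd.Documents.Open("http://www.google.com.br")
    ' wdFormatXMLDocument is the save format that matches the .docx extension
    doc.SaveAs "G:\page.docx", wdFormatXMLDocument
End Sub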
This way you don't need to copy the content.
Regards
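
If the clipboard route is still needed, the UTF-8 conversion asked about in the question can be approached roughly as follows. A VBA String is UTF-16 internally, and when a String is passed ByVal to a Declare'd API function VBA converts it to the system ANSI code page. That would explain the broken Polish letters if the KB 274326 routine copies the String that way (this is an assumption, not something verified against every version of that code). A minimal sketch, assuming ADODB is installed on the machine; Utf8Bytes is a hypothetical helper name and is not part of the KB article:

```vba
' Hypothetical helper: returns the UTF-8 encoding of a VBA (UTF-16) string
' as a Byte array. Assumes ADODB.Stream is available (late-bound here).
Function Utf8Bytes(ByVal s As String) As Byte()
    Dim stm As Object

    ' An empty string would leave nothing to read after the BOM
    If Len(s) = 0 Then Exit Function

    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 2                ' adTypeText
    stm.Charset = "utf-8"
    stm.Open
    stm.WriteText s             ' encoded as UTF-8 inside the stream

    stm.Position = 0
    stm.Type = 1                ' adTypeBinary (only changeable at position 0)
    stm.Position = 3            ' skip the 3-byte UTF-8 BOM the stream writes
    Utf8Bytes = stm.Read        ' remaining bytes are the UTF-8 payload
    stm.Close
End Function
```

The byte array, rather than the String, would then have to be copied into the clipboard memory block (for example with CopyMemory starting at element 0). Note also that the StartHTML, EndHTML, StartFragment and EndFragment offsets in the CF_HTML header are byte offsets, so they should be computed from the UTF-8 byte length and not from Len(html).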