Save text file UTF-8 encoded with VBA
How can I write UTF-8 encoded strings to a text file from VBA? For example:
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
Print #fnum, "special characters: äöüß" 'latin-1 or something by default
Close fnum
Is there some setting at the Application level?
I found the answer on the web:
Dim fsT As Object
Set fsT = CreateObject("ADODB.Stream")
fsT.Type = 2 'adTypeText: the stream holds text/string data
fsT.Charset = "utf-8" 'Encoding used when the text is written out
fsT.Open 'Open the stream before writing to it
fsT.WriteText "special characters: äöüß"
fsT.SaveToFile sFileName, 2 'adSaveCreateOverWrite; sFileName is the target path
fsT.Close
Certainly not as I expected...
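You can use the CreateTextFile or OpenTextFile method of Scripting.FileSystemObject; both take a parameter that selects the encoding (unicode and format, respectively). Note that "Unicode" here means UTF-16 with a byte order mark, not UTF-8.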
object.CreateTextFile(filename[, overwrite[, unicode]])
object.OpenTextFile(filename[, iomode[, create[, format]]])
Example: Overwrite:
CreateTextFile:
fileName = "filename"
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.CreateTextFile(fileName, True, True)
out.WriteLine ("Hello world!")
...
out.Close
Example: Append:
OpenTextFile:
Set fso = CreateObject("Scripting.FileSystemObject")
'8 = ForAppending, -1 = TristateTrue (Unicode); the named constants are not defined with late binding
Set out = fso.OpenTextFile("filename", 8, True, -1)
out.Write "Hello world!"
...
out.Close
See more on MSDN docs
The ADODB.Stream approach writes a byte order mark (BOM) at the start of the file, which is unnecessary in a UTF-8 file, and some applications (in my case, SAP) don't like it. Solution here: Can I export excel data with UTF-8 without BOM?
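To see whether a given file actually starts with the UTF-8 BOM (the three bytes EF BB BF), one can read its first bytes in binary mode. The following is a minimal sketch, not a definitive implementation; the function name HasUtf8Bom is hypothetical, and it assumes the file already exists and is readable.

```vba
Option Explicit

' Hypothetical helper: returns True if the file starts with the
' UTF-8 byte order mark (bytes EF BB BF). Assumes the file exists.
Public Function HasUtf8Bom(ByVal filePath As String) As Boolean
    Dim fnum As Integer
    Dim head(0 To 2) As Byte

    fnum = FreeFile
    Open filePath For Binary Access Read As #fnum
    If LOF(fnum) >= 3 Then
        Get #fnum, 1, head   ' read the first three bytes of the file
        HasUtf8Bom = (head(0) = &HEF And head(1) = &HBB And head(2) = &HBF)
    End If
    Close #fnum
End Function
```

Calling it from the Immediate window, for example with Debug.Print HasUtf8Bom("C:\Temp\out.txt") (an assumed path), prints True or False.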
Here is another way to do this - using the API function WideCharToMultiByte:
Option Explicit

' Note: this declaration targets 32-bit VBA. 64-bit Office requires
' "Declare PtrSafe" and LongPtr for the pointer parameters and variables.
Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, _
    ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, _
    ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, _
    ByVal lpUsedDefaultChar As Long) As Long

' Converts the VBA string s to UTF-8 and returns the bytes in b.
Private Sub getUtf8(ByRef s As String, ByRef b() As Byte)
    Const CP_UTF8 As Long = 65001
    Dim len_s As Long
    Dim ptr_s As Long
    Dim size As Long
    Erase b
    len_s = Len(s)
    If len_s = 0 Then _
        Err.Raise 30030, , "Len(WideChars) = 0"
    ptr_s = StrPtr(s)
    ' First call: ask for the required buffer size in bytes.
    size = WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, 0, 0, 0, 0)
    If size = 0 Then _
        Err.Raise 30030, , "WideCharToMultiByte() = 0"
    ReDim b(0 To size - 1)
    ' Second call: perform the conversion into the buffer.
    If WideCharToMultiByte(CP_UTF8, 0, ptr_s, len_s, VarPtr(b(0)), size, 0, 0) = 0 Then _
        Err.Raise 30030, , "WideCharToMultiByte(" & Format$(size) & ") = 0"
End Sub
Public Sub writeUtf()
    Dim file As Integer
    Dim s As String
    Dim b() As Byte
    s = "äöüßµ@€|~{}[]²³\ .." & _
        " OMEGA" & ChrW$(937) & ", SIGMA" & ChrW$(931) & _
        ", alpha" & ChrW$(945) & ", beta" & ChrW$(946) & ", pi" & ChrW$(960) & vbCrLf
    file = FreeFile
    ' For Binary does not truncate an existing file; delete a longer old file first.
    Open "C:\Temp\TestUtf8.txt" For Binary Access Write Lock Read Write As #file
    getUtf8 s, b
    Put #file, , b
    Close #file
End Sub
I looked into the answer from Máťa, whose name hints at encoding qualifications and experience. The VBA docs say CreateTextFile(filename, [overwrite [, unicode]]) creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." It's fine that a file stores Unicode characters, but in what encoding? Unencoded Unicode can't be represented in a file.
The VBA doc page for OpenTextFile(filename[, iomode[, create[, format]]]) offers a third option for the format:
- TristateUseDefault -2 "opens the file using the system default."
- TristateTrue -1 "opens the file as Unicode."
- TristateFalse 0 "opens the file as ASCII."
Máťa passes -1 for this argument.
Judging from the VB.NET documentation (not VBA, but I think it reflects how the underlying Windows OS represents Unicode strings, which carries over into MS Office; I don't know for sure), the system default is an encoding that uses 1 byte per character with the ANSI code page for the locale. UnicodeEncoding is UTF-16. The docs also describe UTF-8 as a "Unicode encoding," which makes sense to me. But I don't yet know how to specify UTF-8 for VBA output, nor can I be confident that the data I write to disk with OpenTextFile(,,,-1) is UTF-16 encoded. Tamalek's post is helpful.
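One way to settle the question of which encoding FileSystemObject uses for "Unicode" is to write a single known character and look at the raw bytes. The sketch below is only an illustration under stated assumptions: the Sub name and the path C:\Temp\FsoUnicodeTest.txt are hypothetical, and the folder must exist.

```vba
Option Explicit

' Hypothetical check: write one character with unicode:=True, then print
' the file's raw bytes in hex to the Immediate window.
Public Sub DumpFsoUnicodeBytes()
    Const TEST_PATH As String = "C:\Temp\FsoUnicodeTest.txt"   ' assumed path
    Dim fso As Object
    Dim out As Object
    Dim fnum As Integer
    Dim b() As Byte
    Dim i As Long
    Dim hexDump As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set out = fso.CreateTextFile(TEST_PATH, True, True)   ' overwrite, Unicode
    out.Write ChrW$(228)                                  ' U+00E4, "ä"
    out.Close

    ' Read the whole file back as raw bytes.
    fnum = FreeFile
    Open TEST_PATH For Binary Access Read As #fnum
    ReDim b(0 To LOF(fnum) - 1)
    Get #fnum, 1, b
    Close #fnum

    ' Format each byte as two hex digits.
    For i = LBound(b) To UBound(b)
        hexDump = hexDump & Right$("0" & Hex$(b(i)), 2) & " "
    Next i
    Debug.Print Trim$(hexDump)
End Sub
```

If the file is UTF-16 little-endian with a BOM, the dump reads FF FE E4 00; a UTF-8 file with a BOM would instead read EF BB BF C3 A4.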
I didn't want to change all my code just to support several UTF-8 strings, so I let my code do its thing, and after the file was saved (in ANSI, as that is the default in Excel) I convert the file to UTF-8 using this code:
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objFS As Object
    Dim iFile As Integer 'FreeFile returns an Integer
    Dim sFileData As String
    'Init: read the whole ANSI file into a string
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
    sFileData = Input$(LOF(iFile), iFile)
    sFileData = sFileData & vbCrLf
    Close #iFile
    'Open & Write
    Set objFS = CreateObject("ADODB.Stream")
    objFS.Charset = "utf-8"
    objFS.Open
    objFS.WriteText sFileData
    'Save & Close
    objFS.SaveToFile sOutFilePath, 2 '2: Create Or Update
    objFS.Close
    'Completed
    Application.StatusBar = "Completed"
End Sub
I use this Sub like this (this is an example):
Call convertTxttoUTF("c:\my.json", "c:\my-UTF8.json")
I found this code here: VBA to Change File Encoding ANSI to UTF8 – Text to Unicode
Since this writes the file with a BOM, I changed the Sub to the following in order to remove it:
Sub convertTxttoUTF(sInFilePath As String, sOutFilePath As String)
    Dim objStreamUTF8 As Object
    Dim objStreamUTF8NoBOM As Object
    Dim iFile As Integer 'FreeFile returns an Integer
    Dim sFileData As String
    Const adSaveCreateOverWrite = 2
    Const adTypeBinary = 1
    Const adTypeText = 2
    'Init: read the whole ANSI file into a string
    iFile = FreeFile
    Open sInFilePath For Input As #iFile
    sFileData = Input$(LOF(iFile), iFile)
    Close #iFile
    'Open files
    Set objStreamUTF8 = CreateObject("ADODB.Stream")
    Set objStreamUTF8NoBOM = CreateObject("ADODB.Stream")
    'Write the files
    With objStreamUTF8
        .Charset = "UTF-8"
        .Open
        .WriteText sFileData
        .Position = 0
        .SaveToFile sOutFilePath, adSaveCreateOverWrite 'saved with BOM; overwritten below
        .Type = adTypeText
        .Position = 3 'skip the 3-byte UTF-8 BOM
    End With
    With objStreamUTF8NoBOM
        .Type = adTypeBinary
        .Open
        objStreamUTF8.CopyTo objStreamUTF8NoBOM 'copy everything after the BOM
        .SaveToFile sOutFilePath, adSaveCreateOverWrite
    End With
    'Close the files
    objStreamUTF8.Close
    objStreamUTF8NoBOM.Close
End Sub
I used this answer to solve the unknown BOM character at the beginning of the file.
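A frequently suggested way to transform a string is StrConv with vbFromUnicode. Note, however, that vbFromUnicode converts from VBA's internal UTF-16 to the system default ANSI code page, not to UTF-8:
StrConv("hello world",vbFromUnicode)
So put simply:
Dim fnum As Integer
fnum = FreeFile
Open "myfile.txt" For Output As fnum
'Print # converts the string again on output, so this is unlikely to yield valid UTF-8
Print #fnum, StrConv("special characters: äöüß", vbFromUnicode)
Close fnum
No special COM objects are required, but for plain ASCII text the result merely coincides with UTF-8; characters such as äöüß are not written as UTF-8 this way.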
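What StrConv with vbFromUnicode actually produces can be checked by dumping the resulting bytes for a single non-ASCII character. This is a small sketch under stated assumptions: the Sub name is hypothetical, and the output depends on the system's ANSI code page.

```vba
Option Explicit

' Hypothetical check: print the bytes that StrConv(..., vbFromUnicode)
' yields for "ä" (U+00E4). The result depends on the system ANSI code page.
Public Sub DumpStrConvBytes()
    Dim b() As Byte
    Dim i As Long
    Dim hexDump As String

    ' Assigning the converted string to a Byte array exposes its raw bytes.
    b = StrConv(ChrW$(228), vbFromUnicode)

    For i = LBound(b) To UBound(b)
        hexDump = hexDump & Right$("0" & Hex$(b(i)), 2) & " "
    Next i
    Debug.Print Trim$(hexDump)
End Sub
```

For comparison, the UTF-8 encoding of U+00E4 is the two bytes C3 A4, while Windows-1252 encodes it as the single byte E4.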