How to check if a .txt file is in ASCII or UTF-8 format in Windows environment?
I have converted a .txt file from ASCII to UTF-8 using 开发者_如何学CUltraEdit. However, I am not sure how to verify if it is in UTF-8 format in Windows environment.
Thank you!
Open the file in Notepad. Click 'Save As...'. In the 'Encoding:' combo box you will see the current file format.
Text files in Windows don't have a format. There's an unofficial convention that if the file starts with the BOM codepoint in UTF-8 format that it's UTF-8, but that convention isn't universally supported. That would be the 3 byte sequence "\xef\xbf\xbe"
, i.e. ￾
in the Latin-1 character set.
Open the file using Notepad++ and check the "Encoding" menu, you can check the current Encoding and/or Convert to a set of encodings available.
Open it in a hex editor and make sure that the first three bytes are a UTF8 BOM (EF BB BF
)
If you use Windows 10 and has Windows Subsystem for Linux (WSL), it can be easily done by typing "file " from the shell.
For example:
$ file code.cpp
code.cpp: C source, UTF-8 Unicode (with BOM) text, with CRLF line terminators
I had a directory of files that I wanted to check. I created an Excel macro to determine ANSI vs. UTF-8. This worked for me.
Sub GetTextFileEncoding()
Dim sFile As String
Dim sPath As String
Dim sTextLine As String
Dim iRow As Integer
'Set Defaults and Initial Values
iRow = 1
sPath = "C:textfiles\"
sFile = Dir(sPath & "*.txt")
Do While Len(sFile) > 0
'Get FileType
'Debug.Print sFile & " - " & FileEncodeType(sPath & sFile)
'Show on Excel Worksheet
Cells(iRow, 1).Value = sFile
Cells(iRow, 2).Value = FileEncodeType(sPath & sFile)
'Get next file
sFile = Dir
'Increment Row
iRow = iRow + 1
Loop
End Sub
Function FileEncodeType(sFile As String) As String
Dim bEF As Boolean
Dim bBB As Boolean
Dim bBF As Boolean
bEF = False
bBB = False
bBF = False
Open sFile For Input As #1
If Not EOF(1) Then
'Read first line
Line Input #1, textline
'Debug.Print textline
For i = 1 To 3
'Debug.Print Asc(Mid(textline, i, 1)) & " - " & Mid(textline, i, 1)
Select Case i
Case 1
If Asc(Mid(textline, i, 1)) = 239 Then
bEF = True
End If
Case 2
If Asc(Mid(textline, i, 1)) = 187 Then
bBB = True
End If
Case 3
If Asc(Mid(textline, i, 1)) = 191 Then
bBF = True
End If
Case 4
End Select
Next
End If
Close #1
If bEF And bBB And bBF Then
FileEncodeType = "UTF-8"
Else
FileEncodeType = "ANSI"
End If
End Function
精彩评论