Called ReadFile on a text file, got weird (Japanese?) characters

I use the following code to read all of the elements from a file. The handle hFile is valid, and sizeOfFile is the size I got from GetFileSize(hFile, NULL).

_TCHAR* text = (_TCHAR*)malloc(sizeOfFile * sizeof(_TCHAR));
DWORD numRead = 0;
BOOL didntFail = ReadFile(hFile, text, sizeOfFile, &numRead, NULL);

After the operation, text contains strange characters (Japanese or something) instead of the content of the file.

What did I do wrong?

Edit: I understand it is an encoding problem, but how do I then convert text to LPCWSTR so I can use functions like WriteConsoleOutputCharacter?


Modern IDEs default to Unicode applications, meaning _TCHAR is actually wchar_t. ReadFile() works with simple bytes and if you use it to fill a _TCHAR array directly, you'll get 8-bit characters interpreted as UTF-16 Unicode. These usually show as CJK (Chinese/Japanese/Korean) glyphs.
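
To make the effect concrete, here is a minimal portable sketch of what a little-endian UTF-16 reader sees when handed 8-bit text. The function name bytes_as_utf16le is hypothetical and not part of any Windows API; it only pairs up bytes the way the misreading does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine each pair of bytes into one 16-bit code unit, low byte first,
   which is how a little-endian UTF-16 reader would see them.
   Returns the number of code units written to out.
   (Hypothetical helper, for illustration only.) */
size_t bytes_as_utf16le(const unsigned char *bytes, size_t n_bytes,
                        uint16_t *out, size_t out_cap)
{
    size_t n_units = n_bytes / 2;   /* a trailing odd byte is ignored */
    assert(n_units <= out_cap);
    for (size_t i = 0; i < n_units; i++) {
        out[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    }
    return n_units;
}
```

For the four bytes of "Text" this gives the two code units U+6554 and U+7478. Whenever the second byte of a pair is a lowercase ASCII letter (0x61 to 0x7A), the unit lands between U+6100 and U+7AFF, which lies inside the CJK Unified Ideographs block (U+4E00 to U+9FFF). That is why ordinary English text tends to show up as Chinese-looking glyphs.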

You have three options:

  • convert your program to non-Unicode
  • use a file containing Unicode text (in UTF-16 encoding), or
  • read from the file into a char array and then use MultiByteToWideChar() to convert the text to Unicode.

If you mix Unicode and non-Unicode be careful to calculate the correct buffer sizes (number of bytes vs. number of characters).
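
As a rough illustration of the bytes-versus-characters bookkeeping, the sketch below widens a narrow buffer by hand. It is only a stand-in for the conversion step and assumes the input is plain ASCII or ISO-8859-1, where each byte maps to the code point of the same value; on Windows you would call MultiByteToWideChar() instead. The name widen_latin1 is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Stand-in for the conversion step, correct only for ASCII / ISO-8859-1
   input. Takes a length in BYTES and returns a newly allocated,
   null-terminated wide string with the same number of CHARACTERS.
   The caller frees the result. (Hypothetical helper.) */
wchar_t *widen_latin1(const char *src, size_t n_bytes)
{
    assert(src != NULL);
    /* malloc counts bytes: (characters + terminator) * sizeof(wchar_t). */
    wchar_t *dst = (wchar_t *)malloc((n_bytes + 1) * sizeof(wchar_t));
    if (dst == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n_bytes; i++) {
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
    dst[n_bytes] = L'\0';
    return dst;
}
```

Note the two different units: the read length and the loop bound are byte counts, while the allocation multiplies the character count (plus one for the terminator) by sizeof(wchar_t).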

Note that you can still use narrow chars with Windows in your Unicode program if you call the ANSI version of the Windows function (e.g. WriteConsoleOutputCharacterA).


You got the type of the string wrong. Text from a file that was encoded in an 8-bit encoding will look like Chinese when you look at it through a character type, like TCHAR with UNICODE defined, that uses a 16-bit encoding. Fix:

 char* text = (char*)malloc(...);

Normally you have to fret a lot more about the encoding that was used to write the text. It could be UTF-8, for example. You can convert from the 8-bit encoding to a TCHAR (wchar_t, really) with MultiByteToWideChar(). Its first argument, the code page (for example CP_ACP or CP_UTF8), is the one to fret about.
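
To see why the assumed encoding matters, here is a small portable sketch that counts how many characters the same bytes represent under two assumptions. Both function names are invented for the example, and the UTF-8 counter assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Under a single-byte code page, every byte is one character. */
size_t count_chars_single_byte(const unsigned char *bytes, size_t n_bytes)
{
    (void)bytes;   /* the content does not matter, only the length */
    return n_bytes;
}

/* Under UTF-8, continuation bytes (bit pattern 10xxxxxx) extend the
   previous character, so only the remaining bytes start a new one.
   Assumes the input is valid UTF-8. */
size_t count_chars_utf8(const unsigned char *bytes, size_t n_bytes)
{
    assert(bytes != NULL || n_bytes == 0);
    size_t count = 0;
    for (size_t i = 0; i < n_bytes; i++) {
        if ((bytes[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

The five bytes 63 61 66 C3 A9 are the four characters "café" when read as UTF-8, but five characters under a single-byte code page, where C3 and A9 each become a separate glyph. Passing the wrong code page to the conversion produces exactly this kind of mismatch.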


You have read an ANSI or UTF-8 text file into a UTF-16 string.


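char ReadBuff[1024];       // raw bytes from the file
wchar_t WideBuff[1024];    // converted text
memset(ReadBuff, 0, sizeof(ReadBuff));

HANDLE hFile = CreateFile(szPathFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD NumberOfBytesRead = 0;

// ReadFile delivers bytes, so read into a char buffer.
// The buffer was zeroed above, so the 600 bytes stay null-terminated.
ReadFile(hFile, ReadBuff, 600, &NumberOfBytesRead, NULL);
CloseHandle(hFile);

// In the wide wsprintf, %S takes a narrow (char) string and widens it.
// Source and destination must be separate buffers.
wsprintfW(WideBuff, L"%S", ReadBuff);

WideBuff now holds the text in readable form, provided the file is ANSI text. For a UTF-8 file, use MultiByteToWideChar() with CP_UTF8 instead.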
