MultiByteToWideChar API changes on Vista
I want to convert a string to a wide string with two selectable behaviors:
- Ignore illegal characters
- Abort the conversion if an illegal character occurs
On Windows XP I could do this:
bool ignore_illegal; // input
DWORD flags = ignore_illegal ? 0 : MB_ERR_INVALID_CHARS;
SetLastError(0);
int res = MultiByteToWideChar(CP_UTF8,flags,"test\xFF\xFF test",-1,buf,sizeof(buf)/sizeof(buf[0])); // buf is a WCHAR array; the size is given in WCHARs, not bytes
int err = GetLastError();
std::cout << "result = " << res << " get last error = " << err;
Now, on XP, if ignore_illegal is true I get:
result = 10 get last error = 0
And if ignore_illegal is false I get:
result = 0 get last error = 1113 // ERROR_NO_UNICODE_TRANSLATION (invalid code)
So, given a big enough buffer, it is enough to check result != 0.
According to the documentation at http://msdn.microsoft.com/en-us/library/dd319072(VS.85).aspx there are API changes, so how does this behavior change on Vista?
I think it now replaces illegal code units with the replacement character (U+FFFD), as the Unicode standard recommends. The following code
#define STRICT
#define UNICODE
#define NOMINMAX
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdlib>
#include <iostream>
#include <iomanip>
void test(bool ignore_illegal) {
    const DWORD flags = ignore_illegal ? 0 : MB_ERR_INVALID_CHARS;
    WCHAR buf[0x100];
    SetLastError(0);
    // The last argument is the buffer size in WCHARs, not in bytes.
    const int res = MultiByteToWideChar(CP_UTF8, flags, "test\xFF\xFF test", -1, buf, sizeof buf / sizeof buf[0]);
    const DWORD err = GetLastError();
    // buf[4] is the fifth code unit, i.e. the one right after "test".
    std::cout << "ignore_illegal = " << std::boolalpha << ignore_illegal
              << ", result = " << std::dec << res
              << ", last error = " << err
              << ", fifth code unit = " << std::hex << static_cast<unsigned int>(buf[4])
              << std::endl;
}
int main() {
    test(false);
    test(true);
    std::system("pause");
}
produces the following output on my Windows 7 system:
ignore_illegal = false, result = 0, last error = 1113, fifth code unit = fffd
ignore_illegal = true, result = 12, last error = 0, fifth code unit = fffd
So the error codes stay the same, but the returned length grows by two (12 instead of 10), which accounts for the two replacement characters that have been inserted. If you run my code on XP, the fifth code unit should be U+0020 (the space character) if the two illegal code units have been dropped.
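To make the three behaviors discussed in this thread concrete without depending on a particular Windows version, here is a minimal portable sketch of a UTF-8 to UTF-16 converter with a selectable policy for ill-formed input. The names InvalidPolicy, decode_one and utf8_to_utf16 are made up for this illustration and are not part of any Windows API. The sketch treats one offending byte at a time, which is an assumption and may not match how MultiByteToWideChar groups other kinds of ill-formed sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical policy names, chosen only for this illustration.
enum class InvalidPolicy {
    Strict,   // abort on the first ill-formed byte (like MB_ERR_INVALID_CHARS)
    Drop,     // silently skip ill-formed bytes (the XP behavior reported above)
    Replace   // emit U+FFFD per ill-formed byte (the Windows 7 behavior reported above)
};

// Tries to decode one well-formed UTF-8 sequence starting at s[i].
// Returns the number of bytes consumed, or 0 if the bytes are ill-formed.
static std::size_t decode_one(const std::string& s, std::size_t i, char32_t& cp) {
    assert(i < s.size());
    const auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) { cp = b0; return 1; }

    std::size_t len = 0;
    char32_t min_cp = 0;  // smallest code point allowed for this length (rejects overlongs)
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
    else return 0;  // stray continuation byte, or a byte in 0xF8..0xFF

    if (s.size() - i < len) return 0;  // sequence is truncated
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = byte(i + k);
        if ((b & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, values above U+10FFFF, and surrogate code points.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Converts UTF-8 to UTF-16. Returns false (and clears out) only under
// InvalidPolicy::Strict when ill-formed input is found.
bool utf8_to_utf16(const std::string& in, InvalidPolicy policy, std::u16string& out) {
    out.clear();
    for (std::size_t i = 0; i < in.size();) {
        char32_t cp = 0;
        const std::size_t n = decode_one(in, i, cp);
        if (n == 0) {
            if (policy == InvalidPolicy::Strict) { out.clear(); return false; }
            if (policy == InvalidPolicy::Replace) out.push_back(static_cast<char16_t>(0xFFFD));
            ++i;  // skip exactly one offending byte
            continue;
        }
        if (cp >= 0x10000) {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i += n;
    }
    return true;
}
```

For the input "test\xFF\xFF test", the Drop policy yields 9 code units and the Replace policy yields 11. Adding the terminating NUL that MultiByteToWideChar counts when the input length is -1 gives the 10 and 12 reported in this thread. The Strict policy returns false for the same input.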
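WCHAR *pstrRet = NULL;
// First call: query the required buffer size in WCHARs.
// Because the input length is -1, the count includes the terminating NUL.
int nLen = MultiByteToWideChar(CP_UTF8, 0, pstrTemp2, -1, NULL, 0);
if (nLen > 0)
{
    pstrRet = new WCHAR[nLen];
    // Second call: perform the actual conversion into the allocated buffer.
    int nConv = MultiByteToWideChar(CP_UTF8, 0, pstrTemp2, -1, pstrRet, nLen);
    if (nConv == nLen)
    {
        // Success! pstrRet should be the wide char equivalent of pstrTemp2
    }
}
delete[] pstrRet; // delete[] on a null pointer is a no-op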
I think this is the way it is used on Vista; I found it on some forum :)