ICU Byte Order Mark (BOM)
I am using ICU's ustdio functions to write a UnicodeString object to a file in a range of encodings; however, they don't appear to prepend the BOM.
My Code:
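#include <tchar.h>            // _tmain, _TCHAR (Windows/MSVC)
#include <unicode/unistr.h>   // UnicodeString
#include <unicode/ustdio.h>   // u_fopen, u_file_write, u_fclose
using namespace icu;

void write_file(const char* filename, UnicodeString &str) {
    UFILE* f = u_fopen(filename, "w", NULL, "UTF-16 LE");
    // length() + 1 also writes the terminating NUL (the trailing 00 00 in the dump below)
    u_file_write(str.getTerminatedBuffer(), str.length() + 1, f);
    u_fclose(f);
}

int _tmain(int argc, _TCHAR* argv[])
{
    UnicodeString str(L"ΠαρθένωνΗ");
    write_file("test.txt", str);
    return 0;
}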
The byte order does swap when I change LE to BE; however, there is no BOM. Viewed in a hex editor, the output file is:
A0 03 B1 03 C1 03 B8 03 AD 03 BD 03 C9 03 BD 03 97 03 00 00
NOTE: If I set the codepage to "UTF-16", there is a BOM; however, once I specify the endianness explicitly, it disappears.
Alternatively, is there a way I could write the UnicodeString to a file with a BOM?
Just a guess, but the "UTF-16 LE" and "UTF-16 BE" converters were probably intended for cases where the byte order is already well specified, so a BOM would not be necessary in the context where the file is used.
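To make the byte-order point concrete, here is a small sketch in plain C++ with no ICU involved; `encode_utf16` is a hypothetical helper, not an ICU function. It serializes UTF-16 code units in a fixed byte order and emits U+FEFF only when asked. With little-endian order and no BOM, the first two characters of the test string come out as A0 03 B1 03, matching the start of the hex dump above; with a BOM the output starts with FF FE (FE FF for big-endian).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of ICU): serializes UTF-16 code units into
// bytes in the requested byte order, optionally preceded by the BOM U+FEFF.
std::vector<unsigned char> encode_utf16(const std::u16string& text,
                                        bool little_endian, bool with_bom) {
    std::vector<unsigned char> out;
    auto put = [&](char16_t unit) {
        unsigned char lo = static_cast<unsigned char>(unit & 0xFF);
        unsigned char hi = static_cast<unsigned char>(unit >> 8);
        if (little_endian) { out.push_back(lo); out.push_back(hi); }
        else               { out.push_back(hi); out.push_back(lo); }
    };
    if (with_bom) put(u'\uFEFF');  // becomes FF FE in LE, FE FF in BE
    for (char16_t unit : text) put(unit);
    return out;
}
```

The byte order alone determines how each code unit is laid out; the BOM is just one extra code unit that a reader can sniff when the order is not otherwise known.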
You should be able to write the BOM character (U+FEFF) to the file yourself, before writing the text:
u_fputc(0xFEFF, f);
will do it.