
ICU Byte Order Mark (BOM)

I am using ICU's ustdio functions to write a UnicodeString object to a file in a range of encodings; however, they don't appear to prepend a BOM.

My code:

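#include <tchar.h>
#include <unicode/unistr.h>
#include <unicode/ustdio.h>

using icu::UnicodeString;

void write_file(const char* filename, UnicodeString &str) {

    UFILE* f = u_fopen(filename, "w", NULL, "UTF-16 LE");
    if (f == NULL) return;  // could not open the file or the converter
    // length() + 1 also writes the terminating NUL (the trailing 00 00 in the dump)
    u_file_write(str.getTerminatedBuffer(), str.length() + 1, f);
    u_fclose(f);
}

int _tmain(int argc, _TCHAR* argv[])
{
    UnicodeString str(L"ΠαρθένωνΗ");

    write_file("test.txt", str);

    return 0;
}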

The byte order does swap when I change LE to BE, but there is no BOM. Viewed in a hex editor, the output file is:

A0 03 B1 03  C1 03 B8 03  AD 03 BD 03  C9 03 BD 03  97 03 00 00
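To make the byte layout in the dump concrete, here is a small plain C++ sketch (not ICU; `encode_utf16` and `ByteOrder` are hypothetical names) that serializes UTF-16 code units in either byte order without a BOM. Encoding the sample string little-endian should reproduce the dump above minus the trailing 00 00, which comes from writing `str.length() + 1` units, i.e. the terminating NUL.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (plain C++, not an ICU API): serializes UTF-16 code
// units into bytes in the requested byte order, with no BOM, similar to
// what the "UTF-16 LE" / "UTF-16 BE" converters wrote in the question.
enum class ByteOrder { Little, Big };

std::vector<std::uint8_t> encode_utf16(const std::u16string& units, ByteOrder order) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t cu : units) {
        const auto lo = static_cast<std::uint8_t>(cu & 0xFF);
        const auto hi = static_cast<std::uint8_t>((cu >> 8) & 0xFF);
        if (order == ByteOrder::Little) {
            bytes.push_back(lo);  // low byte first: U+03A0 -> A0 03
            bytes.push_back(hi);
        } else {
            bytes.push_back(hi);  // high byte first: U+03A0 -> 03 A0
            bytes.push_back(lo);
        }
    }
    return bytes;
}
```

Switching `ByteOrder::Little` to `ByteOrder::Big` swaps every pair, which matches the LE/BE behaviour described above; neither variant adds anything in front of the text.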

NOTE: If I set the codepage to "UTF-16", there is a BOM; however, once I specify the endianness explicitly, it disappears.

Alternatively, is there a way I could write the UnicodeString to a file with a BOM?


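Just guessing, but "UTF-16 LE" and "UTF-16 BE" are probably intended for contexts where the byte order is already known, so a BOM would be unnecessary wherever the file is used. That would be consistent with the Unicode Standard, where the UTF-16LE and UTF-16BE encoding schemes are defined without a BOM, while plain UTF-16 may begin with one.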
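To illustrate why the explicit LE/BE variants can get away without a BOM, here is a hypothetical sketch (plain C++; `sniff_utf16_bom` and `Utf16Bom` are made-up names, not ICU APIs) of what a reader can infer from the first two bytes of a file it expects to hold UTF-16. Without a BOM, the byte order has to come from somewhere else, such as a file-format or protocol specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: classifies the first two bytes of UTF-16 data.
enum class Utf16Bom { Little, Big, None };

Utf16Bom sniff_utf16_bom(const std::uint8_t* data, std::size_t size) {
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return Utf16Bom::Little;  // U+FEFF stored LE
        if (data[0] == 0xFE && data[1] == 0xFF) return Utf16Bom::Big;     // U+FEFF stored BE
    }
    // No BOM: the caller must already know the byte order, which is the
    // situation the "UTF-16 LE" / "UTF-16 BE" converters assume.
    return Utf16Bom::None;
}
```

This only considers UTF-16; a real sniffer would also have to rule out a UTF-32LE BOM, which starts with the same FF FE followed by 00 00.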

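You should be able to write the BOM character U+FEFF ('\ufeff') to the file yourself:

u_fputc(0xFEFF, f);

Calling this right after u_fopen() and before u_file_write() will do it. Because the file's codepage is "UTF-16 LE", the character comes out as the bytes FF FE at the start of the file.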
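To show what the manual-BOM approach amounts to at the byte level, here is a plain C++ sketch (no ICU; `write_utf16le_with_bom` is a hypothetical helper) that writes U+FEFF first and then the text as UTF-16LE to any `std::ostream`. It only illustrates the resulting byte layout under that assumption; with ICU you would keep using u_fputc() followed by u_file_write().

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for checking the output in memory
#include <string>

// Hypothetical helper (plain C++, not ICU): writes a BOM, then the text,
// all as UTF-16LE. Stored little-endian, U+FEFF becomes the bytes FF FE.
void write_utf16le_with_bom(std::ostream& out, const std::u16string& text) {
    auto put_unit = [&out](char16_t cu) {
        out.put(static_cast<char>(cu & 0xFF));         // low byte first
        out.put(static_cast<char>((cu >> 8) & 0xFF));  // then high byte
    };
    put_unit(u'\uFEFF');  // the BOM, same idea as u_fputc(0xFEFF, f)
    for (char16_t cu : text) {
        put_unit(cu);
    }
}
```

For example, writing u"\u03A0" into a `std::ostringstream` yields the four bytes FF FE A0 03. For a real file, open a `std::ofstream` with `std::ios::binary` so no newline translation touches the bytes.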
