How to test if a string is a valid UTF-16 string?
I am using MongoDB and JavaScript to do some string processing. Now I am getting an error like:
Sun May 23 07:42:20 Assertion failure JS_EncodeCharacters( _context , s , srclen , dst , &len) scripting/engine_spidermonkey.cpp 152 0x80f4f7e 0x80f8794 0x811525b 0x811a953 0x8119fc4 0x8111bc5 0x81b408e 0x81c4ee7 0x81b4a10 0x817a881 0x817a7d8 0x817a6e2 0x811e1bb 0x80a777b 0x80a8f8a 0xb7cb2455 0x80a37a1 mongodb-linux-i686-1.4.2/bin/mongo(_ZN5mongo12sayDbContextEPKc+0xfe) [0x80f4f7e]
After some searching, I found that JS_EncodeCharacters returns false if the input is not a valid UTF-16 string (when SpiderMonkey is built with UTF-8 enabled).
So I was wondering how to test whether the input string is a proper UTF-16 string, so that I can skip such strings and avoid the problem.
Thanks
This part of the UTF-16 FAQ describes the sequences of invalid characters:
The two values FFFE₁₆ and FFFF₁₆ as well as the 32 values from FDD0₁₆ to FDEF₁₆ represent noncharacters. They are invalid in interchange, but may be freely used internal to an implementation. Unpaired surrogates are invalid as well, i.e. any value in the range D800₁₆ to DBFF₁₆ not followed by a value in the range DC00₁₆ to DFFF₁₆, or any value in the range DC00₁₆ to DFFF₁₆ not preceded by a value in the range D800₁₆ to DBFF₁₆.
If you're doing this in JavaScript, I'm not sure it'll be all that easy to test for this, though...
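For what it's worth, the unpaired-surrogate rule quoted above can be checked by walking the string's 16-bit code units with charCodeAt. The following is only a rough sketch: the function name isValidUTF16 and the rejectNoncharacters flag are made up for illustration, and I am assuming (not certain) that unpaired surrogates are what trips JS_EncodeCharacters, so the noncharacter check is left optional.

```javascript
// Hypothetical helper, not part of MongoDB or SpiderMonkey.
// Returns true if `s` contains no unpaired surrogate code units.
// If `rejectNoncharacters` is truthy, it also rejects FFFE, FFFF
// and FDD0..FDEF (an assumption: the engine may well accept these).
function isValidUTF16(s, rejectNoncharacters) {
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF) {
      // High surrogate: the next unit must be a low surrogate.
      var next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) {
        return false;
      }
      i++; // skip the low surrogate we just validated
    } else if (c >= 0xDC00 && c <= 0xDFFF) {
      // Low surrogate that was not preceded by a high surrogate.
      return false;
    } else if (rejectNoncharacters &&
               (c === 0xFFFE || c === 0xFFFF ||
                (c >= 0xFDD0 && c <= 0xFDEF))) {
      return false;
    }
  }
  return true;
}
```

You could then call something like isValidUTF16(doc.someField) before passing a string on and skip the ones that fail (someField is just a placeholder name here).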