How to distinguish UTF-8 and ASCII files?

How to distinguish UTF-8 (no BOM) and ASCII files?


If the file contains any bytes with the top bit set, then it is not ASCII.

So if the only possibilities are ASCII or UTF-8, then it's UTF-8.

If the file contains only bytes with the top bit clear, then it's meaningless to distinguish whether it's ASCII or UTF-8, since it represents exactly the same series of characters either way. But you can call it ASCII.

Of course, this doesn't distinguish UTF-8 from an ISO 8859 (Latin) encoding or CP1252, nor does it confirm that the so-called UTF-8 is actually valid.
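A minimal sketch of the check described above, in Python. The function names `classify_bytes` and `classify_file` are made up for illustration. It goes one step further than the top-bit test by attempting a strict UTF-8 decode, so byte sequences that are not well-formed UTF-8 get reported as "unknown" rather than assumed to be UTF-8. Note that a successful decode only shows the bytes are valid UTF-8; it cannot prove the author did not intend some other encoding.

```python
def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown' for a byte string (illustrative helper)."""
    # No byte has the top bit set: plain ASCII, which is also valid UTF-8.
    if all(b < 0x80 for b in data):
        return "ascii"
    # At least one byte has the top bit set: check that the data is well-formed UTF-8.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8; could be ISO 8859-1, CP1252, or something else entirely.
        return "unknown"
    return "utf-8"


def classify_file(path: str) -> str:
    """Read the whole file in binary mode and classify its contents."""
    with open(path, "rb") as f:
        return classify_bytes(f.read())
```

For example, `classify_bytes(b"hello")` gives `"ascii"`, while the UTF-8 encoding of "héllo" gives `"utf-8"`. Reading the whole file into memory keeps the sketch short; for very large files, an incremental decoder would be a reasonable alternative.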


http://msdn.microsoft.com/en-us/library/dd318672%28v=vs.85%29.aspx

IsTextUnicode function: determines whether a buffer is likely to contain a form of Unicode text.
