How to distinguish UTF-8 and ASCII files?
How to distinguish UTF-8 (no BOM) and ASCII files?
If the file contains any bytes with the top bit set, then it is not ASCII.
So if the only possibilities are ASCII or UTF-8, then it's UTF-8.
If the file contains only bytes with the top bit clear, then it's meaningless to distinguish whether it's ASCII or UTF-8, since it represents exactly the same series of characters either way. But you can call it ASCII.
Of course, this doesn't distinguish UTF-8 from ISO Latin-1 or CP1252, nor does it confirm that the so-called UTF-8 is actually valid.
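As a rough illustration of the top-bit rule, here is a minimal Python sketch. The function name `classify_bytes` and the three labels it returns are my own choices for illustration, not part of any library. It also adds a strict decode step, so a file with high bytes that is not valid UTF-8 is reported as "unknown" rather than assumed to be UTF-8.

```python
def classify_bytes(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown' for a byte string.

    Hypothetical helper for illustration only.
    """
    # Every byte has the top bit clear: plain ASCII (which is also valid UTF-8).
    if all(b < 0x80 for b in data):
        return "ascii"
    # At least one byte has the top bit set, so it is not ASCII.
    # Check whether the whole buffer decodes as strict UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"  # could be Latin-1, CP1252, binary data, ...
    return "utf-8"


if __name__ == "__main__":
    print(classify_bytes(b"plain text"))
    print(classify_bytes("caf\u00e9".encode("utf-8")))
    print(classify_bytes("caf\u00e9".encode("latin-1")))
```

To check a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in. A "utf-8" result is a strong hint rather than proof, since text in another 8-bit encoding can occasionally happen to form valid UTF-8 sequences.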
On Windows there is the IsTextUnicode function, which determines if a buffer is likely to contain a form of Unicode text:
http://msdn.microsoft.com/en-us/library/dd318672%28v=vs.85%29.aspx