Reading Unicode *.txt files?
Currently I am reading .txt files with
FileInputStream is = new FileInputStream(masterPath+txt);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String readLine = null;
while ((readLine = br.readLine()) != null)
{
...
But Unicode characters do not appear as they should.
Any ideas how to change the above code so that Unicode works?
Thanks!
Yes. Specify the appropriate encoding when constructing your InputStreamReader. If your file is UTF-8 encoded, use
new BufferedReader(new InputStreamReader(is, "UTF-8"));
The plain InputStreamReader constructor will assume that the file has the system's "default encoding". Because it is rather unpredictable what that is, this constructor should not be used except in toy examples. Use one of the two-argument constructors that allow you to specify the encoding explicitly.
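As a minimal sketch of the above, assuming the file really is UTF-8: the class name ReadUtf8, the helper readLines, and the temporary sample file are made up for illustration. It passes a Charset constant instead of the string "UTF-8", which avoids the checked UnsupportedEncodingException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadUtf8 {
    // Reads every line of the file, decoding its bytes as UTF-8
    // (assumption: the file was actually saved as UTF-8).
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, written here only so the sketch is self-contained.
        // Unicode escapes keep the source independent of the compiler's source encoding.
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "h\u00e9llo\n\u65e5\u672c\u8a9e\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(file)) {
            System.out.println(line);
        }
        Files.delete(file);
    }
}
```

Whether the printed lines look right still depends on the console, but the strings held in memory are decoded correctly.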
By the way, "Unicode" is not sufficient to tell what is in the file you want to read. Unicode, in and of itself, defines just how numbers ("code points") are assigned to characters, not how to pack those numbers into bytes in a file, which is the job of an "encoding". In practice your encoding is likely to be either UTF-8 or UTF-16 of some endianness.
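To make the code point versus encoding distinction concrete, here is a small sketch that encodes one character, U+00E9, in three ways. The class name EncodingBytes and the helper hex are invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // a single code point, U+00E9
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 A9
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 E9
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // E9 00
    }
}
```

The same code point produces three different byte sequences, which is why the reader has to be told which encoding the file uses.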
Maybe your file isn't Unicode-encoded, or maybe the way you're displaying it isn't Unicode-compliant (Windows cmd.exe, I'm looking at you).
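If the display side is the suspect, one thing to try is printing through a PrintStream with an explicit encoding rather than relying on the default one. This is only a sketch: the class name Utf8Console and the helper utf8 are hypothetical, and whether the console then renders the text correctly still depends on its own code page and font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps a stream so that printed text is encoded as UTF-8
    // instead of the platform default encoding.
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // Unicode escapes keep the source independent of the compiler's source encoding.
        out.println("h\u00e9llo \u65e5\u672c\u8a9e");
    }
}
```

If the text is still garbled after this, the problem is more likely in the terminal than in the Java code.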