
Reading Unicode *.txt files?

Currently I am reading .txt files with

    FileInputStream is = new FileInputStream(masterPath+txt);
    BufferedReader br = new BufferedReader(new InputStreamReader(is));

    String readLine = null;

    while ((readLine = br.readLine()) != null)
    {
        ...

But Unicode characters do not appear as they should.

Any ideas on how to change the above code so that Unicode works?

Thanks!


Yes. Specify the appropriate encoding when constructing your InputStreamReader. If your file is UTF-8 encoded, use

    new BufferedReader(new InputStreamReader(is, "UTF-8"));


The plain InputStreamReader constructor will assume that the file has the system's "default encoding". Because it is rather unpredictable what that is, this constructor should not be used except in toy examples. Use one of the two-argument constructors that allow you to specify the encoding explicitly.
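
For illustration, here is a minimal sketch of the explicit-encoding approach, assuming the file really is UTF-8. The class name `Utf8LineReader` and the method `readLines` are hypothetical and not from the question. It passes the `StandardCharsets.UTF_8` constant (Java 7 and later) instead of the string `"UTF-8"`, so there is no checked `UnsupportedEncodingException` to handle.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineReader {
    // Hypothetical helper: reads every line of the file at `path`,
    // decoding its bytes as UTF-8 regardless of the platform default.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream).
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Utf8LineReader <file>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

If the file turns out to be UTF-16 instead, the same sketch should work with `StandardCharsets.UTF_16` in place of `StandardCharsets.UTF_8`.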

By the way, "Unicode" is not sufficient to tell what is in the file you want to read. Unicode, in and of itself, defines just how numbers ("code points") are assigned to characters, not how to pack those numbers into bytes in a file, which is the job of an "encoding". In practice your encoding is likely to be either UTF-8 or UTF-16 of some endianness.
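
To make the code point versus encoding distinction concrete, the following sketch encodes the single character U+00E9 (é) in three encodings and prints the resulting bytes. The class name `EncodingDemo` and the helper `hex` are hypothetical names used only for this example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // one code point: U+00E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 A9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 E9
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // E9 00
    }
}
```

The same code point yields three different byte sequences, which is why a reader that guesses the wrong encoding produces garbled characters.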


Maybe your file isn't Unicode-encoded, or maybe the way you're displaying it isn't Unicode-compliant (Windows cmd.exe, I'm looking at you).
