println(char), characters turn into Chinese?

Please help me troubleshoot this problem.

I have an input file 'Trial.txt' with the content "Thanh Le".

Here is the function I used in an attempt to read from the file:

public char[] importSeq(){
    File file = new File("G:\\trial.txt");

    char temp_seq[] = new char[100];

    try{
        FileInputStream fis = new FileInputStream(file);
        BufferedInputStream bis = new BufferedInputStream(fis);
        DataInputStream dis = new DataInputStream(bis);

        int i = 0;

        //Try to read all character till the end of file
        while(dis.available() != 0){
            temp_seq[i]=dis.readChar();
            i++;
        }
        System.out.println(" imported");
    } catch (FileNotFoundException e){
        e.printStackTrace();
    } catch (IOException e){
        e.printStackTrace();
    }

    return temp_seq;
}

And the main function:

public static void main(String[] args) {

    Sequence s1 = new Sequence();

    char result[];

    result = s1.importSeq();

    int i = 0;
    while(result[i] != 0){
        System.out.println(result[i]);
        i++;
    }
}

And this is the output.

run:

 imported
瑨
慮
栠
汥
BUILD SUCCESSFUL (total time: 0 seconds)


Honestly, that's a pretty clumsy way to read a text file into a char[].

Here's a better example, assuming that the text file contains only ASCII characters.

File file = new File("G:/trial.txt");
char[] content = new char[(int) file.length()];
Reader reader = null;

try {
    reader = new FileReader(file);
    reader.read(content);
} finally {
    if (reader != null) try { reader.close(); } catch (IOException ignore) {}
}

return content;

And then to print the char[], just do:

System.out.println(content);

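Note that InputStream#available() doesn't necessarily do what you're expecting: it returns an estimate of the number of bytes that can be read without blocking, not the number of bytes remaining in the file.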
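As a rough sketch of a loop that does not depend on available(), the following reads until read() returns -1 and decodes the bytes with an explicitly chosen charset. The class name ReadAllChars and the helper readAll are made up for illustration, and an in-memory byte array stands in for the file; with a real file you would pass a FileInputStream instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadAllChars {

    // Hypothetical helper: reads every character from the stream,
    // decoding the bytes with the given charset.
    // The loop stops when read() returns -1 (end of stream),
    // not when available() happens to report 0.
    public static char[] readAll(InputStream in, Charset charset) throws IOException {
        CharArrayWriter out = new CharArrayWriter();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file holds plain ASCII text, one byte per character.
        byte[] fileBytes = "Thanh Le".getBytes(StandardCharsets.US_ASCII);
        char[] content = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.US_ASCII);
        System.out.println(content); // Thanh Le
    }
}
```

Growing the result through a CharArrayWriter also avoids guessing a fixed array size up front.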

See also:

  • Java IO tutorial


In Java a char is 2 bytes, so readChar() reads two bytes at a time and combines them into a single Unicode character. With a file that stores one byte per letter, each pair of letters therefore turns into one unrelated character.

You can avoid this by using readByte() instead.
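To make the difference concrete, here is a small sketch that runs the same eight bytes through readChar() and through readByte(). The class and method names are invented for this example, and the input is the lowercase text "thanh le" held in memory rather than the real file, which appears to be what the characters printed in the question correspond to. Casting a byte to a char like this is only reasonable for ASCII or Latin-1 data.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadCharVsReadByte {

    // Each readChar() call consumes TWO bytes, high byte first.
    public static String viaReadChar(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                sb.append(in.readChar());
            }
        } catch (EOFException endOfData) {
            // readChar() signals the end of the input by throwing EOFException
        }
        return sb.toString();
    }

    // Each readByte() call consumes ONE byte, which maps to one ASCII character.
    public static String viaReadByte(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                // Mask to 0..255 so bytes above 0x7F do not turn negative
                sb.append((char) (in.readByte() & 0xFF));
            }
        } catch (EOFException endOfData) {
            // end of input reached
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "thanh le".getBytes(StandardCharsets.US_ASCII);

        String wide = viaReadChar(data);
        System.out.println(wide.length());                       // 4
        System.out.println(Integer.toHexString(wide.charAt(0))); // 7468

        System.out.println(viaReadByte(data));                   // thanh le
    }
}
```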


Here is some code to demonstrate exactly what is happening. A char in Java is two bytes wide and holds one UTF-16 code unit; Java uses UTF-16 internally for char and String, and readChar() builds each char from two consecutive bytes, high byte first. Your file uses one byte per character, probably ASCII. So each readChar() call consumes two bytes, and thus two ASCII characters, from your file.

The following code shows how the single ASCII bytes 't' and 'h' become one Chinese character.

public class Main {
  public static void main(String[] args) {

    System.out.println((int)'t'); // 116 == x74 (116 is 74 in Hex)
    System.out.println((int)'h'); // 104 == x68
    System.out.println((int)'瑨'); // 29800 == x7468

    // System.out.println('\u0074'); // t
    // System.out.println('\u0068'); // h
    // System.out.println('\u7468'); // 瑨

    char th = (('t' << 8) + 'h'); //x74 x68
    System.out.println(th); //瑨 == 29800 == '\u7468'

  }
}
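The same effect can be seen without any stream at all by decoding one byte array with two different charsets. This is only an illustrative sketch: the class and method names are invented, and the bytes are assumed to be the lowercase ASCII text "thanh le".

```java
import java.nio.charset.StandardCharsets;

public class SameBytesTwoCharsets {

    // One byte per character
    public static String asAscii(byte[] data) {
        return new String(data, StandardCharsets.US_ASCII);
    }

    // Two bytes per character, high byte first (what readChar() effectively does)
    public static String asUtf16BigEndian(byte[] data) {
        return new String(data, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // 't' 'h' 'a' 'n' 'h' ' ' 'l' 'e'
        byte[] data = {0x74, 0x68, 0x61, 0x6E, 0x68, 0x20, 0x6C, 0x65};

        System.out.println(asAscii(data));                   // thanh le
        System.out.println(asAscii(data).length());          // 8
        System.out.println(asUtf16BigEndian(data).length()); // 4

        // The first decoded char is the same value as ('t' << 8) + 'h'
        System.out.println(asUtf16BigEndian(data).charAt(0) == (('t' << 8) + 'h')); // true
    }
}
```

The bytes never change; only the rule for grouping them into characters does.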