
Reading Windows Unicode files on Android

I just found out that Android can correctly read a file encoded in Windows ANSI (the so-called multi-byte encoding) and convert it to Java Unicode strings. But it fails when reading a Unicode file: Android seems to read it byte by byte. A Unicode string "ABC" in the file is read into a Java String of length 6, whose characters are 0x41, 0x00, 0x42, 0x00, 0x43, 0x00.

BufferedReader in = new BufferedReader(new FileReader(pathname));
String str = in.readLine();

Is there a way to read Windows Unicode files correctly on Android? Thank you.

[Edited]

Experiments: I saved the two Chinese characters "難哪" in two Windows text files:

ANSI.txt -- C3 F8 AD FE
UNICODE.txt -- FF FE E3 96 EA 54

Then I copied these files to the emulator's SD card and read them with the following program (note that the emulator's locale had already been set to zh_TW):

BufferedReader in = new BufferedReader(new FileReader("/sdcard/ANSI.txt"));
String szLine = in.readLine();
int n = szLine.length(), j, i;
in.close();
for (i = 0; i < n; i++) 
    j = szLine.charAt(i);

Here is what I saw on the Emulator:

ANSI.txt -- FFFD FFFD FFFD
UNICODE.txt -- FFFD FFFD FFFD FFFD 0084

Apparently Android (or Java) is unable to decode the Chinese characters properly. So how do I do this? Thank you in advance.


FileReader decodes with the platform default charset, and on Android that default is always UTF-8. In other words, it assumes an ASCII-compatible encoding (UTF-8 or one of the older ASCII extensions), which UTF-16 is not.
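A minimal sketch that reproduces both symptoms off-device. It passes UTF-8 explicitly to mimic Android's default charset, and it uses the byte values from the question instead of real files; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Decode raw file bytes as UTF-8, which is what FileReader does on Android.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ABC" as the body of a Windows "Unicode" (UTF-16LE) file, BOM omitted.
        byte[] utf16le = {0x41, 0x00, 0x42, 0x00, 0x43, 0x00};
        // The bytes of ANSI.txt from the question.
        byte[] ansi = {(byte) 0xC3, (byte) 0xF8, (byte) 0xAD, (byte) 0xFE};

        // Every 0x00 byte is valid UTF-8 (U+0000), so this yields 6 chars.
        System.out.println(decodeAsUtf8(utf16le).length());
        // These bytes are not valid UTF-8, so they become U+FFFD replacement chars.
        System.out.println(decodeAsUtf8(ansi).indexOf('\uFFFD') >= 0);
    }
}
```

The length of 6 matches what the question observed for "ABC": nothing is lost, but each byte is decoded as its own character.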

Also, it is not a "Unicode file"; it is a "UTF-16-encoded file".

You will have to use an InputStreamReader and specify the encoding yourself:

BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(pathname), "UTF-16LE"));
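One detail worth checking: UNICODE.txt starts with the byte-order mark FF FE. According to the java.nio.charset.Charset documentation, the "UTF-16LE" decoder keeps an initial BOM as a U+FEFF character, while "UTF-16" uses the BOM to choose the byte order and does not return it. A small sketch on the bytes from the question (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Bytes of UNICODE.txt from the question: BOM (FF FE), then U+96E3 and U+54EA.
    static final byte[] UNICODE_TXT = {
        (byte) 0xFF, (byte) 0xFE, (byte) 0xE3, (byte) 0x96, (byte) 0xEA, 0x54
    };

    public static void main(String[] args) {
        // UTF-16LE keeps the BOM as a leading U+FEFF character.
        String le = new String(UNICODE_TXT, StandardCharsets.UTF_16LE);
        // UTF-16 consumes the BOM and uses it to pick little-endian order.
        String auto = new String(UNICODE_TXT, StandardCharsets.UTF_16);

        System.out.println(le.length());   // 3
        System.out.println(auto.length()); // 2
        System.out.println(auto.equals("\u96E3\u54EA"));
    }
}
```

With a file, the same choice is just the charset name passed to InputStreamReader. Keep in mind that "UTF-16" falls back to big-endian when there is no BOM, so for BOM-less little-endian files "UTF-16LE" is the right name and any leading U+FEFF has to be skipped by hand.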

You should also really read that article - it seems to me that there is a lot that you misunderstand about character sets and encoding.


You can try the following code.
A Windows ANSI text file that contains Chinese characters
is often not processed correctly on Android.

Stream processing there defaults to UTF-8.

So once you put such a file on an Android system,
the usual stream-reading code can't recognize the Chinese parts correctly.

The following code correctly parses strings from a Windows ANSI text file containing Chinese characters
that is placed on the Android SD card or in the assets folder.

It's very simple: just use the "BIG5" decoder in the InputStreamReader object.

I have verified it, and it works well. Try it!
FYI. KNC.

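String pathname = "AAA.txt";
// Decode as Big5 (the encoding behind Traditional Chinese Windows "ANSI" files)
// instead of the UTF-8 default.
BufferedReader inBR = new BufferedReader(new InputStreamReader(new FileInputStream(pathname), "BIG5"));
String sData;

while ((sData = inBR.readLine()) != null) {
    System.out.println(sData);
}
inBR.close();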
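The same idea as a reusable helper, sketched under a few assumptions: it takes any InputStream, so on Android it could be fed a FileInputStream for the SD card or the stream returned by AssetManager.open(...) for the assets folder. The class and the readLines name are hypothetical, and it assumes the runtime supports the "Big5" charset. The demo decodes the ANSI.txt bytes from the question and checks that encoding them back gives the same bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Big5Reader {
    // Read every line of a stream, decoding with the given charset; closes the stream.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset big5 = Charset.forName("Big5");
        // Bytes of ANSI.txt from the question.
        byte[] ansiTxt = {(byte) 0xC3, (byte) 0xF8, (byte) 0xAD, (byte) 0xFE};
        String first = readLines(new ByteArrayInputStream(ansiTxt), big5).get(0);

        // Two double-byte characters, and re-encoding restores the original bytes.
        System.out.println(first.length());
        System.out.println(Arrays.equals(first.getBytes(big5), ansiTxt));
    }
}
```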


A Unicode string "ABC" in the file would be read in to a Java String of length 6, and the characters are 0x41, 0x00, 0x42, 0x00, 0x43, 0x00.

How are you getting the length? What you have described is correct for a Java String. Java strings are UTF-16 (i.e., Unicode) internally, so "ABC" is held as three 16-bit chars, 0x0041, 0x0042 and 0x0043; laid out as little-endian bytes, that is exactly the sequence you describe (0x41, 0x00, 0x42, 0x00, 0x43, 0x00).

However, String.length() returns 3, the number of chars, even though the string is 6 bytes long in UTF-16.
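A quick way to see the difference between the char count and the UTF-16 byte count; the class and helper names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class LengthDemo {
    // Number of bytes the string takes when serialized as UTF-16LE (no BOM).
    static int utf16leByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String abc = "ABC";
        // length() counts UTF-16 code units (chars), not bytes.
        System.out.println(abc.length());           // 3
        // Each of these chars takes two bytes: 41 00 42 00 43 00.
        System.out.println(utf16leByteCount(abc));  // 6
    }
}
```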
