How to deal with a UTF-16LE encoded text file using Java, or convert it to ASCII?
I am sorry if this has been asked before. I am trying to process a text file using Java. The text file is exported from MS SQL Server. When I open it in PSPad (a text editor that can show any file in hex format), it tells me that my text file is in UTF-16LE. Since I am getting it from someone else, that is quite possible.
Now my Java program is not able to deal with that format. So I wanted to know if there is any way to either convert my text file to ASCII format or do some preprocessing. I CAN modify the file.
Any help is greatly appreciated.
Thanks.
EDIT 1
I wrote this program, but it is not working as expected. When I view the output file in PSPad, each character shows up as a 2-byte char, e.g. '2' is 3200 instead of just 32, 'M' is 4D00 instead of just 4D, etc. The editor, though, says the encoding of the output file is UTF-8. I am rather confused. Can anyone tell me what I am doing wrong?
public static void main(String[] args) throws Exception {
    try {
        // Open the file that is the first
        // command line parameter
        FileInputStream fstream = new FileInputStream(
                "input.txt");
        // Get the object of DataInputStream
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-16LE"));
        String strLine;
        // Read File Line By Line
        while ((strLine = br.readLine()) != null) {
            // Write to the file
            writeToFile(strLine);
        }
        // Close the input stream
        in.close();
    } catch (Exception e) {// Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
    System.out.println("done.");
}

static public void writeToFile(String str) {
    try {
        OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream("output.txt", true), "UTF-8");
        BufferedWriter fbw = new BufferedWriter(writer);
        fbw.write(str);
        fbw.close();
    } catch (Exception e) {// Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
}
EDIT 2
Snapshots: the input file in PSPad (a free hex viewer), the output file in PSPad, and the output I was expecting to see.
Create an InputStreamReader for charset UTF-16LE and you will be all set.
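To make the suggestion concrete, here is a minimal sketch that decodes the byte pairs mentioned in the question (32 00 and 4D 00) through an InputStreamReader. The class name Utf16Demo and the method decodeUtf16Le are made up for illustration, and the bytes come from an in-memory array rather than the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    // Hypothetical helper: decodes UTF-16LE bytes into a String
    // by reading them through an InputStreamReader.
    static String decodeUtf16Le(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_16LE)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // '2' and 'M' as they appear in the hex view of the input file.
        byte[] utf16 = {0x32, 0x00, 0x4D, 0x00};
        String text = decodeUtf16Le(utf16);
        // Re-encoding as ASCII yields one byte per character.
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println(text + " -> " + ascii.length + " bytes");
        // prints "2M -> 2 bytes"
    }
}
```

Once the text is decoded with the right charset, each character is an ordinary Java char, so writing it back out in ASCII or UTF-8 gives single bytes for characters like '2' and 'M'.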
InputStreamReader will let you load your UTF-16LE text into memory. You can then perform all the string manipulations you need. Then you can save it in ASCII format using OutputStreamWriter. Use Charset to select the encodings.
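One possible way to put this together is sketched below. It copies the file in fixed-size chunks, so it should not need to hold a large file in memory, and it keeps the original line terminators. The class name Utf16LeToAscii and the method convert are hypothetical; input.txt and output.txt are the file names from the question. The sketch assumes the file may start with a byte order mark (U+FEFF) and drops it, and it relies on OutputStreamWriter replacing characters that have no ASCII equivalent with '?'.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf16LeToAscii {
    // Hypothetical helper: streams source (UTF-16LE) into target (US-ASCII).
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     Files.newInputStream(source), StandardCharsets.UTF_16LE));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                     Files.newOutputStream(target), StandardCharsets.US_ASCII))) {
            char[] buffer = new char[8192];
            boolean first = true;
            int n;
            while ((n = reader.read(buffer)) != -1) {
                int start = 0;
                // The UTF-16LE decoder passes a leading BOM through as U+FEFF,
                // so skip it if it is the very first character.
                if (first && n > 0) {
                    if (buffer[0] == '\uFEFF') {
                        start = 1;
                    }
                    first = false;
                }
                writer.write(buffer, start, n - start);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        convert(Paths.get("input.txt"), Paths.get("output.txt"));
        System.out.println("done.");
    }
}
```

Unlike the program in the question, the output file is opened once and overwritten rather than reopened in append mode for every line, so leftovers from an earlier run cannot end up in the result.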
Just found a solution.
http://www.fileformat.info/convert/text/utf2utf.htm
Lets you upload and convert between the encodings.
It's not a permanent solution though, since my file is 700MB+. So I will try out some of the solutions posted by others.
This small tool helps:
http://www.kalytta.com/tools.php