Character encoding UTF-8 and ISO-8859-1 in CSV [duplicate]
Possible Duplicate:
How to add a UTF-8 BOM in java
My Oracle database has a character set of UTF8. I have a Java stored procedure which fetches records from a table and creates a CSV file.
BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);
ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
The German characters (fetched from the table) become gibberish in the CSV if I use the above code. But if I change the encoding to ISO-8859-1, then I can see the German characters properly in the CSV file.
PrintStream out = new PrintStream(zipOut,false,"ISO-8859-1");
I have read in some posts that we should use UTF-8, as it is safe and will also encode other languages (Chinese etc.) properly, which ISO-8859-1 will fail to do.
Please suggest which encoding I should use. (There is a strong chance that we will have Chinese/Japanese words stored in the table in the future.)
You're currently only talking about one part of a process that is inherently two-sided.
Encoding something to bytes is only really relevant in the sense that some other process comes along and decodes it back into text at some later point. And of course, both processes need to use the same character set, or else the decoding will fail.
So it sounds to me like the process that takes the BLOB out of the database and into the CSV file is assuming that the bytes are an ISO-8859-1 encoding of text. Hence if you store them as UTF-8, the decoding produces gibberish (though the basic ASCII characters have the same byte representation in both, which is why they still decode correctly).
UTF-8 is a good character set to use in almost all circumstances, but it's not magic enough to overcome the immutable law that the same character set must be used for decoding as was used for encoding. So you can either change your CSV-creating process to decode with UTF-8, or you'll have to continue encoding with ISO-8859-1.
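To make that concrete, here is a small self-contained sketch (not from the original answer) of what happens when UTF-8 bytes are decoded with the wrong charset:

import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    public static void main(String[] args) {
        String original = "Müller";

        // Encode with UTF-8: 'ü' (U+00FC) becomes the two bytes 0xC3 0xBC.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decode the same bytes as ISO-8859-1: each byte is treated as
        // one character, so 0xC3 0xBC turns into "Ã¼".
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1)); // MÃ¼ller

        // Decoding with the matching charset recovers the text.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));      // Müller
    }
}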
I suppose your BLOB data is ISO-8859-1 encoded. As it's stored as binary and not as text, its encoding does not depend on the database's character set. You should check whether the BLOB was originally written in UTF-8 encoding and, if not, do so.
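If the bytes do turn out to be ISO-8859-1, re-encoding is a decode-then-encode round trip. A minimal sketch, assuming the bytes really are Latin-1 text (the helper name is illustrative, not from the post):

import java.nio.charset.StandardCharsets;

public class ReencodeSketch {
    // Decode the bytes with the charset they were written in,
    // then re-encode the resulting String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}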
I think the problem is that Excel cannot figure out the UTF-8 encoding of the CSV on its own: utf-8 csv issue
But I am still not able to resolve the issue even if I put a BOM on the PrintStream:
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
out.write('\ufeff');
I also tried:
out.write(new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF });
but to no avail.
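For what it's worth, out.write('\ufeff') goes through PrintStream.write(int), which writes a single byte (here the low byte, 0xFF), so it can never produce the three-byte UTF-8 BOM; the byte-array attempt is the right shape, but the bytes must land at the very start of an open zip entry, before any CSV text. A minimal self-contained sketch under those assumptions (the file and entry names are illustrative, not from the original code):

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BomZipCsvSketch {
    public static void main(String[] args) throws Exception {
        try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream("out.zip"))) {
            zipOut.putNextEntry(new ZipEntry("data.csv")); // an entry must be open before writing

            // Write the three UTF-8 BOM bytes directly to the zip stream.
            zipOut.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });

            // All subsequent text is UTF-8 encoded by the PrintStream.
            PrintStream out = new PrintStream(zipOut, false, "UTF-8");
            out.println("Name;Stadt");
            out.println("Müller;München");
            out.flush();            // flush the PrintStream before closing the entry
            zipOut.closeEntry();
        }
    }
}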