Character encoding UTF-8 and ISO-8859-1 in CSV [duplicate]
Possible Duplicate:
How to add a UTF-8 BOM in java
My Oracle database has a character set of UTF8. I have a Java stored procedure which fetches records from a table and creates a CSV file.
BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);
ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
The German characters (fetched from the table) become gibberish in the CSV if I use the above code. But if I change the encoding to ISO-8859-1, then I can see the German characters properly in the CSV file.
PrintStream out = new PrintStream(zipOut,false,"ISO-8859-1");
I have read in some posts that we should use UTF-8, as it is safe and will also encode other languages (Chinese etc.) properly, which ISO-8859-1 will fail to do.
Please suggest which encoding I should use. (There is a strong chance that we will have Chinese/Japanese words stored in the table in the future.)
You're currently only talking about one part of a process that is inherently two-sided.
Encoding something to bytes is only really relevant in the sense that some other process comes along and decodes it back into text at some later point. And of course, both processes need to use the same character set, or else the decoding will fail.
So it sounds to me like the process that takes the BLOB out of the database and into the CSV file is assuming that the bytes are an ISO-8859-1 encoding of text. Hence if you store them as UTF-8, the decoding produces gibberish (though the basic ASCII characters have the same byte representation in both, which is why they still decode correctly).
UTF-8 is a good character set to use in almost all circumstances, but it's not magic enough to overcome the immutable law that the same character set must be used for decoding as was used for encoding. So you can either change your CSV-creating process to decode with UTF-8, or you'll have to continue encoding with ISO-8859-1.
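To make that concrete, here is a small self-contained sketch (not from the original answer) of what happens when UTF-8 bytes are decoded with the wrong charset:

import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    public static void main(String[] args) {
        String original = "Müller";

        // Encode with UTF-8: 'ü' (U+00FC) becomes the two bytes 0xC3 0xBC.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decode the same bytes as ISO-8859-1: each byte is treated as
        // one character, so 0xC3 0xBC turns into "Ã¼".
        System.out.println(new String(utf8Bytes, StandardCharsets.ISO_8859_1)); // MÃ¼ller

        // Decoding with the matching charset recovers the text.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));      // Müller
    }
}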
I suppose your BLOB data is ISO-8859-1 encoded. As it's stored as binary and not as text, its encoding does not depend on the database's character set. You should check whether the BLOB was originally written in UTF-8 encoding and, if not, do so.
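If the bytes do turn out to be ISO-8859-1, re-encoding is a decode-then-encode round trip. A minimal sketch, assuming the bytes really are Latin-1 text (the helper name is illustrative, not from the post):

import java.nio.charset.StandardCharsets;

public class ReencodeSketch {
    // Decode the bytes with the charset they were written in,
    // then re-encode the resulting String as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}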
I think the problem is that Excel cannot figure out the UTF-8 encoding of the CSV on its own: utf-8 csv issue
But I am still not able to resolve the issue even if I put a BOM on the PrintStream:
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
out.write('\ufeff');
I also tried:
out.write(new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF });
but to no avail.
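For what it's worth, out.write('\ufeff') goes through PrintStream.write(int), which writes a single byte (here the low byte, 0xFF), so it can never produce the three-byte UTF-8 BOM; the byte-array attempt is the right shape, but the bytes must land at the very start of an open zip entry, before any CSV text. A minimal self-contained sketch under those assumptions (the file and entry names are illustrative, not from the original code):

import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class BomZipCsvSketch {
    public static void main(String[] args) throws Exception {
        try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream("out.zip"))) {
            zipOut.putNextEntry(new ZipEntry("data.csv")); // an entry must be open before writing

            // Write the three UTF-8 BOM bytes directly to the zip stream.
            zipOut.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });

            // All subsequent text is UTF-8 encoded by the PrintStream.
            PrintStream out = new PrintStream(zipOut, false, "UTF-8");
            out.println("Name;Stadt");
            out.println("Müller;München");
            out.flush();            // flush the PrintStream before closing the entry
            zipOut.closeEntry();
        }
    }
}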