
How to add a UTF-8 BOM in Java?

I have a Java stored procedure which fetches records from a table using a ResultSet object and creates a CSV file.

BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);

ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
out.write('\ufeff');
out.flush();

zipOut.putNextEntry(new ZipEntry("filename.csv"));
while (rs.next()){
    out.print("\"" + rs.getString(i) + "\"");
    out.print(",");
}
out.flush();

zipOut.closeEntry();
zipOut.close();
retBLOB.close();

return retBLOB;

But the generated CSV file doesn't show the German characters correctly. The Oracle database also has an NLS_CHARACTERSET value of UTF8.

Please suggest.


BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
out.write('\ufeff');
out.write(...);

This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.
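To see those bytes without touching the file system, here is a minimal sketch that points the same OutputStreamWriter at a ByteArrayOutputStream instead of a FileOutputStream. The class and method names are made up for illustration. It writes '\ufeff' followed by a German word and returns the encoded bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterBomDemo {
    // Writes U+FEFF and then the text through a UTF-8 Writer.
    // A ByteArrayOutputStream stands in for the FileOutputStream.
    static byte[] writeWithBom(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write('\ufeff'); // a single char, encoded as EF BB BF
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // "Gr\u00f6\u00dfe" is the word Größe, escaped so the source stays ASCII
        for (byte b : writeWithBom("Gr\u00f6\u00dfe")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Running `main` should print `EF BB BF 47 72 C3 B6 C3 9F 65`. Here C3 B6 and C3 9F are the UTF-8 encodings of ö and ß.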


Just in case people are using PrintStreams, you need to do it a little differently. A Writer encodes the single character U+FEFF into 3 bytes. PrintStream.write(int), however, writes only the low-order byte of its argument. So with write() you must emit all 3 bytes of the UTF-8 BOM individually:

    // Print utf-8 BOM
    PrintStream out = System.out;
    out.write('\ufeef'); // emits 0xef
    out.write('\ufebb'); // emits 0xbb
    out.write('\ufebf'); // emits 0xbf

Alternatively, you can use the hex values for those directly:

    PrintStream out = System.out;
    out.write(0xef); // emits 0xef
    out.write(0xbb); // emits 0xbb
    out.write(0xbf); // emits 0xbf
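Both variants above rely on write(int) keeping only the low-order 8 bits of its argument, so they should emit identical bytes. The sketch below captures each variant in a ByteArrayOutputStream instead of System.out so they can be compared. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;

class PrintStreamBomBytes {
    // The high byte 0xFE is discarded, leaving 0xEF, 0xBB, 0xBF
    static byte[] bomViaCharLiterals() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        out.write('\ufeef'); // low byte 0xEF
        out.write('\ufebb'); // low byte 0xBB
        out.write('\ufebf'); // low byte 0xBF
        out.flush();
        return bytes.toByteArray();
    }

    // Same bytes, passed directly as int values
    static byte[] bomViaHexValues() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        out.write(0xef);
        out.write(0xbb);
        out.write(0xbf);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(bomViaCharLiterals(), bomViaHexValues()));
    }
}
```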


To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write().

Also, if you want the BOM in your CSV file, you need to print it after putNextEntry(). Bytes written before an entry is opened do not go into that entry.
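Putting both fixes together, the question's flow could be sketched as follows. This sketch assumes a ByteArrayOutputStream in place of the Oracle BLOB stream and a List<String> in place of the ResultSet, so it can run outside the database. The class and method names are hypothetical. The BOM is printed with print() only after putNextEntry(), and the entry is read back for checking.

In the original order, the write before putNextEntry() should fail because no entry is open. Since PrintStream reports such failures only through checkError(), the BOM disappears silently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZippedCsvWithBom {
    // Builds a zip with one CSV entry whose content starts with a UTF-8 BOM
    static byte[] zipCsv(String entryName, List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            PrintStream out = new PrintStream(zipOut, false, "UTF-8");
            zipOut.putNextEntry(new ZipEntry(entryName)); // open the entry first
            out.print('\ufeff'); // print() encodes U+FEFF as EF BB BF
            for (String value : values) {
                out.print("\"" + value + "\",");
            }
            // checkError() flushes and reports any IOException PrintStream swallowed
            if (out.checkError()) {
                throw new IOException("PrintStream reported a write error");
            }
            zipOut.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back the bytes of the first entry so the result can be inspected
    static byte[] firstEntryBytes(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (in.getNextEntry() == null) {
                throw new IOException("zip has no entries");
            }
            return in.readAllBytes(); // stops at the end of the current entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```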


PrintStream#print

I think that out.write('\ufeff'); should actually be out.print('\ufeff');, calling the java.io.PrintStream#print method.

According to the javadoc, the write(int) method actually writes a byte ... without any character encoding. So out.write('\ufeff'); writes the byte 0xff. By contrast, the print(char) method encodes the character as one or more bytes using the stream's encoding, and then writes those bytes.
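You can check the two behaviours side by side by capturing each call in a ByteArrayOutputStream. This sketch assumes Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor. The class name is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class WriteVersusPrint {
    // write(int) keeps only the low-order byte, so U+FEFF becomes 0xFF
    static byte[] viaWrite() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, false, StandardCharsets.UTF_8);
        out.write('\ufeff');
        out.flush();
        return bytes.toByteArray();
    }

    // print(char) encodes the character with the stream's charset (UTF-8 here)
    static byte[] viaPrint() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, false, StandardCharsets.UTF_8);
        out.print('\ufeff');
        out.flush();
        return bytes.toByteArray();
    }
}
```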

As noted in section 23.8 of the Unicode 9 specification, the BOM for UTF-8 is EF BB BF. That sequence is what you get when using UTF-8 encoding on '\ufeff'. See: Why UTF-8 BOM bytes efbbbf can be replaced by \ufeff?.
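The mapping also works in the other direction. Decoding EF BB BF as UTF-8 gives the single character U+FEFF, which is why the escape '\ufeff' and the three bytes are interchangeable. Here is a minimal sketch; the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BomRoundTrip {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Java's UTF-8 decoder keeps the BOM as the char U+FEFF instead of stripping it
    static String decodeBom() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    // Encoding that char again yields the same three bytes
    static byte[] encodeBack() {
        return decodeBom().getBytes(StandardCharsets.UTF_8);
    }
}
```

Because the decoder does not strip a leading BOM, code that reads such files back may need to skip U+FEFF itself.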


Add this to the beginning of the CSV string:

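String CSV = "";
byte[] BOM = {(byte) 0xEF,(byte) 0xBB,(byte) 0xBF};
// Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets);
// new String(BOM) would use the platform default charset instead
CSV = new String(BOM, StandardCharsets.UTF_8) + CSV;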

This works for me.
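The prefixed string only turns into EF BB BF if it is later written out with UTF-8. The sketch below (names hypothetical) contrasts UTF-8 with ISO-8859-1. ISO-8859-1 cannot represent U+FEFF, so String.getBytes substitutes a '?' byte instead.

```java
import java.nio.charset.StandardCharsets;

class CsvBomEncoding {
    // The BOM in character form
    static final String BOM = "\ufeff";

    // UTF-8: the BOM char becomes EF BB BF, followed by the CSV bytes
    static byte[] asUtf8(String csv) {
        return (BOM + csv).getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1: U+FEFF is unmappable and is replaced with '?' (0x3F)
    static byte[] asLatin1(String csv) {
        return (BOM + csv).getBytes(StandardCharsets.ISO_8859_1);
    }
}
```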


If you just want to modify the same file (without creating a new file and deleting the old one, as I had issues with that):

private void addBOM(File fileInput) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(fileInput, "rws")) {
        byte[] text = new byte[(int) file.length()];
        file.readFully(text);
        file.seek(0);
        byte[] bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
        file.write(bom);
        file.write(text);
    }
}
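Running addBOM twice on the same file would prepend a second BOM. A small guard can read the first three bytes before calling it. The name hasUtf8Bom is hypothetical, and this sketch assumes Java 11+ for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BomCheck {
    // Returns true if the file already starts with EF BB BF
    static boolean hasUtf8Bom(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] head = in.readNBytes(3); // fewer than 3 bytes for short files
            return head.length == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
        }
    }
}
```

A caller could then do `if (!BomCheck.hasUtf8Bom(fileInput.toPath())) addBOM(fileInput);`.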


In my case it works with the code:

PrintWriter out = new PrintWriter(new File(filePath), "UTF-8");
out.write(csvContent);
out.flush();
out.close();
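Java's UTF-8 encoder does not write a BOM by itself, so this PrintWriter produces a plain UTF-8 file. If a BOM is wanted, one option is to write '\ufeff' before the content. Here is a minimal sketch with the same PrintWriter constructor; the method name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

class PrintWriterBom {
    // Writes U+FEFF (encoded as EF BB BF) and then the CSV text, all as UTF-8
    static void writeCsv(File file, String csvContent) {
        try (PrintWriter out = new PrintWriter(file, "UTF-8")) {
            out.write('\ufeff');
            out.write(csvContent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```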


Here is a simple way to prepend a BOM header to any file:

// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils)
private static void appendBOM(File file) throws Exception {
    File bomFile = new File(file + ".bom");
    // Overwrite any leftover .bom file instead of appending to it
    try (FileOutputStream output = new FileOutputStream(bomFile)) {
        byte[] bytes = FileUtils.readFileToByteArray(file);
        output.write('\ufeef'); // emits 0xef
        output.write('\ufebb'); // emits 0xbb
        output.write('\ufebf'); // emits 0xbf
        output.write(bytes);
        output.flush();
    }

    // Replace the original file with the BOM-prefixed copy
    file.delete();
    bomFile.renameTo(file);
}
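If you would rather avoid the Commons IO dependency, a similar approach can be sketched with java.nio.file alone. It writes the BOM and the original bytes to a sibling file, then moves that file over the original with REPLACE_EXISTING. Failures then surface as exceptions instead of being hidden in ignored delete()/renameTo() return values. The names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PrependBom {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static void prependBom(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        Path tmp = file.resolveSibling(file.getFileName() + ".bom");
        // Default options create or truncate tmp, so leftovers are overwritten
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(UTF8_BOM);
            out.write(original);
        }
        // Replace the original with the BOM-prefixed copy
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```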