Problem Inflating byte[] in Java?

I ran into an issue that I can't figure out. I have some data in a BLOB column in a DB2/Linux environment. Each BLOB was written to DB2 after the byte[] was compressed using JDK compression (the code that does this runs in the Linux environment). I am trying to write a simple program, in a Windows environment (my development environment), that reads some of this data, decompresses it (using the JDK), and creates a String from the decompressed byte array. The issue is that after I decompress the BLOB (byte[]), the decompressed byte array is usually 1-3 bytes longer than expected. By "expected" I mean that the offset and length fields are also stored in the database, and the decompressed byte array is usually a few bytes longer than the stored length. So if I create a String object from the decompressed byte array and then create a second String from it with substring(offset, length), using the offset and length fields from the database, the second String is shorter than the first.

For example, a database record contains a blob with offset 0 and length 260,409. After decompressing the blob:

 compressedByte[].length  - 71,212
 decompressedByte[].length   - 260,412
 new String(decompressedByte[]).length()  - 260,412
 new String(decompressedByte[]).substring(0, 260409).length() - 260,409

For some other input records, the difference I am seeing is anywhere between 1-3 bytes in length.

I am puzzled by this issue and wondering if anyone could suggest any tips so I can debug it further. Could this somehow be related to how the bytes are stored/written in the Linux environment and how they are read in Windows? Thanks for your help.


I suspect the default encoding is different between the two systems.

// on the linux box   
byte [] blob = str.getBytes("UTF-8");

// in your code 
String str = new String(blob, "UTF-8");

Or at least find out what the default encoding on the Linux box is (normally UTF-8) and skip step 1.

A really good explanation of what could be happening here is on Joel on Software.
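To make the suspected mismatch concrete, here is a minimal sketch of how the same bytes can yield different String lengths depending on the charset used to decode them. It assumes the Linux side encoded the text as UTF-8 (not confirmed in the question) and uses ISO-8859-1 as a stand-in for a single-byte Windows default such as windows-1252. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLengthDemo {

    // Encodes text as UTF-8 (assumed writer-side encoding), then decodes the
    // bytes with the given charset and returns the resulting char count.
    public static int decodedLength(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith).length();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // 4 chars; the accented 'e' needs 2 bytes in UTF-8

        System.out.println("chars in original:      " + text.length());
        System.out.println("UTF-8 byte count:       "
                + text.getBytes(StandardCharsets.UTF_8).length);
        // A single-byte charset turns every byte into one char, so the
        // decoded String is as long as the byte array, not the original text.
        System.out.println("decoded as ISO-8859-1:  "
                + decodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("decoded as UTF-8:       "
                + decodedLength(text, StandardCharsets.UTF_8));
        // Worth printing on both machines to compare.
        System.out.println("default charset here:   " + Charset.defaultCharset());
    }
}
```

If the stored length field counts characters while the byte array holds UTF-8, a handful of non-ASCII characters in a 260,409-character text would be enough to account for a few extra bytes.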


A String is not a general holder for bytes. You will undoubtedly have different default character encodings between your db2/Linux environment and your Windows environment which will be causing the conversion back and forth between bytes and characters to be different.
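As a rough sketch of keeping bytes and characters separate, the example below inflates a blob to a byte[] first and only then decodes it with an explicitly named charset. It assumes the writer used java.util.zip.Deflater with default settings and UTF-8 text; the question only says "JDK compression", so both are assumptions, and the class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlobTextDemo {

    // Stand-in for the Linux-side writer: zlib-compresses the given bytes.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses to raw bytes only; no charset is involved at this stage.
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Decodes with an explicit charset (UTF-8 assumed) rather than the
    // platform default, so the result is the same on Linux and Windows.
    public static String blobToText(byte[] compressed) {
        return new String(inflate(compressed), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00efve caf\u00e9";
        byte[] blob = deflate(original.getBytes(StandardCharsets.UTF_8));
        String restored = blobToText(blob);
        System.out.println("round trip ok: " + restored.equals(original));
        System.out.println(restored.length() + " chars from "
                + inflate(blob).length + " bytes");
    }
}
```

With this split, a stored offset and length can be applied to whichever unit they were recorded in: to the byte[] if they count bytes, or to the decoded String if they count characters.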
