
Java unreadable strings

I have written a Java socket listener that listens on port 80. It gathers the data arriving on that port and stores it in a temporary String, which is then used for further operations (type conversions and so on). The basic problem is that the data arriving on port 80 has parts that are unreadable (like @ [ Qô — z ‡ ). When I store it in a String and print the String, only the readable parts are printed, which is understandable. What puzzles me is that when I print the length of the String, it only reports the length of the readable part. So I want to know if my approach of storing unreadable string parts in a String is acceptable to enable further operations on them. If not, I would also like some pointers as to how I could store such incoming data.

Regards p1nG


Something does not make sense here. If you are storing the "unreadable" part of the data in the String, it will be reflected in the length of the String.
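To illustrate the point, here is a minimal sketch (the class name and the sample string are made up for this example) showing that characters which do not display still count towards the length:

```java
class LengthDemo {
    // Builds a string that mixes printable text with two control characters.
    static String sample() {
        return "abc\u0001\u0002def";
    }

    public static void main(String[] args) {
        String s = sample();
        // The control characters may not show up on a console,
        // but they are part of the String and are counted.
        System.out.println(s.length()); // 8
    }
}
```

If the length you see is shorter than expected, the missing characters were most likely lost before or while the String was built, not when it was printed.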

I want to know if my approach of storing unreadable string parts in a String is acceptable to enable further operations on them. If not, I would also like some pointers as to how I could store such incoming data.

It depends on why the data is unreadable.

  • One possibility is that the remote system is sending data in some unexpected character set or encoding. For example, if it is sending Latin-1 and you are expecting UTF-8 (or vice versa), some sections of the text may be unreadable. The solution is to figure out what character set and encoding the remote system is sending, and use the correct Java charset name when converting the bytes to Java characters.

  • Another possibility is that some of the data is binary data. If so, you should separate the text from the binary data, based on the application protocol used by the remote system.

  • Finally, the unreadable stuff might be caused by line noise or something similar. If that's the case, you should probably leave it intact.
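The first possibility can be sketched as follows. This is only an illustration under assumed input: the single byte 0xE9 stands in for data sent as Latin-1, and the class and method names are invented for the example. Decoding the same byte with the wrong charset produces the Unicode replacement character instead of the intended letter:

```java
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // "é" encoded as Latin-1 is the single byte 0xE9 (assumed sample input).
    static final byte[] LATIN1_BYTES = { (byte) 0xE9 };

    static String decodeAsLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    static String decodeAsUtf8(byte[] data) {
        // 0xE9 on its own is not valid UTF-8, so the decoder
        // substitutes U+FFFD and the original byte value is lost.
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print code points rather than glyphs to avoid console encoding issues.
        System.out.println(Integer.toHexString(decodeAsLatin1(LATIN1_BYTES).charAt(0))); // e9
        System.out.println(Integer.toHexString(decodeAsUtf8(LATIN1_BYTES).charAt(0)));   // fffd
    }
}
```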

An alternative approach is to use a byte array (or something similar) rather than a String to hold the data. The problem with trying to convert bytes to characters when you are not sure of the character set and encoding is that the conversion may be lossy. By storing the raw bytes, your application at least has the possibility of getting it right later ... when you figure out what the correct conversion is.
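A minimal sketch of the byte-array approach might look like the following. The helper name readAll is hypothetical, and a ByteArrayInputStream stands in for the socket's input stream so the example runs on its own; with a real connection you would pass socket.getInputStream() instead, and the loop would run until the peer closes the stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class RawBytesDemo {
    // Copies everything from the stream into a byte array, with no
    // character conversion, so nothing is lost or replaced.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: some text bytes followed by non-text bytes.
        byte[] wire = { 'G', 'E', 'T', 0x00, (byte) 0xF4, (byte) 0x87 };
        byte[] stored = readAll(new ByteArrayInputStream(wire));
        System.out.println(stored.length); // 6
    }
}
```

The stored bytes can be converted to a String later, once the correct charset is known.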


You can store the data in a java.nio.ByteBuffer to avoid all the string wackiness...

If it's truly text being sent in some wide character encoding, you'll want to convert the byte buffer into a String using the appropriate character set with the handy java.nio.charset.Charset.decode method.
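As a rough sketch of that idea, assuming purely for illustration that the sender uses UTF-16BE (the class name, helper name and sample text are made up):

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteBufferDecodeDemo {
    // Decodes the readable region of the buffer with the given charset.
    // duplicate() is used so the caller's position is left untouched.
    static String decode(ByteBuffer buf, Charset cs) {
        return cs.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        // Stand-in for bytes received from the socket.
        buf.put("h\u00E9llo".getBytes(StandardCharsets.UTF_16BE));
        buf.flip(); // switch from writing to reading
        System.out.println(decode(buf, StandardCharsets.UTF_16BE).length()); // 5
    }
}
```

Note that Charset.decode replaces malformed input rather than throwing, so it is still worth keeping the original buffer around if the charset is only a guess.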
