
Delimiter for parsing a message of unknown length in Java (best practice?)

I have a byte array (a UTF-8 encoded string sent as a byte array from the client). The message should have the following format:

'number' 'timestamp' 'str1' 'str2'

E.g.:

1 2000-01-31T20:00.00 the 1st str the 2nd str

It is clear that the 'number' and 'timestamp' are easily read from the byte array. The start position of 'str1' can also be determined. Given that 'str1' and 'str2' can have any content (and any length), what type of delimiter can be used to know where 'str1' ends and 'str2' starts? Or are there other tricks for parsing something like this?

Note 1: the message format is defined by me, so any solution with a different format or order will do, as long as all 4 pieces of information are in the byte array.

Note 2: I know I could encode str1 so that it doesn't contain my custom delimiter, but I would like to avoid the overhead of encoding and decoding the data.

Note 3: One solution I could think of is to write the length of str1 in front of it when sending the data from the client side, e.g. 'number' 'timestamp' 'str1length' 'str1' 'str2'.

Are there any other tricks you can think of?

Thanks.


I recommend the 3rd option you listed, with a length in front of each string:
number   timestamp   length_of_string1   string1   length_of_string2   string2

It's probably a bad idea to put a delimiter such as "|" or "^]" between string1 and string2, because then the delimiter can no longer appear inside your strings.
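A minimal sketch of the length-prefixed layout, assuming the number fits in a 4-byte int and every text field (including the timestamp, for simplicity) is written as a 4-byte length followed by its UTF-8 bytes. The class name LengthPrefixedMessage and its helper methods are made up for illustration; the question does not fix any of these choices. Note that the lengths are byte counts, not character counts, which matters for non-ASCII text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical wire layout (an assumption, not from the question):
// [int number][int len][timestamp][int len][str1][int len][str2]
final class LengthPrefixedMessage {
    final int number;
    final String timestamp;
    final String str1;
    final String str2;

    LengthPrefixedMessage(int number, String timestamp, String str1, String str2) {
        this.number = number;
        this.timestamp = timestamp;
        this.str1 = str1;
        this.str2 = str2;
    }

    // Serialize all four fields into one byte array.
    byte[] encode() {
        byte[] ts = timestamp.getBytes(StandardCharsets.UTF_8);
        byte[] s1 = str1.getBytes(StandardCharsets.UTF_8);
        byte[] s2 = str2.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + (4 + ts.length) + (4 + s1.length) + (4 + s2.length));
        buf.putInt(number);
        putField(buf, ts);
        putField(buf, s1);
        putField(buf, s2);
        return buf.array();
    }

    // Parse a byte array produced by encode().
    static LengthPrefixedMessage decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int number = buf.getInt();
        String ts = readField(buf);
        String s1 = readField(buf);
        String s2 = readField(buf);
        return new LengthPrefixedMessage(number, ts, s1, s2);
    }

    // Write the byte length first, then the bytes themselves.
    private static void putField(ByteBuffer buf, byte[] bytes) {
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    // Read a length, check it against what is left, then read that many bytes.
    private static String readField(ByteBuffer buf) {
        int len = buf.getInt();
        if (len < 0 || len > buf.remaining()) {
            throw new IllegalArgumentException("invalid field length: " + len);
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the reader always knows how many bytes belong to each string, the strings can contain any character, including "|" or spaces, without escaping.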

Also note that if you send the fields as space-separated text, a string that contains spaces will be split into several pieces. One way to solve this is to surround each string with double quotes, escape any quotes inside it, and use a quotation-aware split on the receiving side.


If I had freedom to choose the syntax, I would do one of the following:

  • If there is some Unicode character that is never going to appear in str1 and str2 (call it '|' for the sake of argument), I would concatenate the 4 components with '|' as the separator. Then I would "parse" the string using String.split("\\|"); (the '|' has to be escaped because split takes a regular expression).

  • If I couldn't be certain that any character I picked was not going to be used in str1 or str2, I'd pick a separator character and an escape character (say '|' and '\') and use the escape character to escape every literal separator and every literal escape character. Building the message and then parsing it takes more effort to code, but it works for any content.

  • As a third alternative, if both ends were Java I'd consider using Java data streams (DataOutputStream and DataInputStream) to encode and decode the data.
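The separator-plus-escape option from the list above could look roughly like this. It is only a sketch: the class name EscapedFields and the choice of '|' as separator and '\' as escape character are assumptions for illustration. It works on the decoded String, so the byte array would first be turned into a String with UTF-8.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: joins fields with '|' and escapes '|' and '\' inside fields.
final class EscapedFields {
    private static final char SEP = '|';
    private static final char ESC = '\\';

    // Build one message string from the given fields.
    static String join(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(SEP);
            }
            for (char c : fields[i].toCharArray()) {
                if (c == SEP || c == ESC) {
                    sb.append(ESC); // prefix literal separator or escape characters
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Split a message produced by join() back into its fields.
    static List<String> split(String message) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : message.toCharArray()) {
            if (escaped) {
                current.append(c); // take the character literally
                escaped = false;
            } else if (c == ESC) {
                escaped = true;
            } else if (c == SEP) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            throw new IllegalArgumentException("dangling escape character at end of message");
        }
        fields.add(current.toString());
        return fields;
    }
}
```

For example, EscapedFields.join("1", "2000-01-31T20:00.00", "a|b", "c") yields a string that EscapedFields.split turns back into the same four fields, even though the third field contains the separator. The cost is one pass over every character on both ends, which is the encoding overhead the question hoped to avoid.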
