How would you parse this TLV in Java?
I have a mobile app that I've written for iPhone (Objective-C) that allows users to import data using a specific format. I have the same app written for Android in Java and I've had users start asking for the ability to import. The format of the data is a portable standard that folks who write apps like this have to be able to import and export.
While I did write what I'm about to ask in Objective-C, I have a feeling that I could have made my life quite a bit easier by doing it a different way. So, I'd like to ask how you'd parse the following TLV in Java. I don't need code, just the gist.
Here's the TLV format:
<Type:Length>Value<Type:Length>Value<Type:Length>Value<end>
Each record starts with < and ends with <end>. \n within records is acceptable and zero-length values are okay.
Here's an example input describing four different cars. Note the multi-line record and the zero-length value.
<make:4>ford<model:7>contour<color:3>red<end>
<make:5>mazda<model:3>mpv<color:5>black<end>
<make:3>bmw
<model:3>335
<color:6>yellow
<end>
<make:7>unknown<model:0><color:4>grey<end>
Once the data is parsed, I'll be inserting it into an SQLite DB, so ultimately looping over the data record by record should give me a bunch of strings that I can use as part of the INSERT statement.
Thanks for any ideas you can provide!
Nick
Very strange format. Is there a published specification?
You can try going the string tokenization route. You could leverage the built-in Java regex support to help with the matching, or even just use basic String class methods (split and trim being your friends). Basically just do:
String[] lines = input.split("<end>");   // one element per record
for (String line : lines)
{
    line = line.trim();
    String[] sublines = line.split("<");   // first element is empty because each record starts with '<'
    for (String subline : sublines)
    {
        subline = subline.trim();
        // ...additional breaking, trimming, branching...
    }
}
The length field is an interesting validation component, but is a little odd for a modern language. One BIG question I would ask is what encoding[s] to expect. UTF-8? 7-bit ASCII? Something strange?
My friends would call the pseudo-code above a hack and tell me to do something like JavaCC, but I have nerdy and impractical friends. ;)
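On the encoding question: it matters because the Length field could be counting bytes, Java chars, or Unicode code points, and those only agree for plain ASCII. Here is a small sketch showing how the three counts differ. The class and method names (LengthUnits, charLength, utf8Length, codePointLength) are made up for illustration, and which unit the format actually uses is not stated in the question, so treat this as a way to check rather than an answer.

```java
import java.nio.charset.StandardCharsets;

class LengthUnits {
    // Number of UTF-16 code units, which is what String.length() and substring() work in.
    static int charLength(String value) {
        return value.length();
    }

    // Number of bytes the value occupies when encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePointLength(String value) {
        return value.codePointCount(0, value.length());
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented final e.
        for (String s : new String[] {"mazda", "caf\u00e9"}) {
            System.out.println(charLength(s) + " chars, "
                    + utf8Length(s) + " UTF-8 bytes, "
                    + codePointLength(s) + " code points");
        }
    }
}
```

If the real files turn out to count bytes, the parser would need to slice a byte array rather than a String.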
If the input file isn't going to be too large, you can read it all into a String, then split the string into an array using <end> as a delimiter. Then iterate over the array, using a regex to capture each Type and corresponding Value.
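A rough sketch of that split-then-regex idea, in case it helps. The class name TlvRegexSketch and the pattern are my own, not anything from a spec. It assumes values never contain '<' and never contain the literal text <end>, and that Length counts Java chars. The declared length is used only to trim the captured text, which is what drops the newline after a value in a multi-line record.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TlvRegexSketch {
    // <Type:Length> followed by everything up to the next '<' (or the end of the chunk).
    private static final Pattern FIELD =
            Pattern.compile("<([^:<>]+):(\\d+)>([^<]*)");

    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String chunk : input.split("<end>")) {
            Map<String, String> record = new LinkedHashMap<>();
            Matcher m = FIELD.matcher(chunk);
            while (m.find()) {
                int length = Integer.parseInt(m.group(2));
                String raw = m.group(3);
                // Keep only the declared number of characters, discarding
                // any trailing newline between this value and the next tag.
                String value = raw.substring(0, Math.min(length, raw.length()));
                record.put(m.group(1), value);
            }
            if (!record.isEmpty()) {
                records.add(record); // skip the whitespace-only chunk after the last <end>
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<make:3>bmw\n<model:3>335\n<color:6>yellow\n<end>\n"
                + "<make:7>unknown<model:0><color:4>grey<end>\n";
        for (Map<String, String> record : parse(input)) {
            System.out.println(record);
        }
    }
}
```

Each record comes back as an ordered map of type to value, which should be easy to feed into an INSERT. If values can legitimately contain '<', this approach breaks and a length-driven reader is the safer choice.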
The XML-ishness of the format is somewhat confusing. The Length is the length of the value token, right? I guess I would use the following algorithm:
next_record:
while (! eof) {
    read token between '<' and '>'
    if (token == "end") {
        continue next_record
    }
    split token into type and length
    read length number of characters into value
    add tuple (type, length, value) to collection
}
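The loop above could look roughly like this in Java. This is a sketch under a few assumptions of mine: the whole input fits in one String, Length counts Java chars, and anything between the end of a value and the next '<' (such as a newline) can be skipped. The class name TlvScanner is invented for the example. Because each value is read by its declared length, values may contain '<' or '>' without confusing the reader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TlvScanner {
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            // Skip to the next tag; text between a value and the next '<' is ignored.
            int open = input.indexOf('<', pos);
            if (open < 0) {
                break;
            }
            int close = input.indexOf('>', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated tag at index " + open);
            }
            String token = input.substring(open + 1, close);
            pos = close + 1;

            if (token.equals("end")) {
                // End of record: store it and start a fresh one.
                records.add(current);
                current = new LinkedHashMap<>();
                continue;
            }

            int colon = token.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing ':' in <" + token + ">");
            }
            String type = token.substring(0, colon);
            int length = Integer.parseInt(token.substring(colon + 1));
            if (length < 0 || pos + length > input.length()) {
                throw new IllegalArgumentException("Bad length in <" + token + ">");
            }
            // Read exactly 'length' characters as the value.
            current.put(type, input.substring(pos, pos + length));
            pos += length;
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("Last record is missing <end>");
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<make:3>bmw\n<model:3>335\n<color:6>yellow\n<end>\n"
                + "<make:7>unknown<model:0><color:4>grey<end>\n";
        for (Map<String, String> record : parse(input)) {
            System.out.println(record);
        }
    }
}
```

For a large file the same loop could be driven from a Reader instead of a String, at the cost of a little more bookkeeping.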