开发者

Simplest way to mix sequences of types with iostreams?

I have a function void write<typename T>(const T&) which is implemented in terms of writing the T object to an ostream, and a matching function T read<typename T>() that reads a T from an istream. I am basically using iostreams as a plain text serialisation format, which obviously works fine for most built-in types, although I'm not sure how to effectively handle std::strings just yet.

I'd like to be able to write out a sequence of objects too, eg void write<typename T>(const std::vector<T>&) or an iterator based equivalent (although in practice, it would always be used with a vector). However, while writing an overload that iterates over the elements and writes them out is easy enough to do, this doesn't add enough information to allow the matching read operation to know how each element is delimited, which is essentially the same problem that I have with a single std::string.

Is there a single approach that can work for all basic types and std::string? Or perhaps I can get away with 2 overloads, one for numerical types, and o开发者_如何学Gone for strings? (Either using different delimiters or the string using a delimiter escaping mechanism, perhaps.)

EDIT: I appreciate the often sensible tendency when confronted with questions like this is to say, "you don't want to do that" and to suggest a better approach, but I would really like suggestions that relate directly to what I asked, rather than what you believe I should have asked instead. :)


A general-purpose serialisation framework is hard, and the built-in features of the iostream library are really not up to it - even dealing with strings satisfactorily is quite difficult. I suggest you either sit down and design the framework from scratch, ignoring iostreams (which then become an implementation detail), or (more realistically) use an existing library, or at least an existing format, such as XML.


Basically, you will have to create a file format. When you're restricted to built-ins, strings, and sequences of those, you could use whitespace as delimiters, write strings wrapped in " (escaping any " - and then \, too - occurring within the streams themselves), and pick anything that isn't used for streaming built-in types as sequence delimiter. It might be helpful to store the size of a sequence, too.

For example,

5 1.4 "a string containing \" and \\" { 3 "blah" "blubb" "frgl" } { 2 42 21 }

might be the serialization of an int (5), a float (1.4), a string ("a string containing " and \"), a sequence of 3 strings ("blah", "blubb", and "frgl"), and a sequence of 2 ints (42 and 21).

Alternatively you could do as Neil suggests in his comment and treat strings as sequences of characters:

{ 27 'a' ' ' 's' 't' 'r' 'i' 'n' 'g' ' ' 'c' 'o' 'n' 't' 'a' 'i' 'n' 'i' 'n' 'g' ' ' '"' ' ' 'a' 'n' 'd' ' ' '\' }


If you want to avoid escaping strings, you can look at how ASN.1 does things. It's overkill for your stated requirements: strings, fundamental types and arrays of these things, but the principle is that the stream contains unambiguous length information. Therefore nothing needs to be escaped.

For a very simple equivalent, you could output a uint32_t as "ui4" followed by 4 bytes of data, a int8_t as "si1" followed by 1 byte of data, an IEEE float as "f4", IEEE double as "f8", and so on. Use some additional modifier for arrays: "a134ui4" followed by 536 bytes of data. Note that arbitrary lengths need to be terminated, whereas bounded lengths like the number of bytes in the following integer can be fixed size (one of the reasons ASN.1 is more than you need is that it uses arbitrary lengths for everything). A string could then either be a<len>ui1 or some abbreviation like s<len>:. The reader is very simple indeed.

This has obvious drawbacks: the size and representation of types must be independent of platform, and the output is neither human readable nor particularly compressed.

You can make it mostly human-readable, though with ASCII instead of binary representation of arithmetic types (careful with arrays: you may want to calculate the length of the whole array before outputting any of it, or you may use a separator and a terminator since there's no need for character escapes), and by optionally adding a big fat human-visible separator, that the deserializer ignores. For example, s16:hello, worlds12:||s12:hello, world is considerably easier to read than s16:hello, worlds12:s12:hello, world. Just beware when reading that what looks like a separator sequence might not actually be one, and you have to avoid falling into traps like assuming s5:hello|| in the middle of the code means there's a string 5 chars long: it might be part of s15:hello||s5:hello||.

Unless you have very tight constraints on code size, it's probably easier to use a general-purpose serializer off the shelf than it is to write a specialized one. Reading simple XML with SAX isn't difficult. That said, everyone and his dog has written "finally, the serializer/parser/whatever that will save us ever hand-coding a serializer/parser/whatever ever again", with greater or lesser success.


You may consider using boost::spirit, which simplifies parsing of basic types from arbitrary input streams.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜