Serializing C-style structs (using C++)
Is it evil to serialize struct objects using memcpy?
In one of my projects I am doing the following: I memcpy a struct object, base64 encode it, and write it to file. I do the inverse when parsing the data. It seems to work OK, but in certain situations (for example when using the WINDOWPLACEMENT
for the HWND of Windows Media Player) it turns out that the decoded data does not match sizeof(WINDOWPLACEMENT)
.
Here are some code fragments:
// Using WINDOWPLACEMENT from Windows API headers:
typedef struct tagWINDOWPLACEMENT {
UINT length;
UINT flags;
UINT showCmd;
POINT ptMinPosition;
POINT ptMaxPosition;
RECT rcNormalPosition;
#ifdef _MAC
RECT rcDevice;
#endif
} WINDOWPLACEMENT;
static std::string EncodeWindowPlacement(const WINDOWPLACEMENT & inWindowPlacement)
{
std::stringstream ss;
{
Poco::Base64Encoder encoder(ss); // From the Poco C++ libraries
const char * offset = reinterpret_cast<const char*>(&inWindowPlacement);
std::vector<char> buffer(offset, offset + sizeof(inWindowPlacement));
for (size_t idx = 0; idx != buffer.size(); ++idx)
{
encoder << buffer[idx];
}
encoder.close();
}
return ss.str();
}
static WINDOWPLACEMENT DecodeWindowPlacement(const std::string & inEncoded)
{
std::string decodedString;
{
std::istringstream istr(inEncoded);
Poco::Base64Decoder decoder(istr); // From the Poco C++ libraries
decoder >> decodedString;
assert(decoder.eof());
if (decoder.fail())
{
throw std::runtime_error("Failed to parse Window placement data from the configuration file.");
}
}
if (decodedString.size() != sizeof(WINDOWPLACEMENT))
{
// !! Occurs frequently !!
throw std::runtime_error("Errors occured during parsing of the Window placement.");
}
WINDOWPLACEMENT windowPl开发者_StackOverflow社区acement;
memcpy(&windowPlacement, &decodedString[0], decodedString.size());
return windowPlacement;
}
I'm aware that copying classes in C++ using memcpy is likely to cause trouble because the copy constructors are not properly executed. I'm not sure if this also applies to C-style structs. Or is serialization by memory dumping simply not done?
Update: A bug in Poco's Base64Encoder/Decoder is not impossible, but unlikely. Its test cases seem pretty thorough: Base64Test.cpp.
You will run into problems if you need to transfer these files between machines that do not all share the same endianness and word size, or if you add/remove slots from the structs in future versions and need to retain binary compatibility.
I'm not sure how operator>>()
is implemented in Poco::Base64Decoder
. If it is same as istream
's operator>>()
, then after decoder >> decodedString;
decodedString
may not contain all characters from the input. For example, if there is any whitespace character in encoded string then decoder >> decodedString;
will read upto that whitespace.
Doing a memcpy
of classes/structs is okay if they're just Plain Old Data (POD), but if that's the case, then you could rely on C++ doing the copying for you via copy constructors (which exist for both struct
and class
types in C++).
Certainly you can do it the way you have been doing it - one of the products I've worked on serializes data using memcpy
, sends the data over the wire, and client applications decode the bytestream to get the data back.
But if you have a choice, you might want something higher level like boost.serialization
, which offers more flexibility and deep-pointer copying. The aforementioned Google ProtoBuffers would work nicely too.
Here are some threads discussing serialization methods in C++:
- boost serialization vs google protocol buffers?
- C++ Serialization Performance
I wouldn't go as far as to say that it's evil, but I think it is asking for trouble and weird problems in many cases.
I know it has been done and it can work (I've seen people serialize structs like that to send over a network connection), but it has a number of drawbacks that have been pointed out already (inflexibility, endianness problems, structs containing pointers, packing, etc).
I'd recommend a more robust way of serializing and deserializing your data. I've heard lots of good things about Google protocol buffers, something like that will be a lot more flexible and will probably save you headaches in the end.
Serializing data in the manner you've done it is not particularly evil, if you know you're staying on a machine with the same byte size, word size, endian-ness, etc. Since you're serializing the window placement information, you probably don't care about portability between two different machines, and only want to save this information between sessions on the same machine. I'd hazard a guess that you're storing this into the Registry. If you want portability for other data that is actually useful when it's ported to other architectures, then you can look at many of the other suggestions posted here already, such as Google protocol buffers, etc. Whitespace is a red-herring, as all WS is irrelevant in a base64 encoded data stream and all decoders should ignore it (PoCo does).I am curious to know what are the sizes of the string and the structure when it fails.Knowing this might give you some insight into the problem.
精彩评论