serialize any data type as vector<uint8_t> - use reinterpret_cast?
I didnt find anything directly related in searching, so please forgive if this is a duplicate.
What I am looking to do is serialize data across a network connection. My approach is to convert everything I need to transfer to a std::vector< uint8_t >
and on the receiving side unpack the data into the appropriate variables. My approach looks like this:
template <typename T>
inline void pack (std::vector< uint8_t >& dst, T& data) {
uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));
dst.insert (dst.end (), src, src + sizeof (T));
}
template <typename T>
inline void unpack (vector <uint8_t >& src, int index, T& data) {
copy (&src[index], &src[index + sizeof (T)], &data);
}
Which I'm using like
vector< uint8_t > buffer;
uint32_t foo = 103, bar = 443;
pack (buff, foo);
pack (buff, bar);
// And on the receive side
uint32_t a = 0, b = 0;
size_t offset = 0;
unpack (buffer, offset, a);
offset += sizeof (a);
unpack (buffer, offset, b);
My concern is the
uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));
line (which I understand to do the same as reinterpret_cast
). Is there a better way to accomplish this without the double cast?
My naive approach was to just use static_cast< uint8_t* >(&data)
which failed. I've been told in the past that reinterpret_cast
is bad. So I'd like to 开发者_Go百科avoid it (or the construct I have currently) if possible.
Of course, there is always uint8_t * src = (uint8_t *)(&data)
.
Suggestions?
My suggestion is to ignore all the people telling you that reinterpret_cast
is bad. They tell you it is bad, because it's generally not a good practice to take the memory map of one type and pretend that it's another type. But in this case, that is exactly what you want to do, as your entire purpose is to transmit the memory map as a series of bytes.
It is far better than using a double-static_cast
, as it fully details the fact that you are taking one type and purposefully pretending that it is something else. This situation is exactly what reinterpret_cast
is for, and dodging using it with a void pointer intermediary is simply obscuring your meaning with no benefit.
Also, I'm sure that you're aware of this, but watch for pointers in T.
Your situation is exactly what reinterpret_cast
is for, it's simpler than a double static_cast
and documents clearly what you're doing.
Just to be safe, you should use unsigned char
instead of uint8_t
:
- doing
reinterpret_cast
tounsigned char *
and then dereferencing the resulting pointer is safe and portable and is explicitly permitted by [basic.lval] §3.10/10 doing
reinterpret_cast
tostd::uint8_t *
and then dereferencing the resulting pointer is a violation of strict aliasing rule and is undefined behavior ifstd::uint8_t
is implemented as extended unsigned integer type.If it exists,
uint8_t
must always have the same width asunsigned char
. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation asunsigned char
(see When is uint8_t ≠ unsigned char?).(This isn't completely hypothetical: making
[u]int8_t
a special extended integer type allows some aggressive optimizations)
If you really want uint8_t
, you could add a:
static_assert(std::is_same<std::uint8_t, unsigned char>::value,
"We require std::uint8_t to be implemented as unsigned char");
so that the code won't compile on platforms on which it would result in undefined behavior.
You can get rid of one cast by exploiting the fact that any pointer can be implicitly cast to void*
. Also, you might want to add a few const
:
//Beware, brain-compiled code ahead!
template <typename T>
inline void encode (std::vector< uint8_t >& dst, const T& data)
{
const void* pdata = &data;
uint8_t* src = static_cast<uint8_t*>(pdata);
dst.insert(dst.end(), src, src + sizeof(T));
}
You might want to add a compile-time check for T
being a POD, no struct,
and no pointer.
However, interpreting some object's memory at the byte-level is never going to be save, period. If you have to do it, then do it in a nice wrapper (as you have done), and get over it. When you port to a different platform/compiler, have an eye on these things.
You're not doing any actual encoding here, you're just copying the raw representation of the data from memory into a byte array and then sending that out over the network. That's not going to work. Here's a quick example as to why:
struct A {
int a;
};
struct B {
A* p_a;
}
What happens when you use your method to send a B
out over the network? The recipient receives p_a
, the address of some A
object on your machine, but that object is not on their machine. And even if you sent them the A
object too, it wouldn't be at the same address. There's no way that can work if you just send the raw B
struct. And that's not even considering more subtle issues like endianness and floating point representation which can affect the transmission of such simple types as int
and double
.
What you are doing right now is fundamentally no different than just casting to uint8_t*
as far as whether it's going to work or not is concerned (it won't work, except for the most trivial cases).
What you need to do is devise a method of serialization. Serialization means any way of solving this sort of problem: how to get objects in memory out onto the network in a form such that they can be meaningfully reconstructed on the other side. This is a tricky problem, but it is a well-known and repeatedly solved problem. Here's a good starting point for reading: http://www.parashift.com/c++-faq-lite/serialization.html
精彩评论