serialize any data type as vector<uint8_t> - use reinterpret_cast?

2023-01-05 05:38 Q&A author:

I didn't find anything directly related when searching, so please forgive me if this is a duplicate.

What I am looking to do is serialize data across a network connection. My approach is to convert everything I need to transfer to a std::vector< uint8_t > and on the receiving side unpack the data into the appropriate variables. My approach looks like this:

template <typename T>
inline void pack (std::vector< uint8_t >& dst, T& data) {
    uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));
    dst.insert (dst.end (), src, src + sizeof (T));
}   

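// Requires <cstring> for std::memcpy.
template <typename T>
inline void unpack (const std::vector< uint8_t >& src, std::size_t index, T& data) {
    // Copy exactly sizeof (T) bytes; std::copy into &data would write sizeof (T) whole T objects.
    std::memcpy (&data, src.data () + index, sizeof (T));
}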

which I'm using like this:

std::vector< uint8_t > buffer;
uint32_t foo = 103, bar = 443;
pack (buffer, foo);
pack (buffer, bar);

// And on the receive side
uint32_t a = 0, b = 0;
size_t offset = 0;
unpack (buffer, offset, a);
offset += sizeof (a);
unpack (buffer, offset, b);

My concern is the

uint8_t * src = static_cast < uint8_t* >(static_cast < void * >(&data));

line (which I understand to do the same as reinterpret_cast). Is there a better way to accomplish this without the double cast?

My naive approach was to just use static_cast< uint8_t* >(&data), which failed. I've been told in the past that reinterpret_cast is bad, so I'd like to avoid it (or the construct I have currently) if possible.

Of course, there is always uint8_t * src = (uint8_t *)(&data).

Suggestions?

My suggestion is to ignore all the people telling you that reinterpret_cast is bad. They tell you it is bad, because it's generally not a good practice to take the memory map of one type and pretend that it's another type. But in this case, that is exactly what you want to do, as your entire purpose is to transmit the memory map as a series of bytes.

It is far better than using a double-static_cast, as it fully details the fact that you are taking one type and purposefully pretending that it is something else. This situation is exactly what reinterpret_cast is for, and dodging using it with a void pointer intermediary is simply obscuring your meaning with no benefit.

Also, I'm sure that you're aware of this, but watch for pointers in T.

Your situation is exactly what reinterpret_cast is for, it's simpler than a double static_cast and documents clearly what you're doing.

Just to be safe, you should use unsigned char instead of uint8_t:

doing a reinterpret_cast to unsigned char * and then dereferencing the resulting pointer is safe and portable, and is explicitly permitted by [basic.lval] §3.10/10
doing a reinterpret_cast to std::uint8_t * and then dereferencing the resulting pointer is a violation of the strict aliasing rule and is undefined behavior if std::uint8_t is implemented as an extended unsigned integer type.

If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char (see the question "When is uint8_t ≠ unsigned char?").

(This isn't completely hypothetical: making [u]int8_t a special extended integer type allows some aggressive optimizations)

If you really want uint8_t, you could add a:

static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "We require std::uint8_t to be implemented as unsigned char");

so that the code won't compile on platforms on which it would result in undefined behavior.
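
To make the advice above concrete, here is a minimal sketch of the question's pack rewritten with a single reinterpret_cast to unsigned char. The name pack_bytes, the std::vector<unsigned char> buffer type, and the extra trivially-copyable check are assumptions made for illustration; they are not from the original post.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical variant of the question's pack(): appends the raw object
// representation of data to dst. Reading an object's bytes through an
// unsigned char pointer is permitted by the aliasing rules.
template <typename T>
inline void pack_bytes(std::vector<unsigned char>& dst, const T& data) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "pack_bytes only makes sense for trivially copyable types");
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&data);
    dst.insert(dst.end(), src, src + sizeof(T));
}
```

The bytes still arrive in the sender's native layout, so this only addresses the cast, not portability between different machines.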

You can get rid of one cast by exploiting the fact that any object pointer can be implicitly converted to void*. Also, you might want to add a few const qualifiers:

//Beware, brain-compiled code ahead!
template <typename T>
inline void encode (std::vector< uint8_t >& dst, const T& data)
{
    const void* pdata = &data;
    // static_cast cannot drop const, so the byte pointer must be const as well
    const uint8_t* src = static_cast<const uint8_t*>(pdata);
    dst.insert(dst.end(), src, src + sizeof(T));
}

You might want to add a compile-time check for T being a POD, no struct, and no pointer.
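
One way such a check might look is sketched below, using <type_traits>. It assumes that "POD, no struct, and no pointer" can be approximated by "arithmetic or enum type"; the names is_packable and encode_checked are hypothetical, and unsigned char is used for the buffer as an assumption.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical trait: accept only arithmetic and enum types. These are
// trivially copyable, are not structs, and are not pointers.
template <typename T>
struct is_packable
    : std::integral_constant<bool,
                             std::is_arithmetic<T>::value ||
                             std::is_enum<T>::value> {};

template <typename T>
inline void encode_checked(std::vector<unsigned char>& dst, const T& data) {
    // Rejected types fail at compile time instead of sending garbage.
    static_assert(is_packable<T>::value,
                  "encode_checked: T must be an arithmetic or enum type");
    const void* pdata = &data;
    const unsigned char* src = static_cast<const unsigned char*>(pdata);
    dst.insert(dst.end(), src, src + sizeof(T));
}
```

With this in place, a call such as encode_checked(buf, &some_int) or encode_checked(buf, some_vector) should be rejected by the compiler rather than silently copying an address or a container's internals.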

However, interpreting some object's memory at the byte level is never going to be safe, period. If you have to do it, then do it in a nice wrapper (as you have done), and get over it. When you port to a different platform or compiler, keep an eye on these things.

You're not doing any actual encoding here, you're just copying the raw representation of the data from memory into a byte array and then sending that out over the network. That's not going to work. Here's a quick example as to why:

struct A {
  int a;
};

struct B {
  A* p_a;
};

What happens when you use your method to send a B out over the network? The recipient receives p_a, the address of some A object on your machine, but that object is not on their machine. And even if you sent them the A object too, it wouldn't be at the same address. There's no way that can work if you just send the raw B struct. And that's not even considering more subtle issues like endianness and floating point representation which can affect the transmission of such simple types as int and double.

What you are doing right now is fundamentally no different than just casting to uint8_t* as far as whether it's going to work or not is concerned (it won't work, except for the most trivial cases).

What you need to do is devise a method of serialization. Serialization means any way of solving this sort of problem: how to get objects in memory out onto the network in a form such that they can be meaningfully reconstructed on the other side. This is a tricky problem, but it is a well-known and repeatedly solved problem. Here's a good starting point for reading: http://www.parashift.com/c++-faq-lite/serialization.html
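
As one small illustration of what such a method has to pin down, the sketch below writes a uint32_t in a fixed byte order using shifts instead of copying its in-memory representation. Big-endian is chosen here purely as an assumption, and the names put_u32_be and get_u32_be are hypothetical. It covers only fixed-width unsigned integers; floating point values, strings, and pointer-linked structures need more work.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value to dst as four bytes, most significant byte first,
// regardless of the host's native byte order.
inline void put_u32_be(std::vector<std::uint8_t>& dst, std::uint32_t value) {
    dst.push_back(static_cast<std::uint8_t>((value >> 24) & 0xFF));
    dst.push_back(static_cast<std::uint8_t>((value >> 16) & 0xFF));
    dst.push_back(static_cast<std::uint8_t>((value >> 8) & 0xFF));
    dst.push_back(static_cast<std::uint8_t>(value & 0xFF));
}

// Rebuild a value from four bytes starting at offset.
// The caller must ensure that offset + 4 <= src.size().
inline std::uint32_t get_u32_be(const std::vector<std::uint8_t>& src,
                                std::size_t offset) {
    return (static_cast<std::uint32_t>(src[offset]) << 24) |
           (static_cast<std::uint32_t>(src[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(src[offset + 2]) << 8) |
           static_cast<std::uint32_t>(src[offset + 3]);
}
```

Because the byte order is fixed by the code rather than by the hardware, sender and receiver agree on the wire format even if their native endianness differs.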

