Actual total size of struct's members

I need to write an array of struct Data to the hard disk:


struct Data {
  char cmember;
  /* padding bytes */
  int  imember;  
};

AFAIK, most compilers will add some padding bytes between the cmember and imember members of Data, but I want to save only the actual data to the file (without the padding).

I have the following code for saving an array of Data (to a buffer instead of a file, for simplicity):


bool saveData(Data* data, int dataLen, char* targetBuff, int buffLen)
{
  int actualLen = sizeof(char) + sizeof(int); // this code forces us to know the internal
                                              // representation of the Data structure
  int actualTotalLen = dataLen * actualLen; 
  if(actualTotalLen > buffLen) {
    return false;
  }

  for(int i = 0; i < dataLen; i++) {
    memcpy(targetBuff, &data[i].cmember, sizeof(char));
    targetBuff += sizeof(char);
    memcpy(targetBuff, &data[i].imember, sizeof(int));
    targetBuff += sizeof(int);
  }
  return true;
}

As you can see, I calculate the actual size of the Data struct with the code: int actualLen = sizeof(char) + sizeof(int). Is there any alternative to this? (something like int actualLen = actualSizeof(Data))

P.S. This is a synthetic example, but I think you understand the idea of my question...


Just save each member of the struct one at a time. If you overload << to write a variable to a file, you can have

myfile << mystruct.member1 << mystruct.member2;

Then you could even overload << to take an entire struct, and do that inside the struct's operator<<, so in the end you have:

myfile << mystruct;

Resulting in save code that looks like:

myfile << count;
for (int i = 0; i < count; ++i)
    myfile << data[i];

IMO all that fiddling about with memory addresses and memcpy is too much of a headache when you could do it this way. This general technique is called serialization - hit google for more, it's a well-developed area.
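
For reference, a minimal sketch of what such an operator<< could look like for the Data struct from the question. The writeRaw helper and the saveAll function are hypothetical names, and the sketch assumes raw binary output in the host's byte order (the built-in operator<< for int would write text instead).

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Same layout as the struct in the question.
struct Data {
    char cmember;
    int  imember;
};

// Hypothetical helper: writes the raw bytes of one trivially copyable value.
template <typename T>
void writeRaw(std::ostream& os, const T& value) {
    os.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Writes the members one at a time, so padding never reaches the stream.
std::ostream& operator<<(std::ostream& os, const Data& d) {
    writeRaw(os, d.cmember);
    writeRaw(os, d.imember);
    return os;
}

// Writes the element count followed by every element, and returns the bytes.
std::string saveAll(const Data* data, int count) {
    std::ostringstream out;
    writeRaw(out, count);
    for (int i = 0; i < count; ++i)
        out << data[i];
    return out.str();
}
```

Each element then takes sizeof(char) + sizeof(int) bytes in the stream, regardless of how the compiler pads Data in memory.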


You will have to pack your structure.

The way to do that changes depending on the compiler you are using.

For Visual C++:

#pragma pack(push)
#pragma pack(1)

struct PackedStruct {
    /* members */
};

#pragma pack(pop)

This tells the compiler not to pad the members of the structure, and then restores the pack setting to its previous value. Be aware that this will affect performance. If this structure is used in performance-critical code, you might want to copy the unpacked structure into a packed structure.

Also, resist the temptation to use the command-line option that disables padding entirely; this will greatly affect performance.
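
The copy step could be sketched as follows, assuming a compiler that accepts #pragma pack(push, 1) (MSVC, GCC and Clang do). PackedData, pack and unpack are hypothetical names introduced for illustration.

```cpp
struct Data {            // natural layout, used by the performance-critical code
    char cmember;
    int  imember;
};

#pragma pack(push, 1)    // save the current setting, then pack to 1 byte
struct PackedData {      // hypothetical on-disk twin of Data
    char cmember;
    int  imember;
};
#pragma pack(pop)        // restore the previous setting

static_assert(sizeof(PackedData) == sizeof(char) + sizeof(int),
              "PackedData should contain no padding");

// Copy member by member; the compiler takes care of the unaligned int.
inline PackedData pack(const Data& d) {
    PackedData p;
    p.cmember = d.cmember;
    p.imember = d.imember;
    return p;
}

inline Data unpack(const PackedData& p) {
    Data d;
    d.cmember = p.cmember;
    d.imember = p.imember;
    return d;
}
```

Only PackedData objects are written to or read from the file; the rest of the program keeps working with the naturally aligned Data.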


IIUC, you are trying to copy the values of the structure members, rather than the structure as a whole, and store them to disk. Your approach looks good to me. I do not agree with those suggesting #pragma pack, since that only helps you get a packed structure at runtime.

A few notes:

  • sizeof(char) == 1, always, by definition

  • use the offsetof() macro

  • do not try to instantiate a Data object directly from this targetBuff (i.e. via casting) -- this is when you get into alignment issues and trip. Instead, copy the members out as you did while writing the buffer and you should not have issues
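
As a rough illustration of the last two notes, the sketch below uses offsetof() to see how much padding the compiler inserted, and reads a record back by copying the members out of the buffer. It assumes the buffer layout produced by saveData in the question; kRecordSize, kPaddingBeforeInt and loadOne are hypothetical names.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

struct Data {
    char cmember;
    int  imember;
};

// Size of one record in the buffer: the members only, no padding.
const std::size_t kRecordSize = sizeof(char) + sizeof(int);

// offsetof tells where the compiler placed imember inside Data; the gap
// between the end of cmember and that offset is the padding.
const std::size_t kPaddingBeforeInt = offsetof(Data, imember) - sizeof(char);

// Reads record number `index` from a buffer written by saveData.
// The members are copied out with memcpy, so the unaligned int inside the
// buffer is never accessed through an int pointer.
Data loadOne(const char* buff, std::size_t index) {
    const char* p = buff + index * kRecordSize;
    Data d;
    std::memcpy(&d.cmember, p, sizeof(char));
    std::memcpy(&d.imember, p + sizeof(char), sizeof(int));
    return d;
}
```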


There is not an easy solution to this problem. You can usually create separate structures and tell the compiler to pack them tightly, something like:

/* GNU has attributes */
struct PackedData {
    char cmember;
    int  imember;
} __attribute__((packed));

or:

/* MSVC has headers and #pragmas */
#include <pshpack1.h>
struct PackedData {
    char cmember;
    int  imember;
};
#include <poppack.h>

Then you have to write code that transforms your unpacked structures into packed structures and vice-versa. If you are using C++, you can create template helper functions that are predicated on the structure type and then specialize them:

template <typename T>
std::ostream& encode_to_stream(std::ostream& os, T const& object) {
    return os.write((char const*)&object, sizeof(object));
}

template <typename T>
std::istream& decode_from_stream(std::istream& is, T& object) {
    return is.read((char*)&object, sizeof(object));
}

template<>
std::ostream& encode_to_stream<Data>(std::ostream& os, Data const& object) {
    encode_to_stream<char>(os, object.cmember);
    encode_to_stream<int>(os, object.imember);
    return os;
}
template <>
std::istream& decode_from_stream<Data>(std::istream& is, Data& object) {
    decode_from_stream<char>(is, object.cmember);
    decode_from_stream<int>(is, object.imember);
    return is;
}

The bonus is that the defaults will read and write POD objects including the padding. You can specialize as necessary to optimize your storage. However, you probably want to consider endianness, versioning, and other binary storage issues as well. It might be prudent to simply write an archival class that wraps your storage and provides methods for serialization and deserialization of primitives, and then an open-ended method that you can specialize as needed:

class Archive {
protected:
    typedef unsigned char byte;
    // Assumed member: the original left this declaration out. Requires <fstream>.
    std::ofstream m_fstream;
    void writeBytes(byte const* byte_ptr, std::size_t byte_size) {
        m_fstream.write((char const*)byte_ptr, byte_size);
    }

public:
    template <typename T>
    void writePOD(T const& pod) {
        writeBytes((byte const*)&pod, sizeof(pod));
    }

    // Users are required to specialize this to use it.  If it is used
    // for a type that it is not specialized for, a link error will occur.
    template <typename T> void serializeObject(T const& obj);
};

template<>
void Archive::serializeObject<Data>(Data const& obj) {
    writePOD(obj.cmember);
    writePOD(obj.imember);
}

This is the approach that I have always ended up at after a bunch of perturbations in between. It is nicely extensible without requiring inheritance and gives you the flexibility to change your underlying data storage format as needed. You can even specialize writePOD to do different things for different underlying data types like ensuring that multibyte integers are written in network order or whatnot.
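
To illustrate the network-order idea, here is a sketch of a byte-order-independent writer and reader for 32-bit values. These are standalone hypothetical helpers (writeU32BigEndian, readU32BigEndian, roundTrip), not members of the Archive class above, and they assume the value fits in a std::uint32_t.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Writes a 32-bit value most significant byte first (network order),
// whatever the host byte order is.
inline void writeU32BigEndian(std::ostream& os, std::uint32_t value) {
    const char bytes[4] = {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF)
    };
    os.write(bytes, sizeof(bytes));
}

// Matching reader: rebuilds the value from four big-endian bytes.
inline std::uint32_t readU32BigEndian(std::istream& is) {
    unsigned char bytes[4] = {0, 0, 0, 0};
    is.read(reinterpret_cast<char*>(bytes), sizeof(bytes));
    return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
           (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
}

// Usage sketch: write a value to an in-memory stream and read it back.
inline std::uint32_t roundTrip(std::uint32_t value) {
    std::stringstream ss;
    writeU32BigEndian(ss, value);
    return readU32BigEndian(ss);
}
```

Because the bytes are produced by shifting rather than by copying memory, a file written this way reads back the same on little-endian and big-endian machines.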


Don't know if this will help you, but I'm in the habit of ordering the members of the structs that I intend to write to files (or send over networks) so they have as little padding as possible. This is done by putting the members with the widest datatypes and strictest alignment first:

• pointers first
• double
• long long
• long
• float
• int
• short
• char
• bitfields last

Any padding added by the compiler will come at the end of the struct data.

In other words, you could simplify your problem by eliminating the padding (if possible) by reordering the struct members:

struct Data
{
    int     imember;
    char    cmember;
    /* padding bytes here */
};

Obviously this won't solve your problem if you can't reorder the struct members (because it's used by a third-party API or because you need the initial members to have specific datatypes).
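
A small sketch of what the reordering buys you, assuming the usual case where char has alignment 1, so the compiler places cmember directly after imember. Reordered, kUsedBytes and saveReordered are hypothetical names. Note that sizeof(Reordered) normally still includes the trailing padding, so the sketch copies only the leading bytes of each element.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

struct Reordered {
    int  imember;
    char cmember;
    /* padding bytes here */
};

// Bytes of Reordered that carry data; the remainder is trailing padding.
const std::size_t kUsedBytes = offsetof(Reordered, cmember) + sizeof(char);

// Copies only the leading, data-carrying part of each element into out.
// out must have room for count * kUsedBytes bytes.
void saveReordered(const Reordered* data, std::size_t count, char* out) {
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(out + i * kUsedBytes, &data[i], kUsedBytes);
}
```

This still relies on the compiler not inserting padding between the members, which is typical for this ordering but not guaranteed by the standard.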


I would say that you are actually looking for serialization.

There are a number of frameworks for serialization, but I personally prefer Google Protocol Buffers over Boost.Serialization and other approaches.

Protocol Buffers has versioning and binary/human readable output.

If you are concerned about size, you always have the possibility of compressing the data. There are lightning-fast compression algorithms, like LZW for example, which offer a good trade-off between speed and compression ratio.


Look into the #pragma pack directive for your compiler. Some compilers use #pragma options align=packed or something similar.


As you can see, I calculate the actual size of the Data struct with the code: int actualLen = sizeof(char) + sizeof(int). Is there any alternative to this?

No, not in standard C++.

Your compiler might provide a compiler-specific option, though. Packed structs as shown by Graeme and Coincoin might do.


If you don't want to use pragma pack, try to manually re-order the variables, like

struct Data {
  int  imember;
  char cmember;

};


You told @Coincoin that you cannot pack the struct. If you just need the size for some reason, here is a dirty solution:

#define STRUCT_ELEMENTS  char cmember;/* padding bytes */ int  imember; 
typedef struct 
{
    STRUCT_ELEMENTS 
}paddedData;

#pragma pack(push)
#pragma pack(1)

typedef struct 
{
    STRUCT_ELEMENTS 
}packedData;
#pragma pack(pop)

Now you have the size of both:

sizeof(packedData);
sizeof(paddedData);

The only reason I can think of why you cannot pack is that this is linked to another program. In that case you will need to pack your structure and then unpack it when working with the external program.


No, there is no way within the language proper to get this information. One way to approach a solution is to define your data classes indirectly, using some feature of the language - it could be as old-fashioned as macros and the preprocessor, or as new-fangled as tuple templates. You need something which lets you iterate over the class members systematically.

Here's a macro based approach:

#undef  Data_MEMBERS
#define Data_MEMBERS(Data_OP) \
    Data_OP(c, char) \
    Data_OP(i, int)
#undef  Data_CLASS_DEFINITION
#define Data_CLASS_DEFINITION(name, type) \
    type name##member;
struct Data {
    Data_MEMBERS(Data_CLASS_DEFINITION)
};
#define Data_SERIAL_SIZER(name, type) \
    sizeof(type) +
#define Data_Serial_Size \
    (Data_MEMBERS(Data_SERIAL_SIZER) 0)

And so forth.
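
To make the "and so forth" a bit more concrete, here is one way the same member list could also generate a serializer. Data_SERIAL_WRITER and serializeData are hypothetical additions; the rest repeats the macros above so that the sketch compiles on its own.

```cpp
#include <cstring>   // std::memcpy

// Single list of members; every operation below is generated from it.
#define Data_MEMBERS(Data_OP) \
    Data_OP(c, char) \
    Data_OP(i, int)

#define Data_CLASS_DEFINITION(name, type) type name##member;
struct Data {
    Data_MEMBERS(Data_CLASS_DEFINITION)
};

// Expands to (sizeof(char) + sizeof(int) + 0).
#define Data_SERIAL_SIZER(name, type) sizeof(type) +
#define Data_Serial_Size (Data_MEMBERS(Data_SERIAL_SIZER) 0)

// Hypothetical extra operation: copy one member into the buffer and advance.
#define Data_SERIAL_WRITER(name, type) \
    std::memcpy(out, &d.name##member, sizeof(type)); \
    out += sizeof(type);

// Writes the members of d back to back and returns the new write position.
// out must have room for Data_Serial_Size bytes.
inline char* serializeData(const Data& d, char* out) {
    Data_MEMBERS(Data_SERIAL_WRITER)
    return out;
}
```

Adding a member to Data_MEMBERS then updates the struct, the size and the serializer together.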


If you can rewrite the struct definition, you could try to use bit-field widths to get rid of the holes, like so:

struct Data {
   /* widths are in bits; this assumes an 8-bit char and a 32-bit int */
   char cmember : 8;
   int  imember : 32;
};

Sadly, this does not guarantee that it still won't place imember 4 bytes after the start of cmember. But many compilers will get the idea and do it anyway.

Other alternatives:

  1. Reorder your members by size (largest first). This is an old embedded world trick to minimize holes.

  2. Use Ada instead.

The code

type Data is record
    cmember : character;
    imember : integer;
end record;

for Data use record
    cmember at 0 range 0..7;
    imember at 1 range 0..31;
end record;

Does exactly what you want.
