Strange object serialization problem in file parsing
I have a strange problem with object serialization. The file format documentation states the following:
The lead in starts with a 4-byte tag that identifies a TDMS segment ("TDSm"). The next four bytes are used as a bit mask in order to indicate what kind of data the segment contains. This bit mask is referred to as ToC (Table of Contents). Any combination of the following flags can be encoded in the ToC: The next four bytes contain a version number (32-bit unsigned integer), which specifies the oldest TDMS revision a segment complies with. At the time of this writing, the version number is 4713. The only previous version of TDMS has number 4712. The next eight bytes (64-bit unsigned integer) describe the length of the remaining segment (overall length of the segment minus length of the lead in). If further segments are appended to the file, this number can be used to locate the starting point of the following segment. If an application encountered a severe problem while writing to a TDMS file (crash, power outage), all bytes of this integer can be 0xFF. This can only happen to the last segment in a file. The last eight bytes (64-bit unsigned integer) describe the overall length of the meta information in the segment. This information is used for random access to the raw data. If the segment contains no meta data at all (properties, index information, object list), this value will be 0.
So I implemented it as:
class TDMsLEADIN {
public:
char Signature[4]; //TDSm
__int32 Toc;
unsigned __int32 vernum;
unsigned __int64 nextSegmentOff;
unsigned __int64 rawDataOff;
};
fread(&leadin,sizeof(TDMsLEADIN),1,f);
Then I got Signature = "TDSm", Toc = 6 and vernum = 4712 as expected, but nextSegmentOff = 833223655424 and rawDataOff = 8589934592, whereas I expected both nextSegmentOff and rawDataOff to be 194.
Then I broke the class into two parts and read the two parts separately:
class TDMsLEADIN {
public:
char Signature[4]; //TDSm
__int32 Toc;
unsigned __int32 vernum;
};
class TDMsLeadINend{
public:
unsigned __int64 nextSegmentOff;
unsigned __int64 rawDataOff;
};
fread(&leadin,sizeof(TDMsLEADIN),1,f);
fread(&leadin2,sizeof(TDMsLeadINend),1,f);
Then I got nextSegmentOff and rawDataOff as expected (both 194). My question is: what is wrong with the original code, and why does it work when I break it into two parts? I tried unsigned long long instead of unsigned __int64, but got the same result. It is quite strange.
Thanks
You seem to be just reading and writing the binary data in the struct directly.
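Generally the compiler will align structure members for performance, so when it's a single struct there's a hidden 32-bit pad between vernum and nextSegmentOff to align nextSegmentOff. The single fread therefore drops the first four bytes of the 64-bit length into that padding, and nextSegmentOff receives the remaining four bytes of the length followed by the first four bytes of the next field. When it's split into two structures there's no such extra padding between the two reads, so every field lines up with the file.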
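To make the effect concrete, here is a minimal sketch that rebuilds the first 32 bytes of the file from the values in the question and decodes the two 64-bit fields at both sets of offsets. The byte image, the trailing value 2 (inferred from the reported rawDataOff) and the helper names are assumptions for illustration, and the decoding assumes the file is little-endian, which the observed values suggest.

```cpp
#include <cassert>
#include <cstdint>

// Decode a little-endian 64-bit value from raw bytes, independent of the host CPU.
std::uint64_t readLE64(const unsigned char* p) {
    assert(p != nullptr);
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}

// Hypothetical first 32 bytes of the file, reconstructed from the question.
// The trailing 2 is an assumption inferred from the reported rawDataOff.
const unsigned char kFileImage[32] = {
    'T', 'D', 'S', 'm',        // bytes  0-3 : signature
    6, 0, 0, 0,                // bytes  4-7 : ToC = 6
    0x68, 0x12, 0, 0,          // bytes  8-11: version 4712 (0x1268)
    194, 0, 0, 0, 0, 0, 0, 0,  // bytes 12-19: segment length = 194
    194, 0, 0, 0, 0, 0, 0, 0,  // bytes 20-27: meta data length = 194
    2, 0, 0, 0                 // bytes 28-31: first bytes after the lead-in
};

// Where the two fields really sit in the file (no padding).
std::uint64_t lengthAtFileOffset() { return readLE64(kFileImage + 12); }
std::uint64_t metaAtFileOffset()   { return readLE64(kFileImage + 20); }

// Where a padded struct places them (members at offsets 16 and 24).
std::uint64_t lengthAtPaddedOffset() { return readLE64(kFileImage + 16); }
std::uint64_t metaAtPaddedOffset()   { return readLE64(kFileImage + 24); }
```

With this image, lengthAtPaddedOffset() gives 833223655424 (194 shifted left by 32 bits) and metaAtPaddedOffset() gives 8589934592 (2 shifted left by 32 bits), which are the values reported in the question, while the unpadded offsets both give 194.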
You can test this by comparing sizeof(TDMsLEADIN [second version]) + sizeof(TDMsLeadINend) to sizeof(TDMsLEADIN [first version]).
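That comparison could look roughly like the sketch below. The struct names are made up so that both versions can coexist in one file, and the standard fixed-width types stand in for __int32 and __int64; the amount of padding you actually see depends on your compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-struct version from the question (hypothetical name).
struct LeadInWhole {
    char          Signature[4];
    std::int32_t  Toc;
    std::uint32_t vernum;
    std::uint64_t nextSegmentOff;
    std::uint64_t rawDataOff;
};

// Split version from the question (hypothetical names).
struct LeadInHead {
    char          Signature[4];
    std::int32_t  Toc;
    std::uint32_t vernum;
};
struct LeadInTail {
    std::uint64_t nextSegmentOff;
    std::uint64_t rawDataOff;
};

// Bytes the compiler inserted between vernum and nextSegmentOff.
std::size_t paddingBeforeLength() {
    return offsetof(LeadInWhole, nextSegmentOff)
         - (offsetof(LeadInWhole, vernum) + sizeof(std::uint32_t));
}

// How much larger the single struct is than the two parts together.
std::size_t sizeDifference() {
    // Guard against unsigned wrap-around before subtracting.
    assert(sizeof(LeadInWhole) >= sizeof(LeadInHead) + sizeof(LeadInTail));
    return sizeof(LeadInWhole) - (sizeof(LeadInHead) + sizeof(LeadInTail));
}
```

If both functions return a non-zero value (4 is typical where 64-bit integers are 8-byte aligned), the single fread is reading padding bytes from the file.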
The standard way to serialize data is to serialize each underlying piece individually rather than relying on the in-memory layout of a class or structure, as that layout can differ between compilers and compiler settings without notice.
Your problem is that your compiler hasn't packed the struct, so its members are not all adjacent in memory. For example, your compiler may well have decided that it likes your 64-bit variables to be 64-bit aligned in memory, and inserted 4 bytes of padding in your struct to do this.
If you really need the I/O performance that this provides, you can usually tell the compiler to pack the struct, but 1) performance may suffer when you use nonaligned elements in the struct, and 2) your code will be nonportable, since different compilers specify this in different ways. See Visual C++ equivalent of GCC's __attribute__ ((__packed__)) for a quick summary of ways to do this on different compilers.
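As a sketch of the packing approach, #pragma pack(push, 1) is one spelling that Visual C++, GCC and Clang all accept. The struct and function names below are made up for illustration, and the code still assumes the host byte order matches the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

#pragma pack(push, 1)   // no padding between members from here on
struct PackedLeadIn {
    char          Signature[4];
    std::int32_t  Toc;
    std::uint32_t vernum;
    std::uint64_t nextSegmentOff;
    std::uint64_t rawDataOff;
};
#pragma pack(pop)       // restore the default packing

// Fail at compile time if the layout does not match the 28-byte lead-in.
static_assert(sizeof(PackedLeadIn) == 28, "lead-in must be 28 bytes with no padding");

// Read the whole lead-in with one fread; returns false on a short read.
bool readPackedLeadIn(std::FILE* f, PackedLeadIn& out) {
    assert(f != nullptr);
    return std::fread(&out, sizeof out, 1, f) == 1;
}
```

The static_assert is the useful part: if a compiler ignores or changes the packing, the build fails instead of silently misreading the file.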
The portable but somewhat more prosaic way is:
fread(&lead.Signature, 4, 1, f);
fread(&lead.Toc, sizeof(__int32), 1, f);
...
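Spelled out for all five fields, the field-by-field approach might look like the following sketch. LeadIn, readField and readLeadIn are hypothetical names, the standard fixed-width types stand in for __int32 and __int64, and the code assumes the host byte order matches the file (as it did in the question).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// In-memory lead-in; its layout no longer has to match the file.
struct LeadIn {
    char          Signature[4];
    std::int32_t  Toc;
    std::uint32_t vernum;
    std::uint64_t nextSegmentOff;
    std::uint64_t rawDataOff;
};

// Read exactly sizeof(T) bytes into one field; false on a short read.
template <typename T>
bool readField(std::FILE* f, T& field) {
    return std::fread(&field, sizeof(T), 1, f) == 1;
}

// Read the 28-byte lead-in one member at a time, so struct padding never matters.
bool readLeadIn(std::FILE* f, LeadIn& out) {
    assert(f != nullptr);
    return readField(f, out.Signature)
        && readField(f, out.Toc)
        && readField(f, out.vernum)
        && readField(f, out.nextSegmentOff)
        && readField(f, out.rawDataOff);
}
```

Because each call transfers only the bytes of one member, the compiler is free to lay out LeadIn however it likes.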