Union memory share in C

Edit 2: Can I do polymorphism with a union? It seems to me that I can change the data structure based on my needs.

Edit: Fixed the code to use "." instead of "->". What I want to ask is how to make sure a value is stored correctly when the members have different data types (for example, int and char used interchangeably). Since the types have different sizes, the union is allocated enough space for the larger one, and all members share that space.

Suppose I have 2 structs:

typedef struct a{
          int a;
}aType;

typedef struct b{
          char b;
}bType;

typedef union{
         aType a_type;
         bType b_type;
}ab;

int main(void){
         ab v1;
         v1.a_type.a = 5;
         v1.b_type.b = 'a';
}

As far as I know, aType and bType share the same memory. Since int is 3 bytes larger than char (int is 4 bytes and char is 1 byte), the union occupies 4 bytes. Call the first byte the leftmost and the last byte the rightmost. When I assign 'a' to member b of v1, it is stored in the first (leftmost) byte. The value 5 still remains in the fourth (rightmost) byte.

Therefore, when I print it out, it will produce a garbage value, won't it? If so, how do I fix this? That is, if I store 'a' into b_type, the shared memory should hold only the value 'a', not the previous integer value 5.


There is no single right behavior. Setting a union via one member and then reading a different member reinterprets the stored bytes. C99 and later allow this, but the value you get depends on the object representation, and in C++ it is undefined behavior. You can do useful things with this technique, but it is very hardware and compiler dependent. You will need to consider processor endianness and memory alignment requirements.

Back when I did almost all my programming in C, there were two (portable) techniques using unions that I relied on pretty heavily.

A tagged union. This is great when you need a dynamically typed variable. You set up a struct with two fields: a type discriminant and a union of all possible types.

struct variant {
  enum { INT, CHAR, FLOAT } type;  /* discriminant: which member is valid */
  union {
    int i;
    char c;
    float f;
  } value;                         /* named member, so v.value.i works */
};

You just had to be very careful to set the type value correctly whenever you changed the union's value and to retrieve only the value specified by the type.

Generic pointers. On most mainstream platforms all object pointers have the same size and representation (the C standard does not guarantee this for every pointer type), so you can create a union of pointer types and set and retrieve values interchangeably without regard to type:

typedef union {
  void *v;
  int* i;
  char* c;
  float* f;
} ptr;

This is especially useful for (de)serializing binary data:

// serialize
ptr p;               // a union object, not a pointer to one
p.v = ...; // set output buffer
*p.c++ = 'a';
*p.i++ = 12345;      // assumes the platform tolerates unaligned int access
*p.f++ = 3.14159f;

// deserialize
ptr p;
p.v = ...; // set input buffer
char c = *p.c++;
int i = *p.i++;
float f = *p.f++;

FYI: You can make your example simpler. The structs are unnecessary. You'll get the same behavior with this:

int main() {

  union {
    int a;
    char b;
  } v1;

  v1.a = 5;
  v1.b = 'a';
}


The behavior you describe depends on the platform and compiler. On Intel x86 processors, for instance, which are little-endian, the 5 is stored in the first byte of the int.

Unions are interesting for two main reasons:

  • to share the same region of memory in order to minimize the required memory allocation (in this case, the first byte [for instance] may indicate the type of the data in the structure/union).
  • to inspect a data structure without casts and pointers. For instance, a union of a double and a char[8] is, on some platforms, an easy way to get a byte-by-byte view of the double.

If there is no benefit in using a union, don't do it.


Well, first of all we should know whether you're using a big-endian or little-endian processor. Windows and Linux on x86 use the little-endian format, which means that the value 0x00000005 is actually stored as 05-00-00-00, as if you wrote it right to left.
So, first you put 5 into the a member, which means that the first byte is 05 and all the others are 00. Then, when you place 'a' into the b member, you overwrite 05 with the corresponding ASCII value, 0x61. When you look at the resulting number, it should be 97, which is the decimal value of 0x61.

All members of a union start at the beginning of its storage, but byte order is platform dependent. What you described would be correct on a big-endian architecture, such as Sun Solaris on SPARC and some other RISC processors.

Am I wrong?

HTH


The only way to fix this problem is by keeping track of what data you have stored. This is often done using a so-called tag member, like so:

struct mystructA {
    int data;
};
struct mystructB {
    char data;
};
enum data_tag {
    TAG_STRUCT_A,
    TAG_STRUCT_B
};
struct combined {
    enum data_tag tag;
    union {
        struct mystructA value_a;
        struct mystructB value_b;
    } data;
};

By keeping careful track of what data you put in, you can make sure only to read that same field later, thus ensuring you get a meaningful result.


If you read the union through the same member you last assigned, there will be no problem. When you access the char-sized member, the compiler makes sure to return only the bits you are interested in.

Edit: People were mentioning tagged unions. Here is another style of that, which SDL uses for their event struct.

enum union_tag {
    STRUCT_A,
    STRUCT_B
};

typedef struct {
    enum union_tag tag;
    int a;
} aType;

typedef struct {
    enum union_tag tag;
    char b;
} bType;

typedef union{
    enum union_tag tag;
    aType a_type;
    bType b_type;
} ab;

To access an element you would do something like this:

int result;

switch(my_union.tag){
    case STRUCT_A:
         result = my_union.a_type.a;
         break;
    case STRUCT_B:
         result = my_union.b_type.b;
         break;
}