Advantages of using union when same thing can be done using struct - C

I have difficulty understanding the use of union in C. I have read a lot of posts here on SO about the subject, but none of them explains why a union is preferred when the same thing can be achieved using a struct.

Quoting from K&R

As an example such as might be found in a compiler symbol table manager, suppose that a constant may be an int, a float, or a character pointer. The value of a particular constant must be stored in a variable of the proper type, yet it is most convenient for table management if the value occupies the same amount of storage and is stored in the same place regardless of its type. This is the purpose of a union: a single variable that can legitimately hold any of one of several types. The syntax is based on structures:

union u_tag {
      int ival;
      float fval;
      char *sval;
} u;

The usage will be

if (utype == INT)
    printf("%d\n", u.ival);
else if (utype == FLOAT)
    printf("%f\n", u.fval);
else if (utype == STRING)
    printf("%s\n", u.sval);
else
    printf("bad type %d in utype\n", utype);

The same thing can be implemented using a struct. Something like,

struct u_tag {
    utype_t utype;
    int ival;
    float fval;
    char *sval;
} u;

if (u.utype == INT)
    printf("%d\n", u.ival);
else if (u.utype == FLOAT)
    printf("%f\n", u.fval);
else if (u.utype == STRING)
    printf("%s\n", u.sval);
else
    printf("bad type %d in utype\n", u.utype);

Isn't this the same? What advantage does a union give?

Any thoughts?


In the example you posted, the size of the union would be the size of its largest member (float here, assuming it is the largest one; as pointed out in the comments, this can vary with a 64-bit compiler, where char* is typically 8 bytes), while the size of the struct would be the sum of the sizes of float, int, char* and utype_t (plus padding, if any).

The results on my compiler:

#include <stdio.h>

union u_tag {
    int ival;
    float fval;
    char *sval;
};
struct s_tag {
    int ival;
    float fval;
    char *sval;
};

int main(void)
{
    /* sizeof yields a size_t, so the matching format is %zu */
    printf("%zu\n", sizeof(union u_tag));  //prints 4
    printf("%zu\n", sizeof(struct s_tag)); //prints 12
    return 0;
}


Unions can be used when no more than one member need be accessed at a time. That way, you can save some memory instead of using a struct.

There's a neat "cheat" which may be possible with unions: writing one field and reading from another, to inspect bit patterns or interpret them differently.
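The "cheat" can be sketched as follows. This is only an illustration: the names float_bits, float_to_bits, float_sign and float_exponent are made up here, and the code assumes that float is a 32-bit IEEE 754 value, which is common but not guaranteed by the C standard. Reading a member other than the one last written reinterprets the stored bytes in C; in C++ the same code is undefined behavior.

```c
#include <stdint.h>

/* Assumption: float is a 32-bit IEEE 754 value on this platform. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Write the float member, then read the integer member:
   same bytes in memory, interpreted as a different type. */
uint32_t float_to_bits(float value)
{
    union float_bits fb;
    fb.f = value;
    return fb.u;
}

/* Bit 31 of an IEEE 754 single is the sign. */
int float_sign(float value)
{
    return (int)(float_to_bits(value) >> 31);
}

/* Bits 23..30 hold the biased exponent (bias 127). */
int float_exponent(float value)
{
    return (int)((float_to_bits(value) >> 23) & 0xFFu);
}
```

Under those assumptions, float_to_bits(1.0f) yields 0x3F800000: sign 0, biased exponent 127, mantissa 0.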


A union uses less memory and lets you do more dangerous things. It represents one contiguous block of memory, which can be interpreted as either an integer, a floating point value or a character pointer.


Unions are used to store only one type of data at a time. If a value is reassigned, the old value is overwritten and can no longer be accessed. In your example, the int, float and char* members can all hold different values at the same time when used as a struct. That's not the case in a union. So it depends on your program's requirements and design. Check this article on when to use union. Google may give even more results.


The language offers the programmer numerous facilities to apply high level abstractions to the lowest level machine data and operations.

However, the mere presence of something does not automatically suggest its use is a best practice. Their presence makes the language powerful and flexible. But industry needs led to the development of programming techniques that favored clarity and maintainability over the absolute best code efficiency or storage efficiency possible.

So if a problem's solution set contains both unions and structures it is the programmer's responsibility to decide whether the need for compact storage outweighs the costs.

In recent times the cost of memory has been exceedingly low. The introduction of the bool type (and even prior to that, int variables) allowed a programmer of 32-bit systems to use 32 bits to represent a binary state. You see that frequently in programming even though a programmer could use masks and get 32 true/false values into a variable.
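For comparison, the mask technique mentioned above might look like this. It is a minimal sketch: the type name flags32 and the helper names are invented for illustration, and bit is assumed to be in the range 0..31.

```c
#include <stdbool.h>
#include <stdint.h>

/* One 32-bit word holding 32 independent true/false values. */
typedef uint32_t flags32;

/* Return a copy of flags with the given bit (0..31) set. */
static inline flags32 flag_set(flags32 flags, unsigned bit)
{
    return flags | (UINT32_C(1) << bit);
}

/* Return a copy of flags with the given bit (0..31) cleared. */
static inline flags32 flag_clear(flags32 flags, unsigned bit)
{
    return flags & ~(UINT32_C(1) << bit);
}

/* Test whether the given bit (0..31) is set. */
static inline bool flag_test(flags32 flags, unsigned bit)
{
    return ((flags >> bit) & 1u) != 0;
}
```

This packs the state tightly, but every access needs a mask, which is the clarity cost the paragraph above is describing.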

So to answer your question, the union offers more compact storage for a single value entity out of several possible types than a traditional structure but at the cost of clarity and possible subtle program defects.


Using unions to save memory is mostly not done in modern systems, since the code to access a union member will quickly take up more space (and be slower) than just adding another word sized variable to memory. However, when your code has to support multiple architectures with different endiannesses (whew, what a word), unions can be handy. I tend to prefer using an endian utility library (to functions), but some people like unions.

Memory mapped hardware registers are also commonly accessed with unions. Bit fields in C (don't use them, they're mean) can be passed around as words using unions.


Unions have two dominant uses:

First is to provide a variant type, as you have outlined. In contrast to the struct approach, there is one unit of memory shared between all members in the union. If memory isn't an issue, a struct will also serve this function.

I typically embed the union in the struct - the struct ensures that type and data are stored together, and the union means there is exactly one value being stored.

struct any_tag {
    utype_t utype;
    union {
        int ival;
        float fval;
        char *sval;
    } u;
} data;
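A fuller sketch of how such a tagged value might be built and consumed is shown here. The enum values and the helper names (any_from_int, any_from_float, any_from_string, any_format) are assumptions made for this example; they are not part of the question or of K&R.

```c
#include <stdio.h>

/* Assumed definition of the tag type; the question does not show one. */
typedef enum { INT, FLOAT, STRING } utype_t;

struct any_tag {
    utype_t utype;      /* says which union member is currently valid */
    union {
        int ival;
        float fval;
        char *sval;
    } u;
};

/* Constructors set the tag and the matching member together,
   so the two can never disagree. */
struct any_tag any_from_int(int v)
{
    struct any_tag a;
    a.utype = INT;
    a.u.ival = v;
    return a;
}

struct any_tag any_from_float(float v)
{
    struct any_tag a;
    a.utype = FLOAT;
    a.u.fval = v;
    return a;
}

struct any_tag any_from_string(char *v)
{
    struct any_tag a;
    a.utype = STRING;
    a.u.sval = v;
    return a;
}

/* Format the value into buf according to its tag. Returns what
   snprintf returns, or -1 if the tag is not a known type. */
int any_format(const struct any_tag *v, char *buf, size_t size)
{
    switch (v->utype) {
    case INT:    return snprintf(buf, size, "%d", v->u.ival);
    case FLOAT:  return snprintf(buf, size, "%f", v->u.fval);
    case STRING: return snprintf(buf, size, "%s", v->u.sval);
    }
    return -1;
}
```

Only the member selected by utype is ever read, which is what keeps the union safe to use.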

Second, a union has great use for low level access to raw data - reinterpreting one type as another. The purpose I've used this for is reading and writing binary encoded data.

float ConvertByteOrderedBufferTo32bitFloat( char* input ) {
    union {
        float f;
        unsigned char buf[4];
    } data;

    /* input is big-endian; copy it into the host's byte order */
#if WORDS_BIGENDIAN == 1
    data.buf[0] = input[0];
    data.buf[1] = input[1];
    data.buf[2] = input[2];
    data.buf[3] = input[3];
#else
    data.buf[0] = input[3];
    data.buf[1] = input[2];
    data.buf[2] = input[1];
    data.buf[3] = input[0];
#endif

    return data.f;
}

Here, you can write to the individual bytes, depending on platform endianness, then interpret those 4 raw char bytes as an IEEE float. Casting that char array to float would not have the same result.
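A variant of the same idea that does not depend on a WORDS_BIGENDIAN macro is sketched here. It is a different technique from the function above: the bytes are first assembled into a uint32_t with shifts, and the union is used only for the final reinterpretation. The function name is made up, and the code assumes a 32-bit IEEE 754 float stored with the same byte order as uint32_t.

```c
#include <stdint.h>

/* Assumptions: float is 32-bit IEEE 754, and float and uint32_t
   use the same byte order in memory on this platform. */
float float_from_big_endian(const unsigned char *input)
{
    union {
        float    f;
        uint32_t u;
    } data;

    /* Shifts work on values, not on memory layout, so this line
       is the same on big-endian and little-endian hosts. */
    data.u = ((uint32_t)input[0] << 24) |
             ((uint32_t)input[1] << 16) |
             ((uint32_t)input[2] <<  8) |
              (uint32_t)input[3];

    return data.f;
}
```

For example, the big-endian bytes 3F 80 00 00 decode to 1.0f under these assumptions.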


As often mentioned before: unions save memory. But this is not the only difference. Structs are made to store ALL of the given sub-types, while unions are made to store EXACTLY ONE of the given sub-types. So if you want to store either an integer or a float, then a union is probably the thing you need (but you need to remember somewhere else which kind of number you have stored). If you want to store both, then you need a struct.


Borrowing from the quote you posted, a union holds "...any of one of several types...", that is, only one of its members at a time. That is exactly what a union is, whereas struct members can all be assigned and accessed at the same time.

A union makes more sense in system-level (OS) programs, such as inter-process communication or concurrency handling.


Unions are tricky. For years, I couldn't figure them out, then I started doing things with network protocols, and someone showed me the light. Say you have a header, and then after the header, there are various different types of packets, something like:

| type (4 bytes) | uid (8 bytes) | payload length (2 bytes) | payload (variable length) |

And then there would be various types of packet payloads... For the sake of argument, there could be hello, goodbye, and message packets...

Well, you can build a nested set of structs/unions that can exactly represent a packet in that protocol like so...

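/* uint and ushort are assumed to be typedefs for
   unsigned int and unsigned short */
struct packet {
  uint type;
  char unique_id [8];
  ushort payload_length;
  union {

    struct {
      ushort version;
      uint status;
    } hello;

    struct {
      char reason[20];
      uint status;
    } goodbye;

    struct {
      char message[100];
    } message;

  } payload;   /* members are named so that pkt->payload.hello.version works */
};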

Inevitably, you get this protocol from the Operating System through a read() call, and it's just a jumble of bytes. But if you are careful with your structure definition, and all the types are the right size, you can simply make a pointer to the struct, point it at your buffer filled with random data, and...

char buf[1000];
struct packet *pkt;
read(outsideworld, buf, sizeof buf);   /* never read more than buf can hold */
pkt = (struct packet *)buf;

and reading your packets is as simple as...

switch(pkt->type){

  case PACKET_MESSAGE:
    printf("message = %s\n",
           pkt->payload.message.message);
    break;

  case PACKET_HELLO:
    printf("hello! version = %d status = %u\n",
           pkt->payload.hello.version,
           pkt->payload.hello.status);
    break;
  case PACKET_GOODBYE:
    printf("goodbye! reason = %s status = %u\n",
           pkt->payload.goodbye.reason,
           pkt->payload.goodbye.status);
    break;
}

No grovelling around, counting bytes, etc... You can nest this as deeply as you want (make a union for ip addresses, that gives you the whole thing as an unsigned int, or the individual bytes so it's easier to print 192.168.0.1 out of it).
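The IP address union mentioned above could be sketched like this. The names ipv4 and ipv4_format are invented for the example. The numeric value seen through addr depends on the host's byte order, so only the byte view is used for printing.

```c
#include <stdint.h>
#include <stdio.h>

union ipv4 {
    uint32_t      addr;      /* the whole address as one 32-bit value */
    unsigned char octet[4];  /* the same storage, one byte at a time */
};

/* Write the dotted-quad form into buf; returns what snprintf returns.
   octet[0] is taken to be the first byte as it appears on the wire. */
int ipv4_format(const union ipv4 *ip, char *buf, size_t size)
{
    return snprintf(buf, size, "%d.%d.%d.%d",
                    ip->octet[0], ip->octet[1],
                    ip->octet[2], ip->octet[3]);
}
```

Copying or comparing addresses can use addr as a single word, while printing uses octet, with no shifting in either case.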

The unions don't slow down your code, because it all just gets translated into offsets in machine code.
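Putting the packet pieces together, a self-contained sketch might look like the following. Several things here are assumptions rather than part of the answer: the fixed-width types from stdint.h stand in for uint and ushort, the PACKET_* values are hypothetical, and the bytes are copied into an aligned struct with memcpy instead of casting the buffer pointer, which sidesteps alignment problems. Whether the struct layout (padding, byte order) matches a real wire format still has to be checked for each protocol and compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet type codes, for illustration only. */
enum { PACKET_HELLO = 1, PACKET_GOODBYE = 2, PACKET_MESSAGE = 3 };

struct packet {
    uint32_t type;
    char     unique_id[8];
    uint16_t payload_length;
    union {
        struct { uint16_t version; uint32_t status; } hello;
        struct { char reason[20];  uint32_t status; } goodbye;
        struct { char message[100]; }                 message;
    } payload;
};

/* Copy raw bytes into *out. Returns 0 on success, or -1 if the
   buffer is too short to contain even the header fields. */
int packet_from_bytes(struct packet *out, const void *buf, size_t len)
{
    if (len < offsetof(struct packet, payload))
        return -1;
    memset(out, 0, sizeof *out);
    memcpy(out, buf, len < sizeof *out ? len : sizeof *out);
    return 0;
}

/* Name the packet kind based on its type field. */
const char *packet_kind(const struct packet *pkt)
{
    switch (pkt->type) {
    case PACKET_HELLO:   return "hello";
    case PACKET_GOODBYE: return "goodbye";
    case PACKET_MESSAGE: return "message";
    default:             return "unknown";
    }
}
```

After packet_from_bytes succeeds, the fields are read exactly as in the switch above, for example pkt.payload.hello.version.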


An example would make sense here. See the example below:

union xReg
{
    uint allX;
    struct
    {
        uint x3      : 9;
        uint x2      : 9;
        uint x1      : 14;
    };
};

uint is a typedef of unsigned int.

Here, this union represents a 32 bit register. You can read the register using allX, and then manipulate it using the struct.

This saves us the bit shifts and masks we would otherwise need if we used allX for the bit manipulation.
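To make the comparison concrete, here is a sketch of updating x2 through the bit-field view next to a hand-written mask and shift. The helper names are invented. The order in which bit-fields are laid out inside the word is implementation-defined, so the manual version assumes x3 occupies the lowest 9 bits and x2 the next 9, and the two functions need not agree on every compiler. The anonymous struct requires C11 or a compiler extension, and unsigned int is assumed to be 32 bits wide.

```c
typedef unsigned int uint;

union xReg {
    uint allX;
    struct {
        uint x3 : 9;
        uint x2 : 9;
        uint x1 : 14;
    };
};

/* Update x2 through the bit-field; the compiler emits the
   masking and shifting for us. */
uint xreg_set_x2(uint raw, uint value)
{
    union xReg r;
    r.allX = raw;
    r.x2 = value;      /* values wider than 9 bits are truncated */
    return r.allX;
}

/* Read x2 back out of a raw register value. */
uint xreg_get_x2(uint raw)
{
    union xReg r;
    r.allX = raw;
    return r.x2;
}

/* The manual equivalent, ASSUMING x2 sits in bits 9..17. */
uint xreg_set_x2_manual(uint raw, uint value)
{
    return (raw & ~(0x1FFu << 9)) | ((value & 0x1FFu) << 9);
}
```

The bit-field version reads more clearly, but the manual version has a layout that is spelled out in the code rather than left to the compiler.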
