
Dereferencing and typecasting

I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.

char a = 'a';
char * b = &a;
int i = (int) *b;

For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:

char a = 'a';
char * b = &a;
int i = *(int *)b;

This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?

EDIT: If the above doesn't work, then why would something like this work:

char a = 'a';
void * b = &a;
char c = *(char *)b;

This correctly assigns 'a' to c.


Your int is larger than your char - you get the 'a' value + some random data following it in memory.

E.g., assuming this layout in memory:

'a'
0xFF
0xFF
0xFF

Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.

EDIT: In response to updated question:

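In char c = *(char *)b;, b still points at the 'a' value. You cast it back to a char * and then dereference it, which reads exactly the one char that the pointer refers to. The cast matches the real type of the object, so nothing beyond that byte is touched.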


The last line you're concerned about does a very bad thing. First, it treats b as an int * whereas b is a char *. That is, the memory pointed to by b is assumed to be 4 bytes (typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte that b actually points to, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.

In general, casting one pointer type to another pointer type must be done with great caution.


You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are typically 32 bits, even on most 64-bit systems. So when you read the other 24 bits of memory next to the 8 bits that b points to, you get a bunch of extra bits that weren't initialized. Even the position of *b in i is architecture dependent.

bits of i, most significant bit on the left:
big-endian:    0110 0001|**** ****|**** ****|**** ****
little-endian: **** ****|**** ****|**** ****|0110 0001

When you read an int through the cast pointer, all the bits marked with asterisks become part of the value.


Since a char is 1 byte long and an int is typically 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b) and could even be unallocated (resulting in a segmentation fault).


When you typecast it to an (int *) type, it will refer to a total of 4 bytes (the size of an int) in memory.


In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.

Realistically, what happens is that whatever happens to be in the four [1] bytes starting at that address gets interpreted as an int.

[1] 4 bytes assuming a 32-bit int. If your implementation has, for example, a 64-bit int, it'll be 8 bytes.
