Real thing about "->" and "."
I have always wanted to know what the real difference is between how the compiler sees a pointer to a struct (in C, say) and the struct itself.
struct person p;
struct person *pp;
For pp->age, I always imagine that the compiler computes "value of pp + offset of the attribute age in the struct".
But what does it do with p.age? It would be almost the same. To me, the programmer, p is not a memory address; it is "the structure itself", but of course this is not how the compiler deals with it.
My guess is that it's more of a syntactic thing, and the compiler always does (&p)->age.
Am I correct?
p->q is essentially syntactic sugar for (*p).q: it dereferences the pointer p and then goes to the proper field q within it. It saves typing for a very common case (pointers to structs).
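The equivalence can be checked with a small sketch. The struct layout and the function names below are assumptions made up for illustration; only the operators themselves come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration; the fields are assumptions. */
struct person {
    char name[16];
    int age;
};

/* Arrow operator on a pointer to a struct. */
int age_arrow(const struct person *pp) {
    assert(pp != NULL);
    return pp->age;
}

/* The same access spelled out: dereference first, then select the member. */
int age_deref_then_dot(const struct person *pp) {
    assert(pp != NULL);
    return (*pp).age;
}

/* Dot operator on a struct object (passed by value here). */
int age_dot(struct person p) {
    return p.age;
}

/* The question's guess: take the address, then use the arrow. */
int age_addr_then_arrow(struct person p) {
    return (&p)->age;
}
```

All four functions return the same value for the same struct, which is what the language guarantees. Whether the generated machine code is also the same depends on the compiler and its optimization settings.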
In essence, -> does two dereferences (pointer dereference, field dereference) while . does only one (field dereference).
Because of the extra dereference, the compiler can't completely replace -> with a static address; it will always include at least an address computation (pointers can change at run time, so the locations change too). In some cases, by contrast, a . operation can be replaced by the compiler with an access to a fixed address (since the base struct's address can be fixed as well).
Updated (see comments):
You have the right idea, but there is an important difference for global and static variables only: when the compiler sees p.age for a global or static variable, it can replace it, at compile time, with the exact address of the age field within the struct.
In contrast, pp->age must be compiled as "value of pp + offset of age field", since the value of pp can change at runtime.
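One way to see this difference in the language itself is a sketch like the following. C accepts the address of a field of a static struct as a constant initializer, but not an address that depends on a pointer's value. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration. */
struct person {
    char name[16];
    int age;
};

/* Static storage duration: the object's address is fixed before run time. */
static struct person g = { "Ann", 30 };

/* &g.age is an address constant, so it may initialize a static object. */
static int *const g_age_addr = &g.age;

/* A pointer variable: the address it holds may change while running. */
static struct person *pp = &g;

/* static int *const bad = &pp->age;
   would be rejected: it needs the run-time value of pp. */

int *fixed_age_address(void) { return g_age_addr; }

int read_direct(void) { return g.age; }

int read_via_pointer(void) { return pp->age; }

/* Point pp at another struct; read_via_pointer() then follows it. */
void retarget(struct person *other) {
    assert(other != NULL);
    pp = other;
}
```

After retarget is called, read_via_pointer reads the new struct while read_direct still reads g, which is why the pointer version cannot be reduced to one fixed address in general.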
The two statements are not equivalent, even from the "compiler perspective". The statement p.age translates to the address of p + the offset of age, while pp->age translates to the address contained in pp + the offset of age.
The address of a variable and the address contained in a (pointer) variable are very different things.
Say the offset of age is 5. If p is a structure, its address might be 100, so p.age references address 105.
But if pp is a pointer to a structure, its address might be 100, but the value stored at address 100 is not the beginning of a person structure; it's a pointer. So the value at address 100 (the address contained in pp) might be, for example, 250. In that case, pp->age references address 255, not 105.
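The same arithmetic can be sketched with offsetof instead of made-up numbers. The struct layout and helper names are assumptions for illustration, and the actual offset of age is whatever the compiler chooses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the code asks offsetof for the real offset. */
struct person {
    char name[16];
    int age;
};

/* Bytes from the address held in s to the field reached by s->age. */
ptrdiff_t age_offset_via_arrow(const struct person *s) {
    assert(s != NULL);
    return (const char *)&s->age - (const char *)s;
}

/* Bytes from the address of a struct object to the field reached by p.age. */
ptrdiff_t age_offset_via_dot(void) {
    static struct person p;
    return (const char *)&p.age - (const char *)&p;
}

/* Nonzero when the pointer variable itself lives at a different address
   than the struct it points to (the "100 versus 250" distinction). */
int pointer_storage_differs(struct person *const *ppp) {
    assert(ppp != NULL);
    return (const void *)ppp != (const void *)*ppp;
}
```

Both offset helpers return offsetof(struct person, age). The only thing that differs between the two operators is which base address the offset is added to.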
Since p is a local (automatic) variable, it is stored on the stack. Therefore the compiler accesses it as an offset from the stack pointer (SP) or the frame pointer (FP or BP, on architectures where it exists). In contrast, *pp refers to a memory address that is [usually] allocated on the heap, so the stack registers are not used.
This is a question I've always asked myself.
v.x, the member operator, is valid only for structs.
v->x, the member-of-pointer operator, is valid only for struct pointers.
So why have two different operators, since only one is needed? For example, only the . operator could be used; the compiler always knows the type of v, so it knows what to do: v.x if v is a struct, (*v).x if v is a struct pointer.
I have three theories:
- temporary shortsightedness by K&R (which theory I'd like to be false)
- making the job easier for the compiler (a practical theory, given the conception time of C :)
- readability (which theory I prefer)
Unfortunately, I don't know which one (if any) is true.
In both cases the structure and its members are addressed by
address(person) + offset(age)
Using p with a struct stored on the stack gives the compiler more options to optimize memory usage. It could store only age instead of the whole struct if nothing else is used, which makes the addressing formula above fail (I think taking the address of the struct prevents this optimization).
A struct on the stack may have no memory address at all. If the struct is small enough and only lives a short time, it can be mapped to some of the processor's registers (as with the optimization above, taking its address prevents this).
The short answer: when the compiler does not optimize, you are right. As soon as the compiler starts optimizing, only what the C standard specifies is guaranteed.
Edit: Removed the flawed stack/heap location for "pp->", since the pointed-to struct can be on either the heap or the stack.