Why is it allowed to cast a pointer to a reference?
This was originally the topic of this question, where it emerged that the OP had just overlooked the dereference. Meanwhile, this answer got me and some others thinking: why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
    char c = 'A';
    char* pc = &c;
    char& c1 = (char&)pc;
    char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
    char c = 'A';
    char* pc = &c;
    char& c1 = (char&)pc;
    std::cout << *pc << "\n";
    c1 = 'B';
    std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as a memory region that you reinterpret as another type.
As for the C-style cast: it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached the reference c1 to the memory occupied by the pointer variable pc. When you did c1 = 'B', you forcefully wrote the value 'B' into that memory, thus destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such a case is a matter of pure luck. The program might crash, since the pointer is generally non-dereferenceable. Or you might get lucky and make your pointer point to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output, and there's no meaning in it whatsoever.
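One can rewrite your second program into an equivalent program without references:
#include <iostream>
int main() {
    char* pc = new char('A');
    char* c = (char*)&pc;      // c points at the first byte of pc itself
    std::cout << *pc << "\n";
    *c = 'B';                  // overwrites one byte of the pointer value
    std::cout << *pc << "\n";  // dereferences the corrupted pointer
}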
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not move the pointer too far away from its original location, so the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, throwing it wildly to a totally different location and making your program more likely to crash.
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t has type char*.
Expanding those, we see that the following happens:
- we take the address of the argument to the cast, yielding a value of type char**.
- we reinterpret_cast this value to char*.
- we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
- C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
- In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a is "decayed" to mean a pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.
It's allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
- pc is a pointer (4 bytes on a 32-bit platform)
- (char)pc tries to interpret the pointer as a byte, in particular the lowest-order of the four bytes (some compilers may reject this narrowing cast outright)
- (char&)pc is similar, but returns a reference to a byte of the pointer itself: the first byte in memory, which is the lowest-order byte on a little-endian machine
- When you first print *pc, nothing has happened yet and you see the letter you stored
- c1 = 'B' modifies that byte of the 4 byte pointer, so it now points to something else
- When you print again, you are now pointing to a different location, which explains your result.
Since only the lowest-order byte of the pointer is modified, the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a seg-fault. The actual value obtained is undefined, but is quite likely to be a zero, which would explain the blank output when it's interpreted as a char.
When you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...