
Why is dereferencing a null pointer undefined behaviour?

According to ISO C++, dereferencing a null pointer is undefined behaviour. I am curious why. Why did the standard decide to declare it undefined behaviour? What is the rationale behind this decision? Compiler dependency? That doesn't seem to be it, because according to the C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?


Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.

It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.


The primary reason is that by the time they wrote the original C standard there were a number of implementations that allowed it, but gave conflicting results.

On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that, since this was the original machine C had been written on and used to program, this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).

On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory.

The camps wouldn't agree on what should happen, so they made it undefined behavior.

Edit: I suppose I should add that by the time they wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point, so they kept it the same.


The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.

C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
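As a rough illustration of such a checking smart pointer, here is a minimal sketch. The names `checked_ptr` and `value_or` are made up for this example and are not standard library types; the wrapper is non-owning and simply throws where a raw pointer would invoke undefined behaviour.

```cpp
#include <stdexcept>

// Hypothetical wrapper (not a standard library type): a non-owning pointer
// that checks for null on every dereference.
template <typename T>
class checked_ptr {
public:
    explicit checked_ptr(T* p = nullptr) : ptr_(p) {}

    // Throws instead of invoking undefined behaviour when the pointer is null.
    T& operator*() const {
        if (ptr_ == nullptr) {
            throw std::logic_error("dereferenced a null checked_ptr");
        }
        return *ptr_;
    }

    // Member access goes through the same check as operator*.
    T* operator->() const { return &**this; }

    explicit operator bool() const { return ptr_ != nullptr; }

private:
    T* ptr_;
};

// Returns the pointed-to value, or `fallback` if the pointer was null.
int value_or(const checked_ptr<int>& p, int fallback) {
    try {
        return *p;
    } catch (const std::logic_error&) {
        return fallback;
    }
}
```

The check costs a compare and a branch on every dereference, which is exactly the overhead the language declines to impose on raw pointers.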

Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.


This answer from @Johannes Schaub - litb puts forward an interesting rationale, which seems pretty convincing.


The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.

Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it was given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. This is, however, a limited area where there exists an exception (no pun intended) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtle and with a reference on the affected sections).

The committee discussed solving this problem globally, by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
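The typeid special case mentioned above can be observed directly. The following is a small sketch; `Base` and `typeid_throws` are names invented for the example. The operand must be of a polymorphic class type, since only then is `typeid(*p)` evaluated at run time.

```cpp
#include <typeinfo>

// Hypothetical polymorphic type; the virtual destructor makes typeid(*p)
// a run-time operation.
struct Base {
    virtual ~Base() = default;
};

// Returns true if evaluating typeid(*p) threw std::bad_typeid.
bool typeid_throws(const Base* p) {
    try {
        const std::type_info& info = typeid(*p);
        (void)info;  // only the evaluation matters here
        return false;
    } catch (const std::bad_typeid&) {
        return true;
    }
}
```

Calling `typeid_throws(nullptr)` returns true, while passing the address of a real `Base` object returns false. Some compilers warn about the null operand, but the behaviour is defined.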


Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.


The real question is: what behavior would you expect?

A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.

So how do you get a good reference... from a pointer that points into the void?

You do not. Thus the undefined behavior.


I suspect it's because if the behavior is well-defined the compiler has to insert code anywhere pointers are dereferenced. If it's implementation defined then one possible behavior could still be a hard crash. If it's unspecified then either the compilers for some systems have extra undue burden or they may generate code that causes hard crashes.

Thus to avoid any possible extra burden on compilers they left the behavior undefined.


Sometimes you need an invalid pointer (also see MmBadPointer on Windows), to represent "nothing".

If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.


Here is a simple test & example:

  1. Allocate a pointer:

    int * pointer;

    What value is in the pointer when it is created?
    What is the pointer pointing to?
    What happens when I dereference this pointer in its current state?

  2. Mark the end of a linked list. In a linked list, each node points to another node, except for the last.
    What is the value of the pointer in the last node?
    What happens when you dereference the "next" field of the last node?

There needs to be a value that indicates a pointer is not pointing to anything or that it's in an invalid state. This is where the NULL pointer concept comes into play. The linked list can use a NULL pointer to indicate the end of the list.
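The linked-list case can be sketched as follows. `Node` and `sum_list` are illustrative names, not taken from any library. The traversal tests each pointer against null before dereferencing it, so the end-of-list marker is never dereferenced.

```cpp
// Hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;  // nullptr marks the end of the list
};

// Adds up all values; stops when it reaches the null "next" pointer.
int sum_list(const Node* head) {
    int total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}
```

An empty list is just a null `head`, so `sum_list(nullptr)` returns 0 without touching any node.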


Arguments have been made elsewhere that having well-defined behaviour for null-pointer dereferences is impossible without a lot of overhead, which I think is true. This is because, as far as I understand, "well-defined" here also means "portable". If you did not treat nullptr dereferences specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so that would not be well-defined.

So, I guess this is why dereferencing nullptr (and probably also other invalid pointers) is marked as undefined.

I do wonder why this is undefined rather than unspecified or implementation-defined, which are distinct from undefined behaviour and require more consistency.

In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect that compilers would just compile a null-pointer dereference to a read of address zero. However, modern optimizers are getting better but also more sensitive to undefined behaviour, and I think they sometimes do things that end up breaking the program more thoroughly. E.g. consider the following:

matthijs@grubby:~$ cat test.c
unsigned foo () {
        unsigned *foo = 0;
        return *foo;
}

matthijs@grubby:~$ arm-none-eabi-gcc  -c test.c -Os && objdump -d test.o 

test.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <foo>:
   0:   e3a03000        mov     r3, #0
   4:   e5933000        ldr     r3, [r3]
   8:   e7f000f0        udf     #0

This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).

This might be OK for an accidental null-pointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised that this happened.

So, not so much an answer, but some extra perspective on the matter.


According to the original C standard, the internal representation of a null pointer can be any value, not necessarily zero.

The language definition states that for each pointer type, there is a special value - the "null pointer" - which is distinguishable from all other pointer values and which is "guaranteed to compare unequal to a pointer to any object or function." That is, a null pointer points definitively nowhere; it is not the address of any object or function.

There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.

(From http://c-faq.com/null/null1.html)


Although dereferencing a NULL pointer in C/C++ indeed leads to undefined behavior from the language standpoint, such an operation is well defined in compilers for targets which have memory at the corresponding address. In this case, the result of such an operation is simply a read of the memory at address 0.

Also, many compilers will allow you to dereference a NULL pointer as long as you don't bind the referenced value. This is done to provide compatibility with non-conforming yet widespread code, like

#define offsetof(st, m) ((size_t)(&((st *)0)->m))

There was even a discussion to make this behaviour part of the standard.
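In portable code, the standard `offsetof` macro from `<cstddef>` does the same job without the program itself spelling out a null-pointer expression. Below is a small sketch that compares it against an offset measured on a real object. `Packet` and the two helper functions are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Hypothetical standard-layout type used only for this example.
struct Packet {
    char tag;
    int length;
    double payload;
};

// Offset of `length` as reported by the standard macro.
std::size_t standard_length_offset() {
    return offsetof(Packet, length);
}

// The same offset measured on an actual object instead of through a null pointer.
std::size_t measured_length_offset() {
    Packet p{};
    const char* base = reinterpret_cast<const char*>(&p);
    const char* member = reinterpret_cast<const char*>(&p.length);
    return static_cast<std::size_t>(member - base);
}
```

Both functions return the same value for a standard-layout type such as `Packet`. The exact number depends on the platform's alignment rules.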


Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.

Mainly it is undefined because there is no logical way to handle it.


You can actually dereference a null pointer. Someone did it here: http://www.codeproject.com/KB/system/soviet_kernel_hack.aspx
