开发者

How to explain undefined behavior to know-it-all newbies?

There'a a handful of situations that the C++ standard attributes as undefined behavior. For example if I allocate with new[], then try to free with delete (not delete[]) that's undefined behavior - anything can happen - it might work, it might crash nastily, it might corrupt something silently and plant a timed problem.

It's so problematic to explain this anything can happen part开发者_JAVA百科 to newbies. They start "proving" that "this works" (because it really works on the C++ implementation they use) and ask "what could possibly be wrong with this"? What concise explanation could I give that would motivate them to just not write such code?


Undefined means explicitly unreliable. Software should be reliable. You shouldn't have to say much else.

A frozen pond is a good example of an undefined walking surface. Just because you make it across once doesn't mean you should add the shortcut to your paper route, especially if you're planning for the four seasons.


Two possibilities come to my mind:

  1. You could ask them "just because you can drive on the motorway the opposite direction at midnight and survive, would you do it regularly?"

  2. The more involved solution might be to set up a different compiler / run environment to show them how it fails spectacularly under different circumstances.


"Congratulations, you've defined the behavior that compiler has for that operation. I'll expect the report on the behavior that the other 200 compilers that exist in the world exhibit to be on my desk by 10 AM tomorrow. Don't disappoint me now, your future looks promising!"


Simply quote from the standard. If they can't accept that, they aren't C++ programmers. Would Christians deny the bible? ;-)

1.9 Program execution

  1. The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. [...]

  2. Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. [...]

  3. Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. [...]

  4. Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]

You can't get any clearer than that.


I'd explain that if they didn't write the code correctly, their next performance review would not be a happy one. That's sufficient "motivation" for most people.


Let them try their way until their code will crash during test. Then the words won't be needed.

The thing is that newbies (we've all been there) have some amount of ego and self-confidence. It's okay. In fact, you couldn't be a programmer if you didn't. It's important to educate them but no less important to support them and don't cut their start in the journey by undermining their trust in themselves. Just be polite but prove your position with facts not with words. Only facts and evidence will work.


John Woods:

In short, you can't use sizeof() on a structure whose elements haven't been defined, and if you do, demons may fly out of your nose.

"Demons may fly out of your nose" simply must be part of the vocabulary of every programmer.

More to the point, talk about portability. Explain how programs frequently have to be ported to different OSes, let alone different compilers. In the real world, the ports are usually done by people other than the original programmers. Some of these ports are even to embedded devices, where there can be enormous costs of discovering that the compiler decided differently from your assumption.


Turn the person into a pointer. Tell them that they are a pointer to a class human and you are invoking the function 'RemoveCoat'. When they are pointing at a person and saying 'RemoveCoat' all is fine. If the person does not have a coat, no worries - we check for that, all RemoveCoat really does is remove the top layer of clothing (with decency checks).

Now what happens if they are pointing somewhere random and they say RemoveCoat - if they are pointing at a wall then the paint might peel off, if they are pointing at a tree the bark might come off, dogs might shave themselves, the USS Enterprise might lower its shields at a critical moment etc!

There is no way of working out what might happen the behaviour has not been defined for that situation - this is called undefined behaviour and must be avoided.


Quietly override new, new[], delete and delete[] and see how long it takes him to notice ;)

Failing that ... just tell him he is wrong and point him towards the C++ spec. Oh yeah .. and next time be more careful when employing people to make sure you avoid a-holes!


One would be...

"This" usage is not part of the language. If we would say that in this case the compiler must generate code that crashes, then it would be a feature, some kind of requirement for the compiler's manufacturer. The writers of the standard did not wanted to give unnecessary work on "features" that are not supported. They decided not to make any behavioral requirements in such cases.


I like this quote:

Undefined behavior: it may corrupt your files, format your disk or send hate mail to your boss.

I don't know who to attribute this to (maybe it's from Effective C++)?


C++ is not really a language for dilletantes, and simply listing out some rules and making them obey without question will make for some terrible programmers; most of the stupidest things I see people say are probably related to this kind of blind rules following/lawyering.

On the other hand if they know the destructors won't get called, and possibly some other problems, then they will take care to avoid it. And more importantly, have some chance to debug it if they ever do it by accident, and also to have some chance to realize how dangerous many of the features of C++ can be.

Since there's many things to worry about, no single course or book is ever going to make someone master C++ or probably even become that good with it.


Just show them Valgrind.


Compile and run this program:

#include <iostream>

class A {
    public:
            A() { std::cout << "hi" << std::endl; }
            ~A() { std::cout << "bye" << std::endl; }
};

int main() {
    A* a1 = new A[10];
    delete a1;

    A* a2 = new A[10];
    delete[] a2;
}

At least when using GCC, it shows that the destructor only gets called for one of the elements when doing single delete.

About single delete on POD arrays. Point them to a C++ FAQ or have them run their code through cppcheck.


One point not yet mentioned about undefined behavior is that if performing some operation would result in undefined behavior, a standards-conforming implementation could legitimately, perhaps in an effort to be 'helpful' or improve efficiency, generate code which would fail if such an operation were attempted. For example, one can imagine a multi-processor architecture in which any memory location may be locked, and attempting to access a locked location (except to unlock it) will stall until such time as the location in question was unlocked. If the locking and unlocking were very cheap (plausible if they're implemented in hardware) such an architecture could be handy in some multi-threading scenarios, since implementing x++ as (atomically read and lock x; add one to read value; atomically unlock and write x) would ensure that if two threads both performed x++ simultaneously, the result would be to add two to x. Provided programs are written to avoid undefined behavior, such an architecture might ease the design of reliable multi-threaded code without requiring big clunky memory barriers. Unfortunately, a statement like *x++ = *y++; could cause deadlock if x and y were both references to the same storage location and the compiler attempted to pipeline the code as t1 = read-and-lock x; t2 = read-and-lock y; read t3=*t1; write *t2=t3; t1++; t2++; unlock-and-write x=t1; write-and-unlock y=t2;. While the compiler could avoid deadlock by refraining from interleaving the various operations, doing so might impede efficiency.


Turn on malloc_debug and delete an array of objects with destructors. freeing a pointer inside the block should fail. Call them all together and demonstrate this.

You'll need to think of other examples to build your credibility until they understand that they are newbies and there's a lot to know about C++.


Tell them about standards and how tools are developed to comply with the standards. Anything outside the standard might or might not work, which is UB.


Just because their program appears to work is a guarantee of nothing; the compiler could generate code that happens to work (how do you even define "work" when the correct behavior is undefined?) on weekdays but formats your disk on weekends. Did they read the source code to their compiler? Examine their disassembled output?

Or remind them just because it happens to "work" today is no guarantee of it working when you upgrade your compiler version. Tell them to have fun finding whatever subtle bugs creep up from that.

And really, why not? They should be providing a justifiable argument to use undefined behavior, not the other way around. What reason is there to use delete instead of delete[] other than laziness? (Okay, there's std::auto_ptr. But if you're using std::auto_ptr with a new[]-allocated array, you probably ought to be using a std::vector anyway.)


Both the C and C++ Standards use the term "Undefined Behavior" to refer to situations in which it may be useful for different implementations to process constructs in differing, incompatible, fashions, some of which will behave predictably but some of which may not. Both use the same terminology to describe UB, and while I don't know of any published Rationale for C++ Standards, the Rationale for the C Standard says:

Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior."

Note that many actions which were classified as Undefined Behavior by the C Standard were considered fully defined on many if not all implementations, but the authors of the Standard wanted to give implementors targeting unusual platforms or application fields the ability to deviate from the normal behaviors if doing so would benefit their customers. Such freedom was not intended to invite arbitrary and capricious deviations from precedent that make it harder for programmers to quickly and easily do what needed to be done.

Unfortunately, many programmers who use gcc and clang don't understand their needs as well as the maintainers of those compilers, who recognize that that since the Standard avoids mandating anything that would impair the efficiency of applications that will never receive maliciously-crafted inputs, or will only run in contexts where even malicious programs would be unable to damage anything, that implies that there's no need for any implementations to allow programmers to easily and efficiently write programs that are suitable for use in other contexts.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜