
Program compiled with GCC 4.5 crashes, while GCC 4.4 is fine

Recently I tried to compile and install ns-2, a network simulator based on C++ and Tcl.

With a few slight modifications to the source code (don't worry, they won't cause the crash), I got it to compile with the latest GCC, version 4.5.

But when I execute the binary, it gives the following error:

$ bin/ns
*** buffer overflow detected ***: bin/ns terminated

The same code, compiled with an earlier gcc, runs fine. So I believe it's due to some new or enhanced feature in gcc 4.5.

How do I approach this problem? Of course compiling with gcc 4.4 is an option, but I would like to know what went wrong :)

Update:

Here is the full stack-trace and back-trace with gdb:

$ bin/ns
*** buffer overflow detected ***: bin/ns terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7f01824ac1d7]
/lib/x86_64-linux-gnu/libc.so.6(+0xfd0f0)[0x7f01824ab0f0]
bin/ns[0x8d5b5a]
bin/ns[0x8d56de]
bin/ns[0x841077]
bin/ns[0x842b19]
bin/ns(Tcl_EvalEx+0x16)[0x843256]
bin/ns(Tcl_Eval+0x1d)[0x84327d]
bin/ns(Tcl_GlobalEval+0x2b)[0x84391b]
bin/ns(_ZN3Tcl4evalEPc+0x27)[0x83352b]
bin/ns(_ZN3Tcl5evalcEPKc+0xdd)[0x8334e9]
bin/ns(_ZN11EmbeddedTcl4loadEv+0x24)[0x834712]
bin/ns(Tcl_AppInit+0xb2)[0x8331a5]
bin/ns(Tcl_Main+0x1d0)[0x8ad6a0]
bin/ns(nslibmain+0x25)[0x8330c5]
bin/ns(main+0x20)[0x833254]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xff)[0x7f01823cceff]
bin/ns[0x5bc1a9]

Using GDB and with symbols turned on:

(gdb) bt
#0  0x00007ffff6970d05 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff6974ab6 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff69a9d7b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff6a3b1d7 in __fortify_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff6a3a0f0 in __chk_fail () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00000000008d5b5a in strcpy (interp=0xd2dda0, optionIndex=<value optimized out>, objc=<value optimized out>, objv=0x7fffffffdad0)
    at /usr/include/bits/string3.h:105
#6  TraceVariableObjCmd (interp=0xd2dda0, optionIndex=<value optimized out>, objc=<value optimized out>, objv=0x7fffffffdad0)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclTrace.c:912
#7  0x00000000008d56de in Tcl_TraceObjCmd (dummy=<value optimized out>, interp=0xd2dda0, objc=<value optimized out>, objv=0xd2ec00)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclTrace.c:293
#8  0x0000000000841077 in TclEvalObjvInternal (interp=0xd2dda0, objc=5, objv=0xd2ec00,
    command=0x7ffff7f680fe "trace variable defaultRNG w { abort \"cannot update defaultRNG once assigned\"; }\n\n\nClass RandomVariable/TraceDriven -superclass RandomVariable\n\nRandomVariable/TraceDriven instproc init {} {\n$self instv"..., length=80, flags=0)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclBasic.c:3689
#9  0x0000000000842b19 in TclEvalEx (interp=0xd2dda0,
    script=0x7ffff7f52010 "\n\n\n\n\n\nproc warn {msg} {\nglobal warned_\nif {![info exists warned_($msg)]} {\nputs stderr \"warning: $msg\"\nset warned_($msg) 1\n}\n}\n\nif {[info commands debug] == \"\"} {\nproc debug args {\nwarn {Script debugg"..., numBytes=422209, flags=<value optimized out>, line=4141,
    clNextOuter=<value optimized out>,
    outerScript=0x7ffff7f52010 "\n\n\n\n\n\nproc warn {msg} {\nglobal warned_\nif {![info exists warned_($msg)]} {\nputs stderr \"warning: $msg\"\nset warned_($msg) 1\n}\n}\n\nif {[info commands debug] == \"\"} {\nproc debug args {\nwarn {Script debugg"...)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclBasic.c:4386
#10 0x0000000000843256 in Tcl_EvalEx (interp=<value optimized out>, script=<value optimized out>, numBytes=<value optimized out>,
    flags=<value optimized out>) at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclBasic.c:4043
#11 0x000000000084327d in Tcl_Eval (interp=0xd2dda0, script=<value optimized out>)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclBasic.c:4955
#12 0x000000000084391b in Tcl_GlobalEval (interp=0xd2dda0, command=<value optimized out>)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclBasic.c:6005
#13 0x000000000083352b in Tcl::eval(char*) ()
#14 0x00000000008334e9 in Tcl::evalc(char const*) ()
#15 0x0000000000834712 in EmbeddedTcl::load() ()
#16 0x00000000008331a5 in Tcl_AppInit ()
#17 0x00000000008ad6a0 in Tcl_Main (argc=<value optimized out>, argv=0x7fffffffe1d0, appInitProc=0x8330f3 <Tcl_AppInit>)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclMain.c:418
#18 0x00000000008330c5 in nslibmain ()
#19 0x0000000000833254 in main ()    


Famous last words: "Don't worry - my change didn't break anything". How can we be sure of that?

However, there is a moderate chance you're correct if the code worked under 4.4 and crashes under 4.5.

GCC has adopted some aggressive optimizations that remove code which tries to detect integer overflow after it has already happened. If that is what is going on here, you're going to have to find that code in ns-2 and get it fixed, either by the ns-2 developers or on your own.
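As a rough illustration of the kind of code that can be affected (the function name `add_would_overflow` is made up for this sketch and is not taken from ns-2): a test that performs the addition first and then looks for wraparound depends on undefined behaviour, so the optimizer is allowed to drop it. Checking against the limits before adding avoids that.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Fragile pattern, shown only as a comment:
 *
 *     if (a + b < a) { ... handle overflow ... }
 *
 * For signed ints the overflow itself is undefined behaviour, so an
 * optimizing compiler may assume it never happens and remove the test.
 */

/* Hypothetical helper: reports whether a + b would leave the range of int,
 * without ever performing an overflowing addition. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow when b < 0 */
    }
    return false;                 /* adding zero never overflows */
}
```

Whether ns-2 contains anything like the fragile pattern is not established here; this only shows what to look for.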

You should probably try to run the program under the debugger so that you can get control at the point where the buffer overflow is detected, and see where the code is. If you disabled core dumps (with ulimit -c 0 or equivalent), consider enabling them and see whether you get a core dump when it terminates. That should give you a starting point.


Further thoughts:

  • When you compiled the code, how stringent were the warning flags used? Can you recompile with more warnings enabled?

    One technique that often works (with AutoTools-configured programs) if you can find no other way to get special options to the C or C++ compiler is:

    ./configure --prefix=/opt/ns CC="gcc -Wall -Wextra" CXX="g++ -Wall -Wextra"
    

    (I also use this technique to specify 32-bit vs 64-bit builds, adding -m32 or -m64.)

    Warning: if the code was not written to compile cleanly under these options, the first compilation with them can be traumatic. There is a decent chance that somewhere among the warnings is one about the source of your problem. There will also likely be 50 or more unrelated warnings for every one that is relevant, and fixing all the warnings still might not cure the problem. If the code already compiles cleanly with stringent warnings, you are faced with enabling more exotic warnings instead. But if you can get the compiler to help diagnose the problem that it is causing, you should certainly do so; it is much simpler than finding the problem unaided.

  • Also, make sure you are producing a debuggable program - even if you keep the optimization enabled.

  • Also, consider compiling with optimization off and see whether the program still crashes. If the program does not crash without optimization and does with optimization, you have some useful information. It won't make it easier to find the cause, but you know it is (probably) related to the optimizer. Or it might just be that the bug moves when not optimized and doesn't fail fatally.


The extended stack trace information is curious:

#5  0x00000000008d5b5a in strcpy (interp=0xd2dda0, optionIndex=<value optimized out>,
                                  objc=<value optimized out>, objv=0x7fffffffdad0)
    at /usr/include/bits/string3.h:105
#6  TraceVariableObjCmd (interp=0xd2dda0, optionIndex=<value optimized out>,
                         objc=<value optimized out>, objv=0x7fffffffdad0)
    at /media/Linux/ns-allinone-2.35-RC7/tcl8.5.8/unix/../generic/tclTrace.c:912

Those are not the ordinary arguments to strcpy(). Usually, you have just two arguments. I can't immediately think of a circumstance where it would be appropriate to copy a string over the pointer to the Tcl interpreter's main control structure. So, to get further with this, I would look extremely hard at lines 900-920 or so in tclTrace.c, and in particular at line 912. This might just be an artefact of the way the optimizer is munging the object code, or it might be a genuine problem.

I found the tcl8.5.8 source and line 912 of tclTrace.c is the strcpy() in this code:

    if ((enum traceOptions) optionIndex == TRACE_ADD) {
        CombinedTraceVarInfo *ctvarPtr;

        ctvarPtr = (CombinedTraceVarInfo *) ckalloc((unsigned)
                (sizeof(CombinedTraceVarInfo) + length + 1
                - sizeof(ctvarPtr->traceCmdInfo.command)));
        ctvarPtr->traceCmdInfo.flags = flags;
        if (objv[0] == NULL) {
            ctvarPtr->traceCmdInfo.flags |= TCL_TRACE_OLD_STYLE;
        }
        ctvarPtr->traceCmdInfo.length = length;
        flags |= TCL_TRACE_UNSETS | TCL_TRACE_RESULT_OBJECT;
        strcpy(ctvarPtr->traceCmdInfo.command, command);       // Line 912
        ctvarPtr->traceInfo.traceProc = TraceVarProc;
        ctvarPtr->traceInfo.clientData = (ClientData)
                &ctvarPtr->traceCmdInfo;
        ctvarPtr->traceInfo.flags = flags;
        name = Tcl_GetString(objv[3]);
        if (TraceVarEx(interp,name,NULL,(VarTrace*)ctvarPtr) != TCL_OK) {
            ckfree((char *) ctvarPtr);
            return TCL_ERROR;
        }
    } else {

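So, the output from GDB and the stack trace look somewhat misleading: the arguments listed for frame #5 are those of TraceVariableObjCmd(), most likely because strcpy() was inlined into it. Only two arguments are actually passed to strcpy(), and the destination is part of a block freshly allocated on the heap.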
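The `- sizeof(ctvarPtr->traceCmdInfo.command)` term in the allocation suggests that `command` is a fixed-size array at the end of the structure which is deliberately over-allocated to hold the whole string. One possible, unconfirmed, explanation for the abort is that the checked strcpy() from string3.h (the `__chk_fail` frame) compares the copy against the declared size of that member rather than the allocated size. The sketch below shows the same idea written so that the size is explicit; `struct trace_info` and `trace_info_new` are hypothetical names for illustration, not the Tcl definitions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a record ending in a variable-length command
 * string. This is NOT the Tcl structure, only an illustration. */
struct trace_info {
    int    flags;
    size_t length;     /* length of command, excluding the terminator */
    char   command[];  /* C99 flexible array member, sized at allocation */
};

/* Allocate the header plus room for the command and its NUL terminator,
 * then copy exactly length + 1 bytes. Returns NULL if allocation fails.
 * The caller releases the result with free(). */
struct trace_info *trace_info_new(const char *command, int flags)
{
    size_t length = strlen(command);
    struct trace_info *info = malloc(sizeof *info + length + 1);

    if (info == NULL) {
        return NULL;
    }
    info->flags = flags;
    info->length = length;
    memcpy(info->command, command, length + 1);  /* includes the '\0' */
    return info;
}
```

Because the length is already known at that point in tclTrace.c, a memcpy() of `length + 1` bytes copies the same data as the strcpy(); whether that change would avoid the abort in the ns-2 build is an assumption that would need testing.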

I would think about compiling tcl standalone from the source embedded with ns-2 and see whether you can tickle the bug (sorry, awful pun) on its own. This code is related to tracing a tcl variable - trace add varname ... AFAICT.

Assuming that passes, I'd consider getting hold of GCC 4.6 and seeing whether the same problem occurs when you compile ns-2 with that instead of GCC 4.5.


Valgrind

Since you are running on Linux, you should be able to use Valgrind. It is excellent at spotting memory abuse problems. For maximum benefit, use a debug build of ns-2.


"buffer overflow detected": you are writing to a region which wasn't allocated. gcc 4.4 apparently generated code which didn't trigger a problem (or had a problem which didn't reveal itself as a crash, only as wrong results that went undetected); gcc 4.5 generates code which detects the problem and warns you about it. The only solution is to find the source of the problem and fix the code.

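To make the idea of a detected overflow concrete, here is a hand-written analogue of a checked copy. It is only a sketch of the principle, not glibc's implementation, and `copy_checked` is a name invented for this example. The copy is refused when the source, including its terminator, does not fit in the destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. Returns 0 on success, or -1 (leaving dst untouched) if the
 * copy would write past the end of dst. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;  /* bytes required, with the '\0' */

    if (needed > dst_size) {
        return -1;                    /* would overflow: refuse to copy */
    }
    memcpy(dst, src, needed);
    return 0;
}
```

A runtime check of this kind can only be as accurate as the destination size it is given, which is why the reported overflow still has to be confirmed by reading the code at the failing call.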

It could be all sorts of things. It could be a GCC bug. It could be a Tcl bug (I hope it isn't, speaking as one of the Tcl developers, but I won't rule it out as Tcl quite often assumes that there's no guard code on structures; Tcl is definitely C89 code). It could be a bug in ns2. For all I know, it could even be a bug elsewhere (because ns2 is built on Tcl, it can load external code libraries; it's quite possible to have a problem there).

Alas, we can't tell from the information posted which of those possibilities it is. Do you know in which library the callstack was when the program crashed? While not a guarantee that that's the actual locus of the problem, it's at least a place to start the bug-hunt…
