Weird seg fault problem

Greetings,

I'm having a weird seg fault problem. My application dumps a core file at runtime. After digging into it, I found that it dies in this block:

#include <lib1/c.h>  
...  
x::c obj;  
obj.func1();  

I defined class c in a library lib1:

namespace x  
{  
    struct c  
    {  
        c();  
        ~c();  
        void func1();  
        std::vector<char *> _data;  
    };  
}  

x::c::c()  
{  
}  

x::c::~c()  
{  
    for ( int i = 0; i < _data.size(); ++i )  
        delete _data[i];  
}  

I couldn't figure it out for a while, until I ran nm on lib1.so and found more function definitions than I had written:

x::c::c()  
x::c::c()  
x::c::~c()  
x::c::~c()  
x::c::func1()  
x::c::func2()  

After searching the code base, I found that someone else had defined a class with the same name in the same namespace, but in another library, lib2:

namespace x  
{  
    struct c  
    {  
       c();  
       ~c();  
       void func2();  
       std::vector<std::string> strs_;  
    };  
}  

x::c::c()
{
}

x::c::~c()
{
}

My application links to lib2, which has a dependency on lib1. This interesting behavior raises several questions:

  1. Why does it even work? I would expect a "multiple definitions" error when linking against lib2 (which depends on lib1), but I never got one. The application seems to do what func1 defines, except that it dumps core at runtime.

  2. After attaching a debugger, I found that my application calls the ctor of class c in lib2, then calls func1 (defined in lib1). When the object goes out of scope, it calls the dtor of class c in lib2, which is where the seg fault occurs. Can anybody explain how this can even happen?

  3. How can I prevent such problems from happening again? Is there any C++ syntax I can use?

I forgot to mention: I'm using g++ 4.1 on RHEL4. Thank you very much!


1.

Violations of the "one definition rule" don't have to be diagnosed by your compiler. In fact, they often only become detectable at link time, when you link multiple object files together.

At link time, the information about the original class definitions may not exist anymore (it is not needed after the compilation step), so multiple definitions of a class are typically not easy to flag to the user.

2.

Once you have two distinct definitions, pretty much anything can happen: you are in the territory of undefined behaviour. Whatever happens is a possible outcome.

3.

The most sensible thing to do is to communicate with the other members of your team. Agree on who's going to use which namespaces and you won't get these problems. Otherwise, you can point a documentation tool or static analysis tool at your entire project; many such tools can diagnose multiple inconsistent definitions of a class.
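As a minimal sketch of the namespace agreement, assuming each library wraps its code in its own top-level namespace (the names lib1_ns and lib2_ns, and the helper use_both, are hypothetical), the two classes called c become distinct types with distinct mangled symbol names, so one library's constructor or destructor can no longer be picked up in place of the other's:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-library top-level namespaces; lib1_ns and lib2_ns are
// illustrative names, not taken from the original code base.
namespace lib1_ns {
namespace x {
struct c {
    std::vector<char *> _data;
    std::size_t count() const { return _data.size(); }
};
}  // namespace x
}  // namespace lib1_ns

namespace lib2_ns {
namespace x {
struct c {
    std::vector<std::string> strs_;
    std::size_t count() const { return strs_.size(); }
};
}  // namespace x
}  // namespace lib2_ns

// Both classes coexist in one program because their fully qualified names
// (and therefore their mangled symbol names) differ.
void use_both() {
    lib1_ns::x::c a;
    lib2_ns::x::c b;
    b.strs_.push_back("hello");
    assert(a.count() == 0);
    assert(b.count() == 1);
}
```

Call sites can stay short with a namespace alias, for example namespace x1 = lib1_ns::x; followed by x1::c obj;.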


Just a guess, but I don't see any "using namespace x;" anywhere, so perhaps one namespace was used instead of the other?


With the advent of templates, it became necessary to allow multiple definitions of a body of code with the same name: there was no way for the compiler to know whether the same template code had already been generated in another compilation unit (i.e. source file). When the linker finds these duplicates, it assumes they are identical. The burden is on you to make sure that they are; this is called the One Definition Rule.


At the linker level, this is library interposition. Which definition a symbol ends up bound to unfortunately depends on the order of the object files on the linker command line (this is, sigh, historical).

From what you describe, it looks like lib1 comes first in the linker argument list and lib2 comes second, and lib2 interposes on symbols from lib1. This explains why the constructor and destructor calls go to lib2 while the call to func1 goes to lib1: lib2 has no symbol for func1, so nothing hides lib1's definition and the call is bound there.

The solution to this particular problem is to reverse the order of libraries on the linker invocation command.


There are lots of answers about the one definition rule. However, to me this looks a lot more like a missing copy constructor.

To elaborate:

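If your object is ever copied, the compiler-generated copy constructor copies only the pointers, not the strings they point to. When the original and the copy are destroyed, delete is called twice on the same set of pointers (a double free), which can crash the program.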

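#include <cstring>  
#include <vector>  

namespace x  
{  
    struct c  
    {
        c() {
        }

        // Assumes every string in _data was allocated with new char[],
        // so each one is released with delete[].
        ~c() {
            for ( std::size_t i = 0; i < _data.size(); ++i )  
                delete[] _data[i];
        }

        // Deep copy: the new object gets its own buffer for every string,
        // so the two destructors never delete the same pointer.
        c(const c & rhs) {
            for ( std::size_t i = 0; i < rhs._data.size(); ++i ) {
                std::size_t len = std::strlen(rhs._data[i]);
                char *mem = new char[len + 1];
                std::memcpy(mem, rhs._data[i], len + 1);
                _data.push_back(mem);
            }
        }

        void func1();  
        std::vector<char *> _data;  

    private:
        // Copy assignment has the same double-free problem; declaring it
        // private and leaving it undefined forbids it (C++03 idiom).
        c & operator=(const c &);
    };  
}  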
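If the strings do not have to be raw char * (an assumption; the question does not show how _data is filled), a sketch that stores std::string instead sidesteps the problem entirely: the compiler-generated copy constructor, copy assignment and destructor all do the right thing, so there is nothing to forget. The names string_holder and copy_is_independent are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical alternative to x::c: std::string owns its own memory, so no
// user-written destructor, copy constructor or copy assignment is needed.
struct string_holder {
    std::vector<std::string> _data;
};

// Copies are deep: changing the original leaves the copy untouched, and
// destroying both objects frees each buffer exactly once.
void copy_is_independent() {
    string_holder a;
    a._data.push_back("one");
    string_holder b = a;      // implicit copy constructor
    a._data[0] = "changed";
    assert(b._data[0] == "one");
    string_holder c;
    c = a;                    // implicit copy assignment
    assert(c._data[0] == "changed");
}
```

The trade-off is that code expecting char * has to call c_str() on each element.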