Need Help Tracking Down EXC_BAD_ACCESS on Function Entry on MacOS

I have a program that gets a KERN_PROTECTION_FAILURE with EXC_BAD_ACCESS in a very strange place when running multithreaded, and I haven't the faintest idea how to troubleshoot it further. This is on MacOS 10.6 using GCC.

The strange place is the entry to a function: not the first line of the function body, but the actual jump into the function GetMachineFactors():

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb00009ec
[Switching to process 28242]
0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
168 MachineFactors* GetMachineFactors()
(gdb) bt
#0  0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
#1  0x000156d0 in CollectMachineFactorsThreadProc (parameter=0x200280) at Threads.cpp:341
#2  0x952f681d in _pthread_start ()
#3  0x952f66a2 in thread_start ()
(gdb) 

If I run this non-threaded, it runs great, no issues:

#include "MachineFactors.h"

int main( int argc, char** argv )
{
    MachineFactors* factors = GetMachineFactors();
    std::string str = CreateJSONObject(factors);
    cout << str;
    delete factors;
    return 0;
}

If I run this in a pthread, I get the EXC_BAD_ACCESS above.

THREAD_FUNCTION CollectMachineFactorsThreadProc( LPVOID parameter )
{
    Main* client = (Main*) parameter;
    if ( parameter == NULL )
    {
        ERRORLOG( "No data passed to machine identification thread.  Aborting." );
        return 0;
    }
    MachineFactors* mfactors = GetMachineFactors(); // This is where it dies.
    // If I don't call GetMachineFactors and do something like mfactors =
    // new MachineFactors(); everything is good and the threads communicate and exit
    // normally.
    if (mfactors == NULL)
    {
        ERRORLOG("Failed to collect machine identification: GetMachineFactors returned NULL." << endl)
        return 0;
    }
    client->machineFactors = CreateJSONObject(mfactors);
    delete mfactors;
    EVENT_RAISE(client->machineFactorsEvent);
    return 0;
}

Here is an excerpt from the GetMachineFactors() code:

MachineFactors* GetMachineFactors() // Dies on this line in multi-threaded.
{
    // printf( "Getting machine factors.\n"); // Tried with and without this, never prints.
    factors = new MachineFactors();
    factors->OSName = "MacOS";
    factors->Manufacturer = "Apple";
    ///…
    // gather various machine metrics here.
    //…
    return factors;
}

For reference, I am using a socketpair to wait on the thread to complete:

// From the header file I use for cross-platform defines (this runs on OSX, Windows, and Linux).
struct _waitt
{
  int fds[2];
};
#define THREAD_FUNCTION void*
#define THREAD_REFERENCE pthread_t
#define MUTEX_REFERENCE pthread_mutex_t*
#define MUTEX_LOCK(m) pthread_mutex_lock(m)
#define MUTEX_UNLOCK pthread_mutex_unlock
#define EVENT_REFERENCE struct _waitt
#define EVENT_WAIT(m) do { char lc; if (read(m.fds[0], &lc, 1)) {} } while (0)
#define EVENT_RAISE(m) do { char lc = 'j'; if (write(m.fds[1], &lc, 1)) {} } while (0)
#define EVENT_NULL(m) do { m.fds[0] = -1; m.fds[1] = -1; } while (0)
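
For readers unfamiliar with the pattern, the EVENT_WAIT and EVENT_RAISE macros above amount to a one-byte handshake over a socket pair. The following is a minimal sketch of that idea, not the actual project code: the names Event, worker, and run_and_wait are hypothetical, and it assumes the pair is created with socketpair(AF_UNIX, SOCK_STREAM, 0, ...), which the post does not show.

```cpp
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Hypothetical stand-in for the EVENT_REFERENCE struct in the post.
struct Event {
    int fds[2];
};

// Worker thread: would do its work here, then raises the event by
// writing a single byte to one end of the socket pair.
static void* worker(void* arg) {
    Event* ev = static_cast<Event*>(arg);
    char c = 'j';
    if (write(ev->fds[1], &c, 1) != 1) {
        perror("write");
    }
    return NULL;
}

// Creates the socket pair, starts the worker, and blocks until it signals.
// Returns true if the one-byte signal was received.
bool run_and_wait() {
    Event ev;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ev.fds) != 0) {
        perror("socketpair");
        return false;
    }
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &ev) != 0) {
        close(ev.fds[0]);
        close(ev.fds[1]);
        return false;
    }
    char c = 0;
    ssize_t n = read(ev.fds[0], &c, 1);  // blocks until the worker writes
    pthread_join(tid, NULL);
    close(ev.fds[0]);
    close(ev.fds[1]);
    return n == 1 && c == 'j';
}
```

Calling run_and_wait() from main() should return true once the worker has signalled. Unlike the macros, the sketch checks the return values of read and write, which makes a failed handshake visible instead of silently ignored.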

Here is the code where I launch the thread.

void Main::CollectMachineFactors()
{
#ifdef WIN32
    machineFactorsThread = CreateThread(NULL, 0, CollectMachineFactorsThreadProc, this, 0, 0);
    if ( machineFactorsThread == NULL )
    {
        ERRORLOG( "Could not create thread for machine id: " << ERROR_NO << endl )
    }
#else
    int retval = pthread_create(&machineFactorsThread, NULL, CollectMachineFactorsThreadProc, this);
    if (retval)
    {
        ERRORLOG( "Return code from machine id pthread_create() is " << retval << endl )
    }
#endif
}

Here's the simple failure case of running this multithreaded. It always fails for this code with the stack trace above:

CollectMachineFactors();
EVENT_WAIT(machineFactorsEvent);
cout << machineFactors;
return 0;

At first I suspected a library problem. Here's my makefile:

# Main executable file
PROGRAM = sysinfo
# Object files
OBJECTS = Version.h Main.o Protocol.o Socket.o SSLConnection.o Stats.o TimeElapsed.o Formatter.o OSX.o Threads.o
# Include directories
INCLUDE = -Itaocrypt/include -IyaSSL/taocrypt/mySTL -IyaSSL/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -mmacosx-version-min=10.5
# Library settings
STATICLIBS = libtaocrypt.a libyassl.a -Wl,-rpath,. -ldl -lpthread -lz -lexpat
# Compile settings
RELCXX = g++ -g -ggdb -DDEBUG -Wall $(INCLUDE)

.SUFFIXES:      .o .cpp

.cpp.o :
        $(RELCXX) -c -Wall $(INCLUDE) -o $@ $<

all:    $(PROGRAM)

$(PROGRAM):     $(OBJECTS)
        $(RELCXX) -o $(PROGRAM) $(OBJECTS) $(STATICLIBS)

clean: 
    rm -f *.o $(PROGRAM)

I can't for the life of me see anything particularly odd or dangerous and I'm not sure where to look. The same threaded process works fine on any Linux machine I have tried. Any suggestions? Any tools I should try?

I can add more info if it would be helpful.


I can see a problem with your Windows code, but not the OSX code that's crashing on you.

It seems that you're not posting the actual code for GetMachineFactors, since the variable factors is not declared. As for debugging, you should not take the non-appearance of printf output as conclusive proof that the statement hasn't been executed. Use debugger facilities instead, such as setting a breakpoint or using the debugger's trace output (I'm not sure what gdb supports, it's a very primitive debugger, but perhaps Apple has better tools?).

For Windows, you should use the runtime library's thread creation function (such as _beginthreadex) instead of the Windows API CreateThread. That's because with CreateThread the runtime library isn't informed about the new thread. E.g., a new expression or other call that uses the runtime library might fail.

Sorry I can't help more.

I think it could perhaps have something to do with the GetMachineFactors code that you haven't shown?
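
One hypothesis worth ruling out, offered as an assumption because the thread never confirms it: a fault at the function's entry, before any statement runs, is the pattern one would expect if the function's stack frame does not fit in the thread's stack. On Mac OS X, secondary threads get a much smaller default stack (512 KB) than the main thread (8 MB), so a function with large local buffers can work from main() and crash in a pthread. A minimal sketch for testing this is to create the thread with an explicit stack size. The names big_frame_worker and run_with_stack are hypothetical, and the 1 MiB buffer only stands in for whatever GetMachineFactors() might keep on the stack.

```cpp
#include <pthread.h>
#include <cstddef>
#include <cstring>

// Hypothetical worker with a large local buffer, standing in for whatever
// GetMachineFactors() might allocate on the stack (not shown in the post).
static void* big_frame_worker(void*) {
    char buffer[1024 * 1024];           // 1 MiB of stack space
    memset(buffer, 1, sizeof(buffer));  // touch it so the pages are really used
    return buffer[sizeof(buffer) - 1] == 1 ? (void*)1 : NULL;
}

// Starts fn on a thread with an explicit stack size and waits for it.
// Returns true if the thread ran and returned a non-null result.
bool run_with_stack(void* (*fn)(void*), size_t stack_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return false;
    }
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr);
        return false;
    }
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, fn, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return false;
    }
    void* result = NULL;
    pthread_join(tid, &result);
    return result != NULL;
}
```

For example, run_with_stack(big_frame_worker, 8 * 1024 * 1024) gives the thread the same 8 MB the main thread has. If the real program stops crashing once pthread_create is given a larger stack, the frame size of GetMachineFactors() is the likely culprit, and moving its large locals to the heap would be the longer-term fix.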


It turns out, and I can't explain why, that a fork() call combined with a socketpair() as the IPC mechanism was the workaround to get things going as intended.
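
The shape of that workaround can be sketched as follows. This is an illustration under assumptions, not the code that was actually used: collect_in_child is a hypothetical stand-in for the real collection work, and collect_via_fork is a hypothetical wrapper name.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <string>

// Hypothetical stand-in for the real collection work.
static std::string collect_in_child() {
    return "{\"OSName\":\"MacOS\"}";
}

// Forks a child that runs the collector and sends its result back over a
// socket pair. Returns the child's output, or an empty string on failure.
std::string collect_via_fork() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return "";
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        // Child: write the result, then exit without running parent cleanup.
        close(fds[0]);
        std::string out = collect_in_child();
        ssize_t n = write(fds[1], out.data(), out.size());
        close(fds[1]);
        _exit(n == (ssize_t)out.size() ? 0 : 1);
    }
    // Parent: close the child's end so read() sees EOF when the child exits.
    close(fds[1]);
    std::string result;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        result.append(buf, n);
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? result : "";
}
```

The parent reads until end-of-file rather than waiting for a single byte, so the child can return the whole JSON string over the same descriptor it uses to signal completion.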

I wish I knew why it was failing in the first place (headscratch) but that approach seems to have been a good workaround.

It almost seemed like the kind of "build out of whack" problem that could be caused by failing to run a 'make clean' after changing header files, but that wasn't the case here.
