pthread_mutex_unlock not atomic
I've the following source code (adapted from my original code):
#include "stdafx.h"
#include <stdlib.h>
#include <stdio.h>
#include "pthread.h"

#define MAX_ENTRY_COUNT 4

int entries = 0;
bool start = false;
bool send_active = false;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condNotEmpty = PTHREAD_COND_INITIALIZER;
pthread_cond_t condNotFull = PTHREAD_COND_INITIALIZER;

void send()
{
    for (;;) {
        if (!start)
            continue;
        start = false;
        for (int i = 0; i < 11; ++i) {
            send_active = true;
            pthread_mutex_lock(&mutex);
            while (entries == MAX_ENTRY_COUNT)
                pthread_cond_wait(&condNotFull, &mutex);
            entries++;
            pthread_cond_broadcast(&condNotEmpty);
            pthread_mutex_unlock(&mutex);
            send_active = false;
        }
    }
}

void receive()
{
    for (int i = 0; i < 11; ++i) {
        pthread_mutex_lock(&mutex);
        while (entries == 0)
            pthread_cond_wait(&condNotEmpty, &mutex);
        entries--;
        pthread_cond_broadcast(&condNotFull);
        pthread_mutex_unlock(&mutex);
    }
    if (send_active)
        printf("x");
}

int _tmain(int argc, _TCHAR* argv[])
{
    pthread_t s;
    pthread_create(&s, NULL, (void *(*)(void*))send, NULL);
    for (;;) {
        pthread_mutex_init(&mutex, NULL);
        pthread_cond_init(&condNotEmpty, NULL);
        pthread_cond_init(&condNotFull, NULL);
        start = true;
        receive();
        pthread_mutex_destroy(&mutex);
        mutex = NULL;
        pthread_cond_destroy(&condNotEmpty);
        pthread_cond_destroy(&condNotFull);
        condNotEmpty = NULL;
        condNotFull = NULL;
        printf(".");
    }
    return 0;
}
The problem is as follows: from time to time, the last unlock in the send function has not finished before the receive function continues. In my original code the mutexes are located in objects which are deleted after doing the job. If the send function has not finished its last unlock by then, the mutexes are invalid and my program fails in unlock.
The behavior can be easily reproduced by running the program: each time the "x" is displayed, the receive function has nearly finished and the send function "hangs" in the unlock call.
I've compiled with VS2008 and VS2010; the results are the same for both.
pthread_mutex_unlock is not atomic; if it were, that would solve the problem. How can I solve this issue? Any comments are welcome...
Best regards
Michael
Your printf("x") is a textbook race condition example.
After pthread_mutex_unlock() returns, the OS is free not to schedule this thread for any amount of time: ticks, seconds or days. You can't assume that send_active will be set to false in time.
pthread_mutex_unlock() must by definition release the mutex before it returns. The instant the mutex is released, another thread that's contending for the mutex could be scheduled. Note that even if pthread_mutex_unlock() could arrange to not release the mutex until just after it returned (what I think you mean by it being atomic), there would still be an equivalent race condition to whatever you're seeing now (it's not clear to me what race you're seeing, since a comment indicates that you're not really interested in the race condition of accessing send_active to control the printf() call).
In that case the other thread could be scheduled 'between the lines' of the pthread_mutex_unlock() call and the following statement/expression in the function that called it - you'd have the same race condition.
Here's some speculation on what might be happening. A couple caveats on this analysis:
- this is based on the assumption that you're using the Win32 pthreads package from http://sourceware.org/pthreads-win32/
- this is based only on a pretty brief examination of the pthreads source on that site and the information in the question and comments here - I have not had an opportunity to actually try to run/debug any of the code.
When pthread_mutex_unlock() is called, it decrements a lock count, and if that lock count drops to zero, the Win32 SetEvent() API is called on an associated event object to unblock any threads waiting on the mutex. Pretty standard stuff.
Here's where the speculation comes in. Let's say that SetEvent() has been called to unblock thread(s) waiting on the mutex, and it signals the event associated with the handle it was given (as it should). However, before the SetEvent() function does anything else, another thread starts running and closes the event object handle that that particular SetEvent() was called with (by calling pthread_mutex_destroy()).
Now the SetEvent() call in progress has a handle value that's no longer valid. I can't think of a particular reason that SetEvent() would do anything with that handle after it's signaled the event, but maybe it does (I could also imagine someone making a reasonable argument that SetEvent() should be able to expect that the event object handle remain valid for the duration of the API call).
If this is what's happening to you (and that's a big if), I'm not sure if there's an easy fix. I think that the pthreads library would have to be changed so that it duplicated the event handle before calling SetEvent() and then closed that duplicate when the SetEvent() call returned. That way the handle would remain valid even if the 'main' handle were closed by another thread. I'm guessing it would have to do this in a number of places. This could be implemented by replacing the affected Win32 API calls with calls to wrapper functions that perform the "duplicate handle/call API/close duplicate" sequence.
It might not be unreasonable for you to try to make this change for the SetEvent() call in pthread_mutex_unlock() and see if it solves (or at least improves) your particular problem. If so, you may want to contact the maintainer of the library to see if a more comprehensive fix might be arranged somehow (be prepared - you may be asked to do a significant part of the work).
Out of curiosity, in your debugging of the state of the thread that hangs in pthread_mutex_unlock()/SetEvent(), do you have any information on exactly what's happening? What is SetEvent() waiting for? (The cdb debugger that's in the Debugging Tools for Windows package might be able to give you more information about this than the Visual Studio debugger.)
Also, note the following comment in the source for pthread_mutex_destroy(), which seems related to (but different from) your particular problem:
/*
* FIXME!!!
* The mutex isn't held by another thread but we could still
* be too late invalidating the mutex below since another thread
* may already have entered mutex_lock and the check for a valid
* *mutex != NULL.
*
* Note that this would be an unusual situation because it is not
* common that mutexes are destroyed while they are still in
* use by other threads.
*/
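The premise of that comment, that a mutex should not be destroyed while another thread may still be inside a call on it, can also be enforced at the application level. Below is a minimal sketch assuming the mutex lives in a heap-allocated object shared by a known number of threads, and that a C11 compiler with <stdatomic.h> is available (an assumption; older Visual Studio versions do not provide it). The names struct channel, channel_create, channel_release and channel_worker are hypothetical and not part of pthreads-win32.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object; illustrative only, not a pthreads-win32 type. */
struct channel {
    atomic_int      refs;     /* number of threads still using the object */
    pthread_mutex_t mutex;
    int             entries;  /* protected by mutex */
};

/* Create an object that will be shared by `users` threads. */
struct channel *channel_create(int users)
{
    struct channel *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    if (pthread_mutex_init(&c->mutex, NULL) != 0) {
        free(c);
        return NULL;
    }
    atomic_init(&c->refs, users);
    c->entries = 0;
    return c;
}

/* Drop one reference; returns 1 if this call destroyed the object. */
int channel_release(struct channel *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        /* Last user: every other thread has already returned from its
         * final unlock, so destroying the mutex is safe here. */
        pthread_mutex_destroy(&c->mutex);
        free(c);
        return 1;
    }
    return 0;
}

/* Example thread body: release only after the final unlock has returned. */
void *channel_worker(void *arg)
{
    struct channel *c = arg;
    pthread_mutex_lock(&c->mutex);
    c->entries++;
    pthread_mutex_unlock(&c->mutex);
    channel_release(c);
    return NULL;
}
```

Each thread calls channel_release() only after it has returned from its last pthread_mutex_unlock(), so whichever thread drops the count to zero is the only one that can still touch the mutex.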
Michael, thank you for your investigation and comments!
The code I'm using is from http://sourceware.org/pthreads-win32/.
The situation described by you in the third and fourth paragraph is exactly what's happening.
I've checked some solutions and a simple one seems to work for me: I wait until the send function (and SetEvent) has finished. All my tests with this solution have been successful so far. I'm going to do a larger test over the weekend.
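For readers with the same problem, here is a rough sketch of that kind of fix, assuming one sender thread per job so that pthread_join() can serve as the "wait until send has finished" step. It is not my original code; struct job, send_items, receive_items and run_job are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_ENTRY_COUNT 4
#define ITEM_COUNT 11

/* Hypothetical per-job object owning the mutex and condition variables. */
struct job {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    int             entries;
};

static void *send_items(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < ITEM_COUNT; ++i) {
        pthread_mutex_lock(&j->mutex);
        while (j->entries == MAX_ENTRY_COUNT)
            pthread_cond_wait(&j->not_full, &j->mutex);
        j->entries++;
        pthread_cond_broadcast(&j->not_empty);
        pthread_mutex_unlock(&j->mutex);
    }
    return NULL;
}

static void receive_items(struct job *j)
{
    for (int i = 0; i < ITEM_COUNT; ++i) {
        pthread_mutex_lock(&j->mutex);
        while (j->entries == 0)
            pthread_cond_wait(&j->not_empty, &j->mutex);
        j->entries--;
        pthread_cond_broadcast(&j->not_full);
        pthread_mutex_unlock(&j->mutex);
    }
}

/* Runs one complete job; returns 0 on success or a pthread error code. */
int run_job(void)
{
    struct job j;
    pthread_t sender;
    int rc;

    pthread_mutex_init(&j.mutex, NULL);
    pthread_cond_init(&j.not_empty, NULL);
    pthread_cond_init(&j.not_full, NULL);
    j.entries = 0;

    rc = pthread_create(&sender, NULL, send_items, &j);
    if (rc == 0) {
        receive_items(&j);
        /* Wait until the sender has returned from its final unlock. */
        rc = pthread_join(sender, NULL);
        if (rc == 0)
            assert(j.entries == 0);
    }

    /* No other thread can be inside lock/unlock at this point. */
    pthread_cond_destroy(&j.not_empty);
    pthread_cond_destroy(&j.not_full);
    pthread_mutex_destroy(&j.mutex);
    return rc;
}
```

The important part is the ordering in run_job(): the destroy calls come only after pthread_join() has returned, so the sender cannot still be inside pthread_mutex_unlock() when the mutex goes away.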