pthread_create followed by pthread_detach still results in possibly lost error in Valgrind
I'm having a problem with Valgrind telling me that some memory is possibly lost:
==23205== 544 bytes in 2 blocks are possibly lost in loss record 156 of 265
==23205== at 0x6022879: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23205== by 0x540E209: allocate_dtv (in /lib/ld-2.12.1.so)
==23205== by 0x540E91D: _dl_allocate_tls (in /lib/ld-2.12.1.so)
==23205== by 0x623068D: pthread_create@@GLIBC_2.2.5 (in /lib/libpthread-2.12.1.so)
==23205== by 0x758D66: MTPCreateThreadPool (MTP.c:290)
==23205== by 0x405787: main (MServer.c:317)
The code that creates these threads (MTPCreateThreadPool) basically gets an index into a block of waiting pthread_t slots, and creates a thread with that. TI becomes a pointer to a struct that has a thread index and a pthread_t. (simplified/sanitized):
    for (tindex = 0; tindex < NumThreads; tindex++)
    {
        int rc;
        TI = &TP->ThreadInfo[tindex];
        TI->ThreadID = tindex;
        rc = pthread_create(&TI->ThreadHandle, NULL, MTPHandleRequestsLoop, TI);
        /* check for non-success that I've omitted */
        pthread_detach(TI->ThreadHandle); /* takes the pthread_t by value, not its address */
    }
Then we have a function MTPDestroyThreadPool that loops through all the threads we created and cancels them (since the MTPHandleRequestsLoop doesn't exit).
    for (tindex = 0; tindex < NumThreads; tindex++)
    {
        pthread_cancel(TP->ThreadInfo[tindex].ThreadHandle);
    }
I've read elsewhere (including other questions here on SO) that detaching a thread explicitly would prevent this possibly lost error, but it clearly doesn't. Any thoughts?
glibc's threads implementation intentionally leaks memory. It keeps the memory allocated to a thread context cached to reuse the next time a thread is created. I did some benchmarking versus an implementation without the caching, and it seems the caching shaves 50% off the otherwise-optimal time for pthread_create, but drastically slows down pthread_join, for a net loss. Of course it's still a (small) gain if what you care about is thread creation latency and not throughput.
Also note that it's very difficult for a detached thread to deallocate its context, even if it wanted to. With a joinable thread, the thread that calls pthread_join can deallocate the context, but a detached thread would have to be able to operate with no stack during the interval between deallocating its context and terminating itself. This can only be achieved by writing that small piece of code in pure asm.
Wondering how detached threads' contexts get returned to the cache without a similar race condition? Linux has a feature to zero out an int at a particular address (registered by the userspace threads library) when the thread terminates. So the thread can add its own context to the cache safely, since until it terminates, other threads will still see a nonzero value (usually its thread-id) at this address and interpret that to mean that the context is still in use.
One reason could be that pthread_cancel doesn't actually cancel the thread right away, and it's not guaranteed to. The cancellation request is asynchronous: pthread_cancel returns immediately, and with the default (deferred) cancellation type the target thread acts on the request only when it reaches its next cancellation point. In that case, the threads may still be around when Valgrind collects its statistics.
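To illustrate the point about deferred cancellation, here is a minimal sketch. It assumes the worker is left joinable (unlike the detached threads in the question) so that the caller can wait for the cancellation to complete; worker_loop and cancel_and_wait are hypothetical names, not part of the question's code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: loops forever, but offers a cancellation point
   on every iteration so a pending pthread_cancel() can take effect. */
static void *worker_loop(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... handle one request ... */
        pthread_testcancel();   /* explicit cancellation point */
    }
    return NULL;                /* not reached */
}

/* Start one joinable worker, cancel it, and wait until it is really gone.
   Returns 1 if the thread ended through cancellation, 0 if it ended some
   other way, and -1 if a pthread call failed. */
int cancel_and_wait(void)
{
    pthread_t th;
    void *result = NULL;

    if (pthread_create(&th, NULL, worker_loop, NULL) != 0)
        return -1;
    if (pthread_cancel(th) != 0)          /* only queues the request */
        return -1;
    if (pthread_join(th, &result) != 0)   /* blocks until the thread has exited */
        return -1;
    return result == PTHREAD_CANCELED;
}
```

Only after pthread_join returns is the thread known to have finished; between pthread_cancel and that point it may still be running.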
Creating a thread as joinable only to detach it immediately doesn't make much sense to me. It creates nothing but overhead for the system.
I'd create the threads detached from the start; you have your shared ThreadInfo data structure anyhow to control your threads.
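Creating a thread that is detached from the start can be sketched roughly like this, using a thread attribute object. create_detached and idle_worker are hypothetical helpers made up for illustration; in the question's code the start routine would be MTPHandleRequestsLoop.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical start routine used only to demonstrate the helper below. */
static void *idle_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Create a thread that is detached from the moment it starts.
   Returns 0 on success or the error number of the failing pthread call. */
int create_detached(pthread_t *handle, void *(*start)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(handle, &attr, start, arg);

    pthread_attr_destroy(&attr);   /* the attribute object is no longer needed */
    return rc;
}
```

A call such as create_detached(&TI->ThreadHandle, MTPHandleRequestsLoop, TI) would then replace the pthread_create/pthread_detach pair.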
Also, I'd put a flag or something similar in your per-thread argument ThreadInfo that tells the thread to shut down in a controlled way.
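A shutdown flag of that kind might look roughly like the following sketch. The struct thread_info layout, the finished flag, and both function names are assumptions made for illustration (the question does not show its real ThreadInfo), and C11 atomics are assumed to be available.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread record, loosely modeled on the question's ThreadInfo. */
struct thread_info {
    int         thread_id;
    pthread_t   handle;
    atomic_bool stop;       /* set by the pool owner to request shutdown */
    atomic_bool finished;   /* set by the worker just before it returns  */
};

static void *requests_loop(void *arg)
{
    struct thread_info *ti = arg;
    while (!atomic_load(&ti->stop)) {
        /* ... wait for one request with a timeout, so the flag is rechecked ... */
        sched_yield();
    }
    atomic_store(&ti->finished, true);   /* last access to *ti by the worker */
    return NULL;
}

/* Start one detached worker, ask it to stop, and wait for its acknowledgement.
   Returns 0 on success or the error number of the failing pthread call. */
int start_and_stop_worker(struct thread_info *ti)
{
    pthread_attr_t attr;
    int rc;

    atomic_init(&ti->stop, false);
    atomic_init(&ti->finished, false);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&ti->handle, &attr, requests_loop, ti);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    atomic_store(&ti->stop, true);          /* controlled shutdown request */
    while (!atomic_load(&ti->finished))     /* wait until the worker has seen it */
        sched_yield();
    return 0;
}
```

The worker leaves its loop at a point of its own choosing, so it can release whatever it holds before returning, which a cancelled thread can only do through cleanup handlers.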