
Efficient and fast way to pass a thread argument

What is the most efficient way to create a thread and pass it an argument? The argument is a struct. Since the struct cannot simply stay on the parent thread's stack, there are two solutions.

With dynamic memory allocation

#include <pthread.h>
#include <cstddef>

struct Arg{
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;

    //... running

    delete arg;   // the thread owns the argument and frees it
    return NULL;
}

//Creating a thread
void a_function(){
    Arg* arg = new Arg;   // allocated on the heap, ownership handed to the thread
    arg->x = 1; arg->y = 2;

    pthread_t t;
    pthread_create(&t, NULL, my_thread, arg);
    pthread_detach(t);
}

With semaphore

#include <pthread.h>
#include <semaphore.h>
#include <cstddef>

struct Arg{
    sem_t sem;
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;
    int arg_x = arg->x;        // copy the fields to the new thread's stack
    int arg_y = arg->y;
    sem_post( &(arg->sem) );   // tell the parent the copy is done

    //... running

    return NULL;
}

//Creating a thread
void a_function(){
    Arg arg;                   // lives on the parent's stack until the child has copied it
    arg.x = 1; arg.y = 2;
    sem_init( &(arg.sem), 0, 0);

    pthread_t t;
    pthread_create(&t, NULL, my_thread, &arg);
    pthread_detach(t);

    sem_wait( &(arg.sem) );    // block until the child has copied the fields
    sem_destroy( &(arg.sem) );
}

I work with Linux and Windows.


In the code you have posted, the most efficient implementation is the one using heap allocation (your first example). The reason is that a heap allocation (using new or malloc) is much cheaper than a context switch. Consider what needs to happen in your second example:

  1. Allocate stack space for Arg
  2. Initialize the semaphore
  3. Start the thread and switch context
  4. Copy the variables to the new stack
  5. Switch context back
  6. Destroy semaphore
  7. Detach thread
  8. Switch context

Alternatively, your first example:

  1. Allocate heap space for Arg
  2. Start the thread
  3. Detach thread
  4. Switch context


It depends. If your struct isn't big, it's better to allocate it dynamically in order to avoid the extra synchronization calls. If the structure is quite large and you are writing code for a system with little memory, it's better to use a semaphore (or even a condition variable).
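
For reference, here is a minimal sketch of the condition-variable variant mentioned above. It reuses the Arg/my_thread/a_function names from the question; the mutex, condition variable, and copied flag are illustrative additions. The parent keeps the struct on its own stack and waits for a flag instead of a semaphore.

#include <pthread.h>
#include <cstddef>

struct Arg{
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            copied;   // set once the child has taken its copy
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;
    int arg_x = arg->x;       // copy to the child's stack
    int arg_y = arg->y;

    pthread_mutex_lock(&arg->mtx);
    arg->copied = true;
    pthread_cond_signal(&arg->cond);
    pthread_mutex_unlock(&arg->mtx);

    //... running, using arg_x and arg_y

    return NULL;
}

void a_function(){
    Arg arg;
    arg.x = 1; arg.y = 2;
    arg.copied = false;
    pthread_mutex_init(&arg.mtx, NULL);
    pthread_cond_init(&arg.cond, NULL);

    pthread_t t;
    pthread_create(&t, NULL, my_thread, &arg);
    pthread_detach(t);

    pthread_mutex_lock(&arg.mtx);
    while (!arg.copied)                       // guard against spurious wakeups
        pthread_cond_wait(&arg.cond, &arg.mtx);
    pthread_mutex_unlock(&arg.mtx);

    pthread_mutex_destroy(&arg.mtx);
    pthread_cond_destroy(&arg.cond);
}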


Atomic op solution. This is a very high-speed way to get memory for your arguments.

If the arguments are always the same size, pre-allocate a bunch of them. Add a pNext field to your struct to link them together. Create a _pRecycle global to hold all the available ones as a linked list, using pNext to link them. When you need an argument, use atomic ops to CAS one off the head of the trash list. When you are done, use atomic ops to put the arg back at the head of the trash list.

CAS refers to something like GCC's __sync_bool_compare_and_swap(ptr, oldval, newval), which returns true on success.
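
The snippets below assume declarations roughly like the following (a sketch; the TArg field layout and the CAS wrapper around the GCC builtin are illustrative, and the list is assumed never to run empty):

#include <pthread.h>

// argument record with an intrusive free-list link
typedef struct _TArg {
    struct _TArg *pNext;   // links free records together
    int x;
    int y;
} TArg;

// head of the global free ("trash") list
TArg *_pRecycle = NULL;

// CAS(ptr, oldval, newval): returns nonzero if *ptr was oldval and is now newval
#define CAS(ptr, oldval, newval) __sync_bool_compare_and_swap(ptr, oldval, newval)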

to grab argument memory:

TArg *pArg;
while (1)  {  // concurrency loop
  pArg = _pRecycle;  // _pRecycle is the global ptr to the head of the available arguments
  // POINT A
  if (CAS(&_pRecycle, pArg, pArg->pNext))  // pop the head: advance _pRecycle to the next item if it hasn't changed
    break; // success
  // POINT B
}
// you can now use pArg to pass arguments

to recycle argument memory when done:

while (1)  {  // concurrency loop
  pArg->pNext = _pRecycle;
  if (CAS(&_pRecycle, pArg->pNext, pArg))  // push pArg: set _pRecycle to pArg if it hasn't changed
    break; // success
}
// you have given the mem back 
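
Putting it together, a rough usage sketch under the same assumptions (init_pool, worker, and start_worker are made-up names, and the pool is assumed never to run dry): pre-allocate the records once, pop one to pass to pthread_create, and have the thread push it back as soon as it has copied what it needs.

#include <pthread.h>
#include <stdlib.h>

void init_pool(int count) {          // pre-allocate 'count' argument records
    for (int i = 0; i < count; ++i) {
        TArg *p = (TArg*) malloc(sizeof(TArg));
        p->pNext = _pRecycle;        // single-threaded setup, no CAS needed yet
        _pRecycle = p;
    }
}

void* worker(void* v_arg) {
    TArg *pArg = (TArg*) v_arg;
    int x = pArg->x, y = pArg->y;    // copy what we need

    while (1) {                      // push pArg back on the trash list
        pArg->pNext = _pRecycle;
        if (CAS(&_pRecycle, pArg->pNext, pArg))
            break;
    }

    //... running, using x and y
    return NULL;
}

void start_worker(int x, int y) {
    TArg *pArg;
    while (1) {                      // pop a record off the trash list
        pArg = _pRecycle;
        if (CAS(&_pRecycle, pArg, pArg->pNext))
            break;
    }
    pArg->x = x; pArg->y = y;

    pthread_t t;
    pthread_create(&t, NULL, worker, pArg);
    pthread_detach(t);
}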

There is a race condition if something uses and recycles pArg while another thread is swapped out between POINT A and POINT B. If your work takes a long time to process, this won't be a problem. Otherwise you need to version the head of the list... To do that you need to be able to atomically change two things at once... Unions combined with a 64-bit CAS to the rescue!

typedef union _RecycleList {
  struct {
    int   iVersion;
    TArg *pHead;
  };
  unsigned long long n64;  // iVersion and pHead viewed as one value (with 64-bit pointers you need a 128-bit type and CAS, as noted below)
} TRecycleList;

TRecycleList _Recycle;

to get mem:

TArg *pArg;
while (1)  // concurrency loop
{
  TRecycleList Old, New;
  Old.n64 = _Recycle.n64;
  New.n64 = Old.n64;
  New.iVersion++;
  pArg = New.pHead;
  New.pHead = New.pHead->pNext;
  if (CAS(&_Recycle.n64, Old.n64, New.n64)) // if neither the version nor the head changed, we got the memory
    break; // success
}

to put mem back:

while (1)  // concurrency loop
{
  TRecycleList Old, New;
  Old.n64 = _Recycle.n64;
  New.n64 = Old.n64;
  New.iVersion++;
  pArg->pNext = New.pHead;
  New.pHead = pArg;
  if (CAS(&_Recycle.n64, Old.n64, New.n64))  // if neither the version nor the head changed, the memory is released
    break; // success
}

Since 99.9999999% of the time no two threads will be executing the grab-memory code at the same time, you get great performance. Our tests have shown CAS to be as little as 2x slower than just setting _pRecycle = _pRecycle->pNext. 64-bit and 128-bit CAS are just as fast as 32-bit. Basically it screams. Every once in a while the concurrency loop will execute twice when two threads actually race. One always wins, so the race resolves very fast.
