
Efficient and fast way to pass a thread argument

What is the most efficient way to create a thread and pass it an argument? The argument is a struct. Since the struct cannot simply stay on the parent thread's stack, there are two solutions.

With dynamic memory allocation

#include <pthread.h>
#include <cstddef>

struct Arg{
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;

    //... running

    delete arg;   // the thread owns the argument and frees it
    return NULL;
}

//Creating a thread
void a_function(){
    Arg* arg = new Arg;   // allocated on the heap, ownership handed to the thread
    arg->x = 1; arg->y = 2;

    pthread_t t;
    pthread_create(&t, NULL, my_thread, arg);
    pthread_detach(t);
}

With semaphore

#include <pthread.h>
#include <semaphore.h>
#include <cstddef>

struct Arg{
    sem_t sem;
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;
    int arg_x = arg->x;        // copy the fields to the new thread's stack
    int arg_y = arg->y;
    sem_post( &(arg->sem) );   // tell the parent the copy is done

    //... running

    return NULL;
}

//Creating a thread
void a_function(){
    Arg arg;                   // lives on the parent's stack until the child has copied it
    arg.x = 1; arg.y = 2;
    sem_init( &(arg.sem), 0, 0);

    pthread_t t;
    pthread_create(&t, NULL, my_thread, &arg);
    pthread_detach(t);

    sem_wait( &(arg.sem) );    // block until the child has copied the fields
    sem_destroy( &(arg.sem) );
}

I work with Linux and Windows.


In the code you have posted, the most efficient implementation is the one using heap allocation (your first example). The reason is that a heap allocation (using new or malloc) is much cheaper than a context switch. Consider what needs to happen in your second example:

  1. Allocate stack space for Arg
  2. Initialize the semaphore
  3. Start the thread and switch context
  4. Copy the variables to the new stack
  5. Switch context back
  6. Destroy semaphore
  7. Detach thread
  8. Switch context

Alternatively, your first example:

  1. Allocate heap space for Arg
  2. Start the thread
  3. Detach thread
  4. Switch context


It depends. If your struct isn't big, it's better to allocate it dynamically in order to avoid the extra synchronization calls. If the structure is quite large and you are writing code for a system with little memory, it's better to use a semaphore (or even a condition variable).
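
For reference, here is a minimal sketch of the condition-variable variant mentioned above. It reuses the Arg/my_thread/a_function names from the question; the mutex, condition variable, and copied flag are illustrative additions. The parent keeps the struct on its own stack and waits for a flag instead of a semaphore.

#include <pthread.h>
#include <cstddef>

struct Arg{
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            copied;   // set once the child has taken its copy
    int x;
    int y;
};

void* my_thread(void* v_arg){
    Arg* arg = (Arg*) v_arg;
    int arg_x = arg->x;       // copy to the child's stack
    int arg_y = arg->y;

    pthread_mutex_lock(&arg->mtx);
    arg->copied = true;
    pthread_cond_signal(&arg->cond);
    pthread_mutex_unlock(&arg->mtx);

    //... running, using arg_x and arg_y

    return NULL;
}

void a_function(){
    Arg arg;
    arg.x = 1; arg.y = 2;
    arg.copied = false;
    pthread_mutex_init(&arg.mtx, NULL);
    pthread_cond_init(&arg.cond, NULL);

    pthread_t t;
    pthread_create(&t, NULL, my_thread, &arg);
    pthread_detach(t);

    pthread_mutex_lock(&arg.mtx);
    while (!arg.copied)                       // guard against spurious wakeups
        pthread_cond_wait(&arg.cond, &arg.mtx);
    pthread_mutex_unlock(&arg.mtx);

    pthread_mutex_destroy(&arg.mtx);
    pthread_cond_destroy(&arg.cond);
}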


Atomic op solution. This is a very high-speed way to get memory for your arguments.

If the arguments are always the same size, pre-allocate a bunch of them. Add a pNext field to your struct to link them together. Create a _pRecycle global to hold all the available ones as a linked list, using pNext to link them. When you need an argument, use atomic ops to CAS one off the head of the trash list. When you are done, use atomic ops to put the arg back at the head of the trash list.

CAS refers to something like GCC's __sync_bool_compare_and_swap(ptr, oldval, newval), which returns true on success.
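
The snippets below assume declarations roughly like the following (a sketch; the TArg field layout and the CAS wrapper around the GCC builtin are illustrative, and the list is assumed never to run empty):

#include <pthread.h>

// argument record with an intrusive free-list link
typedef struct _TArg {
    struct _TArg *pNext;   // links free records together
    int x;
    int y;
} TArg;

// head of the global free ("trash") list
TArg *_pRecycle = NULL;

// CAS(ptr, oldval, newval): returns nonzero if *ptr was oldval and is now newval
#define CAS(ptr, oldval, newval) __sync_bool_compare_and_swap(ptr, oldval, newval)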

to grab argument memory:

TArg *pArg;
while (1)  {  // concurrency loop
  pArg = _pRecycle;  // _pRecycle is the global ptr to the head of the available arguments
  // POINT A
  if (CAS(&_pRecycle, pArg, pArg->pNext))  // pop the head: advance _pRecycle to the next item if it hasn't changed
    break; // success
  // POINT B
}
// you can now use pArg to pass arguments

to recycle argument memory when done:

while (1)  {  // concurrency loop
  pArg->pNext = _pRecycle;
  if (CAS(&_pRecycle, pArg->pNext, pArg))  // push pArg: set _pRecycle to pArg if it hasn't changed
    break; // success
}
// you have given the mem back 
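
Putting it together, a rough usage sketch under the same assumptions (init_pool, worker, and start_worker are made-up names, and the pool is assumed never to run dry): pre-allocate the records once, pop one to pass to pthread_create, and have the thread push it back as soon as it has copied what it needs.

#include <pthread.h>
#include <stdlib.h>

void init_pool(int count) {          // pre-allocate 'count' argument records
    for (int i = 0; i < count; ++i) {
        TArg *p = (TArg*) malloc(sizeof(TArg));
        p->pNext = _pRecycle;        // single-threaded setup, no CAS needed yet
        _pRecycle = p;
    }
}

void* worker(void* v_arg) {
    TArg *pArg = (TArg*) v_arg;
    int x = pArg->x, y = pArg->y;    // copy what we need

    while (1) {                      // push pArg back on the trash list
        pArg->pNext = _pRecycle;
        if (CAS(&_pRecycle, pArg->pNext, pArg))
            break;
    }

    //... running, using x and y
    return NULL;
}

void start_worker(int x, int y) {
    TArg *pArg;
    while (1) {                      // pop a record off the trash list
        pArg = _pRecycle;
        if (CAS(&_pRecycle, pArg, pArg->pNext))
            break;
    }
    pArg->x = x; pArg->y = y;

    pthread_t t;
    pthread_create(&t, NULL, worker, pArg);
    pthread_detach(t);
}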

There is a race condition if something uses and recycles pArg while another thread is swapped out between POINT A and POINT B. If your work takes a long time to process, this won't be a problem. Otherwise you need to version the head of the list... To do that you need to be able to atomically change two things at once... Unions combined with a 64-bit CAS to the rescue!

typedef union _RecycleList {
  struct {
    int   iVersion;
    TArg *pHead;
  };
  unsigned long long n64;  // iVersion and pHead viewed as one value (with 64-bit pointers you need a 128-bit type and CAS, as noted below)
} TRecycleList;

TRecycleList _Recycle;

to get mem:

TArg *pArg;
while (1)  // concurrency loop
{
  TRecycleList Old, New;
  Old.n64 = _Recycle.n64;
  New.n64 = Old.n64;
  New.iVersion++;
  pArg = New.pHead;
  New.pHead = New.pHead->pNext;
  if (CAS(&_Recycle.n64, Old.n64, New.n64)) // if neither the version nor the head changed, we got the memory
    break; // success
}

to put mem back:

while (1)  // concurrency loop
{
  TRecycleList Old, New;
  Old.n64 = _Recycle.n64;
  New.n64 = Old.n64;
  New.iVersion++;
  pArg->pNext = New.pHead;
  New.pHead = pArg;
  if (CAS(&_Recycle.n64, Old.n64, New.n64))  // if neither the version nor the head changed, the memory is released
    break; // success
}

Since 99.9999999% of the time no two threads will be executing the grab-memory code at the same time, you get great performance. Our tests have shown CAS to be as little as 2x slower than just setting _pRecycle = _pRecycle->pNext. 64-bit and 128-bit CAS are just as fast as 32-bit. Basically it screams. Every once in a while the concurrency loop will execute twice when two threads actually race. One always wins, so the race resolves very fast.
