PThreads & MultiCore CPU on Linux
I am writing a simple application that uses threads to increase performance. It runs fine on Windows, using both cores of my CPU, but when I execute it on Linux it seems to use only one core.
I can't understand why this happens.
This is my C++ code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

void* function(void*)
{
    int i=0;
    for(i=0; i<1110111; i++)
        rand();
    return 0;
}

void withOutThreads(void)
{
    function(0);
    function(0);
}

void withThreads(void)
{
    pthread_t* h1 = new pthread_t;
    pthread_t* h2 = new pthread_t;
    pthread_attr_t* atr = new pthread_attr_t;

    pthread_attr_init(atr);
    pthread_attr_setscope(atr,PTHREAD_SCOPE_SYSTEM);

    pthread_create(h1,atr,function,0);
    pthread_create(h2,atr,function,0);
    pthread_join(*h1,0);
    pthread_join(*h2,0);

    pthread_attr_destroy(atr);
    delete h1;
    delete h2;
    delete atr;
}

int main(void)
{
    int ini,tim;

    ini = clock();
    withOutThreads();
    tim = (int)( 1000*(clock()-ini)/CLOCKS_PER_SEC );
    printf("Time Sequential: %d ms\n",tim);
    fflush(stdout);

    ini = clock();
    withThreads();
    tim = (int)( 1000*(clock()-ini)/CLOCKS_PER_SEC );
    printf("Time Concurrent: %d ms\n",tim);
    fflush(stdout);
    return 0;
}
Output on Linux:
Time Sequential: 50 ms
Time Concurrent: 1610 ms
Output on Windows:
Time Sequential: 50 ms
Time Concurrent: 30 ms
clock() works differently on Windows and Linux, so don't use it to measure time here. On Linux it measures CPU time consumed by the process (summed across all threads); on Windows it measures wall-clock time. Ideally those would agree in this test case, but you should use something consistent between the platforms to measure elapsed time, e.g. gettimeofday().
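If C++11 is available, std::chrono::steady_clock is another portable way to get wall-clock (monotonic) time on both platforms. A minimal sketch; the helper name elapsed_ms and the busy-loop workload are just illustrations, not part of the original code:

#include <chrono>
#include <cstdio>

// Wall-clock milliseconds elapsed while running f(). steady_clock is
// monotonic on both Linux and Windows, unlike clock(), which counts
// CPU time on Linux.
static long elapsed_ms(void (*f)(void))
{
    auto ini = std::chrono::steady_clock::now();
    f();
    auto dur = std::chrono::steady_clock::now() - ini;
    return (long)std::chrono::duration_cast<std::chrono::milliseconds>(dur).count();
}

static void work(void)
{
    volatile long sink = 0;   // volatile keeps the loop from being optimized away
    for (long i = 0; i < 1000000; i++)
        sink += i;
}

int main(void)
{
    printf("Elapsed: %ld ms\n", elapsed_ms(work));
    return 0;
}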
rand() serializes your threads on Linux: it holds an internal lock so it can be thread-safe. The rand() man page states that rand() is neither thread-safe nor reentrant; however, at least the code in recent glibc acquires a lock around the call. I'm not sure how Windows handles this; either it's not thread-safe at all, or it uses thread-local state.
Use rand_r() on Linux, or find some better CPU-burning function to measure with:
void* function(void*)
{
    unsigned int seed = 42;
    int i=0;
    for(i=0; i<1110111; i++)
        rand_r(&seed);
    return 0;
}
The problem is that the Linux multi-threaded version of rand()
locks a mutex. Change your function to:
void* function(void*)
{
    int i=0;
    unsigned rand_state = 0;
    for(i=0; i<1110111; i++)
        rand_r(&rand_state);
    return 0;
}
Output:
Time Sequential: 10 ms
Time Concurrent: 10 ms
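If you can use C++11, another option is to give each thread its own std::mt19937 engine, so no lock is shared between threads at all. A sketch under that assumption (the seed value and the "done" marker are illustrative):

#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <random>

// Each thread constructs its own generator on its own stack,
// so there is no shared state and nothing to lock.
void* function(void*)
{
    std::mt19937 gen(42);                               // per-thread engine
    std::uniform_int_distribution<int> dist(0, RAND_MAX);
    int i = 0;
    for (i = 0; i < 1110111; i++)
        dist(gen);
    return 0;
}

int main(void)
{
    pthread_t h1, h2;
    pthread_create(&h1, 0, function, 0);
    pthread_create(&h2, 0, function, 0);
    pthread_join(h1, 0);
    pthread_join(h2, 0);
    puts("done");
    return 0;
}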
Linux "sees" threads much like processes: every thread is a schedulable entity in its own right.
In the process table (task_struct), a newly created process gets a PID. When a second thread is created, that PID becomes the TGID (thread group ID) and every thread gets its own TID (thread ID).
In userland we normally see only one entry per process (using "ps aux"), but "ps -eLf" adds a column named LWP (light-weight process), which is the TID.
For example:
$ ps -eLf
UID PID PPID LWP C NLWP STIME TTY TIME CMD
root 1356 1 1356 0 4 2014 ? 00:00:00 /sbin/rsyslogd
root 1356 1 1357 0 4 2014 ? 00:02:01 /sbin/rsyslogd
root 1356 1 1359 0 4 2014 ? 00:01:55 /sbin/rsyslogd
root 1356 1 1360 0 4 2014 ? 00:00:00 /sbin/rsyslogd
dbus 1377 1 1377 0 1 2014 ? 00:00:00 dbus-daemon
As we can see, the PID is the same for all four rsyslogd threads, but each has its own LWP (TID). When a process has only one thread (such as dbus-daemon), PID = LWP (TID).
Internally, the kernel always uses the TID as the task identifier.
Because of that, the scheduler can schedule every thread independently, with real parallelism across cores.
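You can observe the PID/TID split from inside a program. This is Linux-specific (SYS_gettid has no portable equivalent before glibc 2.30's gettid()); a small sketch:

#include <cstdio>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// On Linux, getpid() returns the shared TGID, while the gettid syscall
// returns the per-thread TID (the LWP column of "ps -eLf").
static void* show_ids(void*)
{
    printf("thread: pid=%d tid=%ld\n",
           (int)getpid(), (long)syscall(SYS_gettid));
    return 0;
}

int main(void)
{
    // In the main thread, TID equals PID (the thread group leader).
    printf("main:   pid=%d tid=%ld\n",
           (int)getpid(), (long)syscall(SYS_gettid));

    pthread_t h;
    pthread_create(&h, 0, show_ids, 0);   // second thread gets a new TID
    pthread_join(h, 0);
    return 0;
}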
That sounds like OS scheduler behaviour to me, not per se a problem in your code. The OS decides which thread runs on which core, and if thread/CPU affinity rules are adhered to, it will keep a thread on the same CPU each time.
That is a simple explanation of a fairly complex subject.
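If you want to override the scheduler's placement, glibc provides the non-portable pthread_setaffinity_np() to pin a thread to specific CPUs. A sketch; pin_to_cpu is a hypothetical helper name, and it assumes CPU 0 exists and is available to the process:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE          // for pthread_setaffinity_np / CPU_SET
#endif
#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Restrict the calling thread to a single CPU. Returns 0 on success.
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    if (pin_to_cpu(0) == 0)
        printf("pinned to CPU 0\n");

    // Read the affinity mask back to verify the restriction took effect.
    cpu_set_t mask;
    CPU_ZERO(&mask);
    pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask);
    printf("allowed CPUs: %d\n", CPU_COUNT(&mask));
    return 0;
}

For the benchmark above you would normally want the opposite (let each thread run on its own core), which the scheduler already does once the rand() lock is removed, so pinning is only useful for experiments.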