.NET: ThreadStatic vs lock { }. Why does ThreadStaticAttribute degrade performance?
I've written a small test program and was surprised that the lock {} solution performs faster than the lock-free one that uses the [ThreadStatic] attribute on a static variable.
[ThreadStatic] snippet:
    [ThreadStatic]
    private static long ms_Acc;

    public static void RunTest()
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        int one = 1;
        for (int i = 0; i < 100 * 1000 * 1000; ++i) {
            ms_Acc += one;
            ms_Acc /= one;
        }
        stopwatch.Stop();
        Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
    }
lock {} snippet:
    private static long ms_Acc;
    private static object ms_Lock = new object();

    public static void RunTest()
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        int one = 1;
        for (int i = 0; i < 100 * 1000 * 1000; ++i) {
            lock (ms_Lock) {
                ms_Acc += one;
                ms_Acc /= one;
            }
        }
        stopwatch.Stop();
        Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
    }
On my machine the first snippet takes 4.2 seconds and the second takes 3.2 seconds, which is 1 second faster. With neither ThreadStatic nor lock, it takes 1.2 seconds.
I'm curious: why does the [ThreadStatic] attribute add so much to the execution time in this simple example?
UPDATE: I'm very sorry, but these results were for a DEBUG build. For a RELEASE build I got completely different numbers: (1.2; 2.4; 1.2). For DEBUG the numbers were (4.2; 3.2; 1.2). In both cases the order is: [ThreadStatic]; lock; neither.
So for a RELEASE build there seems to be almost no [ThreadStatic] performance penalty (only a slight penalty on modern CPUs).
Here is the disassembly for ms_Acc += one; for RELEASE, optimization is enabled.
No [ThreadStatic], DEBUG:
    00000060 mov eax,dword ptr [ebp-40h]
    00000063 add dword ptr ds:[00511718h],eax
No [ThreadStatic], RELEASE:
    00000051 mov eax,dword ptr [00040750h]
    00000057 add eax,dword ptr [rsp+20h]
    0000005b mov dword ptr [00040750h],eax
[ThreadStatic], DEBUG:
    00000066 mov edx,1
    0000006b mov ecx,4616E0h
    00000070 call 664F7450
    00000075 mov edx,1
    0000007a mov ecx,4616E0h
    0000007f mov dword ptr [ebp-50h],eax
    00000082 call 664F7450
    00000087 mov edx,dword ptr [eax+18h]
    0000008a add edx,dword ptr [ebp-40h]
    0000008d mov eax,dword ptr [ebp-50h]
    00000090 mov dword ptr [eax+18h],edx
[ThreadStatic], RELEASE:
    00000058 mov edx,1
    0000005d mov rcx,7FF001A3F28h
    00000067 call FFFFFFFFF6F9F740
    0000006c mov qword ptr [rsp+30h],rax
    00000071 mov rbx,qword ptr [rsp+30h]
    00000076 mov ebx,dword ptr [rbx+20h]
    00000079 add ebx,dword ptr [rsp+20h]
    0000007d mov edx,1
    00000082 mov rcx,7FF001A3F28h
    0000008c call FFFFFFFFF6F9F740
    00000091 mov qword ptr [rsp+38h],rax
    00000096 mov rax,qword ptr [rsp+38h]
    0000009b mov dword ptr [rax+20h],ebx
You have two lines of code that update ms_Acc. In the lock case, a single lock covers both of them, while in the ThreadStatic case the thread-static lookup happens once for each access to ms_Acc, i.e. twice for each iteration of your loop. This is generally the benefit of using lock: you get to choose the granularity you want. I am guessing that the RELEASE build optimised this difference away.
I would be interested to see whether the performance becomes very similar, or identical, if you change the for loop to make a single access to ms_Acc.
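One possible way to run that experiment is sketched below. It is only an illustration, not the original test program: the class name SingleAccessTest, the helper Time, and the split into ms_ThreadStaticAcc and ms_LockedAcc are assumptions introduced here so that both variants fit in one file. Each loop body touches its accumulator with a single statement, and the methods return the final value so the result can be checked. No timings are claimed; they depend on the build configuration and the machine.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness: names are illustrative, not from the original post.
public static class SingleAccessTest
{
    [ThreadStatic]
    private static long ms_ThreadStaticAcc;

    private static long ms_LockedAcc;
    private static readonly object ms_Lock = new object();

    // One statement per iteration touches the thread-static field.
    public static long RunThreadStatic(int iterations)
    {
        ms_ThreadStaticAcc = 0;
        int one = 1;
        for (int i = 0; i < iterations; ++i)
        {
            ms_ThreadStaticAcc += one;
        }
        return ms_ThreadStaticAcc;
    }

    // One statement per iteration touches the shared field, under the lock.
    public static long RunLock(int iterations)
    {
        int one = 1;
        lock (ms_Lock) { ms_LockedAcc = 0; }
        for (int i = 0; i < iterations; ++i)
        {
            lock (ms_Lock)
            {
                ms_LockedAcc += one;
            }
        }
        lock (ms_Lock) { return ms_LockedAcc; }
    }

    // Times one variant and prints its result and elapsed seconds.
    private static void Time(string label, Func<int, long> run, int iterations)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        long result = run(iterations);
        stopwatch.Stop();
        Console.WriteLine("{0}: result {1}, time taken: {2}",
            label, result, stopwatch.Elapsed.TotalSeconds);
    }

    public static void Main()
    {
        const int iterations = 100 * 1000 * 1000;
        Time("[ThreadStatic]", RunThreadStatic, iterations);
        Time("lock", RunLock, iterations);
    }
}
```

Running this in both DEBUG and RELEASE builds would show whether the gap between the two variants narrows once each iteration makes only one access.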