Thread synchronization. Why exactly this lock isn't enough to synchronize threads [duplicate]
Possible Duplicate:
Threads synchronization. How exactly lock makes access to memory 'correct'?
This question is inspired by this one.
We have the following test class:
using System;
using System.Threading.Tasks;

class Test
{
    private static object ms_Lock = new object();
    private static int ms_Sum = 0;

    public static void Main()
    {
        Parallel.Invoke(HalfJob, HalfJob);
        Console.WriteLine(ms_Sum);
        Console.ReadLine();
    }

    private static void HalfJob()
    {
        for (int i = 0; i < 50000000; i++) {
            lock (ms_Lock) { } // empty lock
            ms_Sum += 1;
        }
    }
}
The actual result is very close to the expected value of 100,000,000 (50,000,000 x 2, since 2 loops are running at the same time), with a difference of around 200 - 600 (an error of approximately 0.0004% on my machine, which is very low). No other way of synchronizing yields this kind of approximation (it's either a much bigger error percentage, or it's 100% correct).
Our current understanding is that this level of precision occurs because the program runs in the following way (time runs left to right, and the 2 threads are represented by two rows):
where
- a black box represents the process of acquiring, holding and releasing the lock
- a plus represents the addition operation (the schema is to scale for my PC: a lock takes approximately 20 times longer than an add)
- a white box represents the period consisting of trying to acquire the lock and then waiting for it to become available
Also, lock provides a full memory fence.
So the question now is: if the above schema represents what is going on, what is the cause of such a big error? (It now seems big because the schema looks like a very strong synchronization scheme.) We could understand a difference of 1-10 at the boundaries, but that is clearly not the only source of the error. We cannot see when writes to ms_Sum could happen at the same time to cause the error.
EDIT: many people like to jump to quick conclusions. I know what synchronization is, and that the above construct is not a real or even close-to-good way to synchronize threads if we need a correct result. Have some faith in the poster, or maybe read the linked answer first. I don't need a way to synchronize 2 threads to perform additions in parallel; I am exploring this extravagant and yet efficient (compared to any possible approximate alternative) synchronization construct. It does synchronize to some extent, so it is not meaningless as suggested.
lock(ms_Lock) { }

This is a meaningless construct: lock guarantees exclusive execution of the code inside it.
Let me explain why this empty lock decreases (but doesn't eliminate!) the chance of data corruption. Let's simplify the threading model a bit:
- A thread executes one line of code per time slice.
- Thread scheduling is done in a strict round-robin manner (A-B-A-B).
- Monitor.Enter/Exit takes significantly longer to execute than arithmetic (let's say 3 times longer; I stuffed the code with Nops to indicate that the previous line is still executing).
- The real += takes 3 steps; I broke it down into atomic ones.
The left columns show which program line each thread (A and B) is executing at a given time slice; at the right is the program (according to my model).

With the empty lock:

A  B
1  1    1: SomeOperation();
2  2    2: SomeOperation();
3  3    3: Monitor.Enter(ms_Lock);   <- B blocks here: A holds the lock
4  3    4: Nop();
5  3    5: Nop();
6  3    6: Monitor.Exit(ms_Lock);
7  3    7: Nop();
8  3    8: Nop();                    <- A has released the lock
9  3    9: int temp = ms_Sum;        <- B finally acquires it
10 4   10: temp++;
11 5   11: ms_Sum = temp;
Without the lock:

A  B
1  1    1: SomeOperation();
2  2    2: SomeOperation();
3  3    3: int temp = ms_Sum;   <- both threads read the same value
4  4    4: temp++;
5  5    5: ms_Sum = temp;       <- one increment is lost
As you can see, in the first scenario thread B simply can't catch up with thread A, and A has enough time to finish executing ms_Sum += 1;. In the second scenario, ms_Sum += 1; is interleaved and causes constant data corruption. In reality thread scheduling is stochastic, but it means that thread A has a better chance of finishing the increment before the other thread gets there.
This is a very tight loop with not much going on inside it, so ms_Sum += 1 has a reasonable chance of being executed at "just the wrong moment" by the parallel threads.
Why would you ever write code like this in practice?

Why not:

lock(ms_Lock) {
    ms_Sum += 1;
}

or just:

Interlocked.Increment(ref ms_Sum);

?
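For comparison, here is a minimal runnable sketch (my own, not from the question; class and method names are made up, and the iteration count is reduced to keep it fast) showing that both of these approaches, unlike the empty lock, give an exact result:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ExactCounters
{
    const int Iterations = 1_000_000;
    static readonly object Gate = new object();
    static int lockedSum;
    static int interlockedSum;

    public static int CountWithLock()
    {
        lockedSum = 0;
        // Two workers, as in the original test class.
        Parallel.Invoke(LockedJob, LockedJob);
        return lockedSum;
    }

    public static int CountWithInterlocked()
    {
        interlockedSum = 0;
        Parallel.Invoke(InterlockedJob, InterlockedJob);
        return interlockedSum;
    }

    static void LockedJob()
    {
        for (int i = 0; i < Iterations; i++)
            lock (Gate) { lockedSum += 1; }  // increment happens inside the lock
    }

    static void InterlockedJob()
    {
        for (int i = 0; i < Iterations; i++)
            Interlocked.Increment(ref interlockedSum);  // atomic read-modify-write
    }

    public static void Main()
    {
        Console.WriteLine(ExactCounters.CountWithLock());         // always 2000000
        Console.WriteLine(ExactCounters.CountWithInterlocked());  // always 2000000
    }
}
```

Parallel.Invoke joins both workers before returning, so the subsequent read of the counter is safe without further synchronization.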
-- EDIT ---
Some comments on why you would see the error despite the memory-barrier aspect of the lock... Imagine the following scenario:
- Thread A enters the lock, leaves the lock, and then is pre-empted by the OS scheduler.
- Thread B enters and leaves the lock (possibly once, possibly more than once, possibly millions of times).
- At that point thread A is scheduled again.
- Both A and B hit the ms_Sum += 1 at the same time, resulting in some increments being lost (because increment = load + add + store).
As noted, the statement lock(ms_Lock) {} will cause a full memory barrier. In short this means the value of ms_Sum will be flushed between all caches and updated ("visible") among all threads.
However, ms_Sum += 1 is still not atomic, as it is just shorthand for ms_Sum = ms_Sum + 1: a read, an operation, and an assignment. It is in this construct that there is still a race condition: the count in ms_Sum could be slightly lower than expected. I would also expect the difference to be larger without the memory barrier.
Here is a hypothetical situation of why it might be lower (A and B represent threads and a and b represent thread-local registers):
A: read ms_Sum -> a
B: read ms_Sum -> b
A: write a + 1 -> ms_Sum
B: write b + 1 -> ms_Sum   // change from A "discarded"
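This interleaving can be replayed deterministically on a single thread by materializing the two "registers" as locals (a sketch of my own; the names a and b mirror the trace above):

```csharp
using System;

class LostUpdate
{
    static int ms_Sum = 0;

    static void Main()
    {
        int a = ms_Sum;   // A: read ms_Sum -> a   (a == 0)
        int b = ms_Sum;   // B: read ms_Sum -> b   (b == 0)
        ms_Sum = a + 1;   // A: write a + 1 -> ms_Sum
        ms_Sum = b + 1;   // B: write b + 1 -> ms_Sum; A's change is discarded

        Console.WriteLine(ms_Sum); // 1, not 2: one increment was lost
    }
}
```

Two increments ran, yet the counter advanced by one, which is exactly the corruption mode behind the missing counts.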
This depends on a very particular order of interleaving and is dependent upon factors such as thread execution granularity and relative time spent in said non-atomic region. I suspect the lock itself will reduce (but not eliminate) the chance of the interleave above, because each thread must wait in turn to get through it. The relative time spent in the lock versus the increment may also play a factor.
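One way to observe the error empirically (a sketch of my own, not from the answer; names are made up and the iteration count is reduced): because the final value of a racy counter is nondeterministic, the only safe assertion is a bound, not an exact figure.

```csharp
using System;
using System.Threading.Tasks;

static class RacyCounter
{
    const int PerThread = 1_000_000;
    static readonly object Gate = new object();
    static int sum;

    public static int Run()
    {
        sum = 0;
        Parallel.Invoke(HalfJob, HalfJob);
        return sum;
    }

    static void HalfJob()
    {
        for (int i = 0; i < PerThread; i++)
        {
            lock (Gate) { }  // empty lock, as in the question
            sum += 1;        // racy non-atomic increment
        }
    }

    public static void Main()
    {
        int result = RacyCounter.Run();
        int expected = 2 * PerThread;
        // Typically very close to expected, but some increments are lost.
        Console.WriteLine($"{result} / {expected}; lost {expected - result}");
    }
}
```

Running this repeatedly shows the "lost" count varying from run to run, which matches the stochastic-scheduling explanation.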
Happy coding.
As others have noted, use the critical region established by the lock or one of the atomic increments provided to make it truly thread-safe.
As noted: lock(ms_Lock) { } locks an empty block and so does nothing. You still have a race condition with ms_Sum += 1;. You need:
lock (ms_Lock)
{
    ms_Sum += 1;
}
[Edited to note:]
Unless you properly serialize access to ms_Sum, you have a race condition. Your code, as written, does the following (assuming the optimizer doesn't just throw away the useless lock statement):
- Acquire lock
- Release lock
- Get value of ms_Sum
- Increment value of ms_Sum
- Store value of ms_Sum
Each thread may be suspended at any point, even in mid-instruction. Unless it is specifically documented as being atomic, it's a fair bet that any machine instruction that takes more than 1 clock cycle to execute may be interrupted mid-execution.
So let's assume that your lock is actually serializing the two threads. There is still nothing in place to prevent one thread from getting suspended (and thus giving priority to the other) whilst it is somewhere in the middle of executing the last three steps.
So the first thread in locks, releases, gets the value of ms_Sum, and is then suspended. The second thread comes in, locks, releases, gets the [same] value of ms_Sum, increments it, stores the new value back in ms_Sum, and then gets suspended. The first thread increments its now-outdated value and stores it.
There's your race condition.
The += operator is not atomic: it first reads, then writes the new value. In the meantime, between the read and the write, thread A can be switched out in favor of thread B, without having written the value yet... thread B then doesn't see the new value, because it was never assigned by thread A... and when control returns to thread A, its write discards all of thread B's work.