Thread synchronization behavior on VMWare with same guest OS, different host OS
I'm a TA for a computer science course and I've run into an interesting problem. A recent assignment involved synchronization techniques for pthreads. The students had to avoid deadlocks using mutexes, barriers, condition variables, etc. Each student is running the same version of Ubuntu on a VMWare virtual machine (either Workstation or Fusion, depending on their system). Obviously the host OS may be different for each student.
Now here's the confusing part: the synchronization behavior for some students is very different from what I see when I run their programs. For one student, I may run her assignment and see a deadlock immediately, yet when she runs it at home she never gets a deadlock.
From my understanding, the deadlocking behavior should depend only on the guest OS's scheduler. The host OS should have nothing to do with it. Yet even though we all have the same guest OS, the problem persists. Does anyone have any idea why this might be?
Thanks!
It sounds like the student has a non-deterministic deadlock. This is very common: there is a small window in which the code can deadlock, but otherwise the app runs fine. She has been lucky, but you haven't.
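To make the "small window" concrete, here is a minimal sketch of one common way such a deadlock arises, assuming (hypothetically) that the student's code takes two mutexes in opposite orders in two threads. The names `lock_a`, `lock_b`, `worker` and `run_inversion_demo` are illustrative, not from the assignment. The barrier deliberately forces both threads into the window so the deadlock happens on every run, and `pthread_mutex_timedlock` stands in for a plain lock so the program reports the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical example: two mutexes taken in opposite order by two threads. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;

struct lock_order {
    pthread_mutex_t *first;
    pthread_mutex_t *second;
    int timed_out; /* set to 1 if this thread gave up waiting for `second` */
};

static void *worker(void *arg) {
    struct lock_order *o = arg;

    pthread_mutex_lock(o->first);
    /* Force the race window open: wait until the other thread also
       holds its first lock. Without this barrier, the deadlock only
       happens when the scheduler switches threads exactly here. */
    pthread_barrier_wait(&both_hold_first);

    /* Timed lock instead of pthread_mutex_lock, so the demo reports
       the deadlock rather than hanging forever (1 s timeout). */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    int rc = pthread_mutex_timedlock(o->second, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(o->second);
    else
        o->timed_out = (rc == ETIMEDOUT);

    pthread_mutex_unlock(o->first);
    return NULL;
}

/* Runs the inversion once and returns how many threads timed out.
   A result of 1 or 2 means the threads deadlocked. */
int run_inversion_demo(void) {
    struct lock_order t1 = { &lock_a, &lock_b, 0 };
    struct lock_order t2 = { &lock_b, &lock_a, 0 };
    pthread_t th1, th2;

    int rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */

    pthread_create(&th1, NULL, worker, &t1);
    pthread_create(&th2, NULL, worker, &t2);
    pthread_join(th1, NULL);
    pthread_join(th2, NULL);
    pthread_barrier_destroy(&both_hold_first);
    return t1.timed_out + t2.timed_out;
}
```

Compile with `gcc -pthread`. If you remove the `pthread_barrier_wait` call, the same code will usually finish without a timeout and only occasionally deadlock, depending on whether the scheduler happens to switch threads between the two lock calls. That is exactly the kind of machine-to-machine difference described in the question.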
Small differences in timing can be the culprit: your CPU may have a different clock speed, a different number of cores, a different background load, or whatever, and that is enough to change the scheduling.
This is actually a classic problem: multithreaded code runs fine in the test environment but fails in the production environment because of race conditions that never manifested under test.
I will assume that your virtual machine is configured to use only one virtual core so that it can be run across any host machine. If so, you are correct to assume that the guest OS's scheduler is responsible for every preemption of the student's assignment.
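If you want your own test runs to resemble a one-vCPU guest while working on a multi-core machine, one option (an assumption about your setup, not something the thread prescribes) is to pin the program to a single CPU before it creates any threads. A minimal Linux-only sketch; `pin_to_one_cpu` is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread to one CPU (the first one it is currently
   allowed to run on). Returns 0 on success, -1 on failure.
   Hypothetical helper for illustration only. */
int pin_to_one_cpu(void) {
    cpu_set_t allowed, one;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    CPU_ZERO(&one);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one);
        }
    }
    return -1; /* no CPU was allowed, which should not happen */
}
```

Threads created after the call inherit the affinity mask, so call it at the top of `main()`. From the shell, `taskset -c 0 ./program` does the same without code changes. Either way, this only matches the guest's core count; it does not reproduce host-side timing.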
However, the scheduler itself is heavily influenced by the hardware platform it's run on. Different systems will run the guest OS faster or slower, or produce hardware interrupts that take different amounts of time to handle or emulate. All of this will affect the scheduling decisions of the guest OS.
I really like how you distribute a VM to make sure everyone has the same development and runtime environment for the assignment. However, just because everyone has the same software doesn't mean they'll see the same behavior.
You also need to consider the host itself. I once had what I thought were identical CPUs, but they had slightly different Intel chipset revisions. As a result, in one VM the task switch register was optimized in KVM, while in the other KVM was not able to optimize it. This led to different timings in the guest for seemingly identical VMs and hosts.
Also bear in mind the host may be running page sharing processes or any number of other things at different times that could change the timing in the guest.
It can be fun to run your guest's threaded program under Valgrind. Because Valgrind is very slow, timing problems in threaded apps often pop up under it.
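A related trick, sketched here under the assumption that the bug is a lock-order inversion like the one many pthreads assignments run into: rather than slowing the whole program down, inject a random delay inside the suspected window and repeat the run many times. This is a swapped-in technique (randomized delay injection), not what Valgrind does internally. `jitter`, `trial_worker` and `count_deadlocks` are hypothetical names, and the 50 ms timeout is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <time.h>

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;

struct trial_arg {
    pthread_mutex_t *first, *second;
    unsigned seed;     /* per-thread seed for rand_r */
    int max_delay_us;  /* upper bound on the injected delay (< 1000000) */
    int timed_out;
};

/* Sleep a random 0..max_us microseconds; a zero delay just yields the CPU. */
static void jitter(unsigned *seed, int max_us) {
    int us = max_us > 0 ? rand_r(seed) % (max_us + 1) : 0;
    if (us == 0) {
        sched_yield();
        return;
    }
    struct timespec ts = { 0, us * 1000L };
    nanosleep(&ts, NULL);
}

static void *trial_worker(void *arg) {
    struct trial_arg *a = arg;
    pthread_mutex_lock(a->first);
    jitter(&a->seed, a->max_delay_us); /* randomly widen the race window */

    /* Give up after 50 ms instead of hanging on a real deadlock. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 50L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(a->second, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(a->second);
    else
        a->timed_out = (rc == ETIMEDOUT);

    pthread_mutex_unlock(a->first);
    return NULL;
}

/* Runs `trials` independent rounds of the inversion and returns how many
   of them deadlocked (at least one thread timed out). */
int count_deadlocks(int trials, int max_delay_us) {
    int deadlocks = 0;
    for (int i = 0; i < trials; i++) {
        struct trial_arg x = { &m1, &m2, (unsigned)(2 * i + 1), max_delay_us, 0 };
        struct trial_arg y = { &m2, &m1, (unsigned)(2 * i + 2), max_delay_us, 0 };
        pthread_t tx, ty;
        pthread_create(&tx, NULL, trial_worker, &x);
        pthread_create(&ty, NULL, trial_worker, &y);
        pthread_join(tx, NULL);
        pthread_join(ty, NULL);
        if (x.timed_out || y.timed_out)
            deadlocks++;
    }
    return deadlocks;
}
```

Larger `max_delay_us` values generally make the deadlock show up in more trials, but the exact count varies from run to run and machine to machine. Valgrind's Helgrind tool (`valgrind --tool=helgrind ./program`) can also report inconsistent lock ordering even on runs that happen not to deadlock.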