Do sequence points prevent code reordering across critical section boundaries?
Suppose one has lock-based code like the following, where a mutex guards against inappropriate concurrent reads and writes:
mutex.get() ; // get a lock.
T localVar = pSharedMem->v ; // read something
pSharedMem->w = blah ; // write something.
pSharedMem->z++ ; // read and write something.
mutex.release() ; // release the lock.
Even if one assumed that the generated code was created in program order, there is still a requirement for appropriate hardware memory barriers (isync, lwsync, .acq, .rel). I'll assume for this question that the mutex implementation takes care of that part, providing a guarantee that the pSharedMem reads and writes all occur "after" the get() and "before" the release() [but that surrounding reads and writes can migrate into the critical section, as I expect is the norm for mutex implementations]. I'll also assume that volatile accesses are used in the mutex implementation where appropriate, but that volatile is NOT used for the data protected by the mutex (understanding why volatile does not appear to be a requirement for the mutex-protected data is really part of this question).
I'd like to understand what prevents the compiler from moving the pSharedMem accesses outside of the critical region. In the C and C++ standards I see that there is a concept of a sequence point. I found much of the sequence point text in the standards incomprehensible, but if I had to guess what it is about, it is a statement that code should not be reordered across a point where there is a call with unknown side effects. Is that the gist of it? If so, what sort of optimization freedom does the compiler have here?
With compilers doing tricky optimizations like profile-driven interprocedural inlining (even across file boundaries), even the concept of an unknown side effect gets kind of blurry.
It is perhaps beyond the scope of a simple question to explain this in a self-contained way here, so I am open to being pointed at references (preferably online, and targeted at mortal programmers rather than compiler writers and language designers).
EDIT: (in response to Jalf's reply)
I'd mentioned the memory barrier instructions like lwsync and isync because of the CPU reordering issues you also mention. I happen to work in the same lab as the compiler guys (for one of our platforms, at least), and having talked to the implementers of the intrinsics, I know that at least for the xlC compiler __isync() and __lwsync() (and the rest of the atomic intrinsics) are also code reordering barriers. In our spinlock implementation this is visible to the compiler, since this part of our critical section is inlined.
However, suppose you weren't using a custom-built lock implementation (as we happen to be, which is likely uncommon), and just called a generic interface such as pthread_mutex_lock(). There the compiler has nothing to go on but the prototype. I've never seen it suggested that code like
pthread_mutex_lock( &m ) ;
pSharedMem->someNonVolatileVar++ ;
pthread_mutex_unlock( &m ) ;
pthread_mutex_lock( &m ) ;
pSharedMem->someNonVolatileVar++ ;
pthread_mutex_unlock( &m ) ;
would be non-functional unless the variable was changed to volatile. That increment compiles to a load/increment/store sequence in each of the back-to-back blocks, and would not operate correctly if the value from the first increment were retained in a register for the second.
It seems likely that the unknown side effects of pthread_mutex_lock() are what protect this back-to-back increment example from misbehaving.
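To make that concrete, here is a sketch of the (hypothetical) transformation that the opaque calls rule out. If the compiler could somehow prove that pthread_mutex_lock() and pthread_mutex_unlock() never touch pSharedMem, it would be free to rewrite the two blocks as:
int tmp = pSharedMem->someNonVolatileVar ; // load hoisted above the first lock
pthread_mutex_lock( &m ) ;
pthread_mutex_unlock( &m ) ;
pthread_mutex_lock( &m ) ;
pthread_mutex_unlock( &m ) ;
pSharedMem->someNonVolatileVar = tmp + 2 ; // increments folded, store sunk below the last unlock
Since the calls are opaque, the compiler must instead assume they may read or write pSharedMem, and cannot do this.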
I'm talking myself into the conclusion that the semantics of a code sequence like this in a threaded environment are not really strictly covered by the C or C++ language specs.
In short, the compiler is allowed to reorder or transform the program as it likes, as long as the observable behavior on a C++ virtual machine does not change. The C++ standard has no concept of threads, and so this fictive VM only runs a single thread. And on such an imaginary machine, we don't have to worry about what other threads see. As long as the changes don't alter the outcome of the current thread, all code transformations are valid, including reordering memory accesses across sequence points.
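A classic sketch of how far that freedom goes (the flag and function name here are made up for illustration):
int done ; // ordinary non-volatile global, set by "another thread"
void wait_for_done( void )
{
    while ( !done ) { } // no write to 'done' is visible in this thread
}
Since the single-threaded abstract machine never sees done change inside the loop, the compiler may load it once and rewrite the loop as if ( !done ) for ( ;; ) { } -- an infinite loop, even though another thread eventually sets the flag.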
understanding why volatile does not appear to be a requirement for the mutex-protected data is really part of this question
Volatile ensures one thing, and one thing only: reads from a volatile variable will be read from memory every time -- the compiler won't assume that the value can be cached in a register. And likewise, writes will be written through to memory. The compiler won't keep it around in a register "for a while, before writing it out to memory".
But that's all. When the write occurs, a write will be performed, and when the read occurs, a read will be performed. But it doesn't guarantee anything about when this read/write will take place. The compiler may, as it usually does, reorder operations as it sees fit (as long as it doesn't change the observable behavior in the current thread, the one the imaginary C++ CPU knows about). So volatile doesn't really solve the problem. On the other hand, it offers a guarantee that we don't really need. We don't need every write to the variable to be written out immediately, we just want to ensure that they get written out before crossing this boundary. It's fine if they're cached until then - and likewise, once we've crossed the critical section boundary, subsequent writes can be cached again for all we care -- until we cross the boundary the next time. So volatile offers a guarantee that is too strong, one we don't need, while failing to offer the one we do need (that reads/writes won't get reordered).
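A small sketch of that last point (the names are invented): volatile forces each access to happen, but the language does not stop ordinary accesses from moving across it:
int data ;           // ordinary, non-volatile
volatile int ready ; // volatile flag
void publish( void )
{
    data = 42 ; // the compiler may legally move this store past the one below,
    ready = 1 ; // and the CPU may reorder them too; the volatile store itself
                // is guaranteed to happen, but it orders nothing w.r.t. data
}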
So to implement critical sections, we need to rely on compiler magic. We have to tell it that "ok, forget about the C++ standard for a moment, I don't care what optimizations it would have allowed if you'd followed that strictly. You must NOT reorder any memory accesses across this boundary".
Critical sections are typically implemented via special compiler intrinsics (essentially special functions that are understood by the compiler), which 1) force the compiler to avoid reordering memory accesses across the intrinsic, and 2) make it emit the instructions needed to get the CPU to respect the same boundary (the CPU reorders instructions too, and without a memory barrier instruction we'd risk the CPU doing the same reordering we just prevented the compiler from doing).
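As a concrete example (assuming a GCC-compatible compiler; other toolchains spell these differently), an empty asm with a "memory" clobber is a pure compiler reordering barrier that emits no instruction, while __sync_synchronize() additionally emits a hardware fence:
asm volatile ( "" ::: "memory" ) ; // compiler barrier: no memory access may be moved across it; emits nothing
__sync_synchronize() ;             // compiler barrier *plus* a hardware fence (e.g. sync on PowerPC, mfence on x86)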
No, sequence points do not prevent the rearranging of operations. The main, broadest rule that governs optimizations is the requirement imposed on so-called observable behavior. Observable behavior is, by definition, read/write accesses to volatile variables and calls to library I/O functions. These events must occur in the same order and produce the same results as they would in the "canonically" executed program. Everything else can be rearranged and optimized absolutely freely by the compiler, in any way it sees fit, completely ignoring any orderings imposed by sequence points.
Of course, most compilers try not to do any excessively wild rearrangements. However, the problem you mention has become a real practical issue with modern compilers in recent years. Many implementations offer additional implementation-specific mechanisms that let the user ask the compiler not to move accesses across certain boundaries when it rearranges code.
Since, as you say, the protected data is not declared volatile, formally speaking the accesses can be moved outside of the protected region. If you declare the data volatile, that should prevent this from happening (assuming the mutex accesses are also volatile).
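A minimal sketch of that all-volatile approach (this only addresses compiler ordering; it is not a working lock, since it has no atomicity and no hardware barriers):
volatile int lockWord ;      // stand-in for the mutex's internal state
volatile int protectedData ; // the guarded data, also declared volatile
lockWord = 1 ;    // "acquire"
protectedData++ ; // may not be moved across the volatile lockWord accesses
lockWord = 0 ;    // "release"
Because all three accesses are volatile, they are part of the observable behavior, and the compiler must preserve their relative order.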
Let's look at the following example:
my_pthread_mutex_lock( &m ) ;
someNonVolatileGlobalVar++ ;
my_pthread_mutex_unlock( &m ) ;
The function my_pthread_mutex_lock() just calls pthread_mutex_lock(). By using my_pthread_mutex_lock(), I'm sure the compiler doesn't know that it's a synchronization function: to the compiler it's just a function, while to me it's a synchronization function that I can easily reimplement (a sketch of such a wrapper follows below). Because someNonVolatileGlobalVar is global, I expect the compiler not to move someNonVolatileGlobalVar++ outside the critical section. Indeed, even in a single-threaded situation the compiler cannot know whether the opaque functions called before and after this instruction read or modify the global var, so to keep the observable behavior correct it has to keep the execution order as written. I hope pthread_mutex_lock() and pthread_mutex_unlock() also issue hardware memory barriers, to prevent the hardware from moving this instruction outside the critical section.
Am I right?
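For reference, a minimal sketch of such a wrapper, assuming it lives in a separate translation unit so the call stays opaque (link-time optimization or cross-file inlining would defeat this):
/* wrapper.c -- compiled separately, so callers see only the prototypes */
#include <pthread.h>
void my_pthread_mutex_lock( pthread_mutex_t *m ) { pthread_mutex_lock( m ) ; }
void my_pthread_mutex_unlock( pthread_mutex_t *m ) { pthread_mutex_unlock( m ) ; }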
If I write:
my_pthread_mutex_lock( &m ) ;
someNonVolatileGlobalVar1++ ;
someNonVolatileGlobalVar2++ ;
my_pthread_mutex_unlock( &m ) ;
I cannot know which of the two variables is incremented first, but this is normally not an issue.
Now, if I write:
someGlobalPointer = &someNonVolatileLocalVar;
my_pthread_mutex_lock( &m ) ;
someNonVolatileLocalVar++ ;
my_pthread_mutex_unlock( &m ) ;
or
someLocalPointer = &someNonVolatileGlobalVar;
my_pthread_mutex_lock( &m ) ;
(*someLocalPointer)++ ;
my_pthread_mutex_unlock( &m ) ;
Is the compiler doing what a naive developer expects?
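For contrast, here is a sketch of a case where the compiler genuinely may cache across the calls: a local whose address is never taken cannot be reached by the opaque functions, so keeping it in a register is a legal transformation:
void f( void )
{
    int purelyLocal = 0 ; // address never taken, never escapes
    my_pthread_mutex_lock( &m ) ;
    purelyLocal++ ; // may legally live in a register across both calls
    my_pthread_mutex_unlock( &m ) ;
}
In the two examples above, taking the variable's address (or pointing at a global) is exactly what forces the compiler to be conservative.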
C/C++ sequence points occur, for example, when ';' is encountered; at that point all side effects of the operations that preceded it must be complete. However, I'm fairly certain that by "side effect" the standard means operations that are part of the language itself (like z being incremented in 'z++'), not effects at lower or higher levels (like what the OS actually does with regard to memory management, thread management, etc. after an operation completes).
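For instance, the single-thread ordering that sequence points do impose is what makes this classic snippet undefined:
int i = 0 ;
i = i++ ; // undefined behavior (pre-C++11): i is modified twice between consecutive sequence points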
Does that answer your question, kinda? My point is really just that, AFAIK, the concept of sequence points doesn't have anything to do with the side effects you're referring to.
hth
See [linux-kernel]/Documentation/memory-barriers.txt for more on this.