Difference between cmpxchg and btr/bts
The btr and bts instructions are simple, and they can be used to lock a shared resource. Why does the cmpxchg instruction exist? What's the difference between these instructions?
IIRC (it's been a while), lock btr is more expensive than lock cmpxchg on some processors, since cmpxchg was designed to be the fastest possible synchronization primitive. Note that cmpxchg is not atomic across cores on its own: like btr and bts, it needs the lock prefix (only xchg with a memory operand locks implicitly). On older processors a locked instruction held the bus lock for the entire instruction; on newer ones, a locked access to cacheable memory that stays within one cache line is typically handled by locking that cache line instead, so I'd expect the cost gap between the two to be small there.
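To make the comparison concrete, here is a minimal sketch in C11 atomics of the two locking styles. The names (lock_word, try_lock_bit, unlock_bit, try_lock_cas) are made up for illustration. On x86 a compiler may turn the fetch-or/fetch-and forms into lock bts / lock btr (or lock or / lock and), and the compare-exchange form into lock cmpxchg, but that is up to the compiler and not guaranteed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: bit 0 is the lock bit. */
static atomic_uint lock_word = 0;

/* Test-and-set style (what lock bts does): set bit 0 and
 * report whether it was clear beforehand. */
static bool try_lock_bit(void) {
    unsigned old = atomic_fetch_or_explicit(&lock_word, 1u,
                                            memory_order_acquire);
    return (old & 1u) == 0; /* acquired only if the bit was clear */
}

/* Bit-reset style (what lock btr does): clear bit 0. */
static void unlock_bit(void) {
    atomic_fetch_and_explicit(&lock_word, ~1u, memory_order_release);
}

/* Compare-and-swap style (what lock cmpxchg does): store 1 only
 * if the whole word is currently 0. */
static bool try_lock_cas(void) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock_word, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}
```

For a plain one-bit lock the two styles are interchangeable; the difference only shows up once the new value has to depend on the whole old value.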
Edit: it also enables fancier (user-)lock-free strategies, per this message:
CMPXCHG [memaddr], reg compares a memory location to EAX (or AX, or AL); if they are the same, it writes the source operand to the memory location. This can obviously be used in the same way as XCHG, but it can be used in another very interesting way as well, for lock-free synchronization.
Suppose you have a process that updates a shared data structure. To ensure atomicity, it generates a private updated copy of the data structure; when it is finished, it atomically updates a single pointer which used to point to the old data structure so that it now points to the new data structure.
The straightforward way of doing this will be useful if there's some possibility of the process failing, and it gives you atomicity. But we can modify this procedure only a little bit to allow multiple simultaneous updates while ensuring correctness.
The process simply atomically compares the pointer to the value it had when it started its work, and if the two are equal, makes the pointer point to the new data structure. If some other process has updated the data structure in the meantime, the comparison will fail and the exchange will not happen. In this case, the process must start over from the newly-updated data structure.
(This is essentially a primitive form of Software Transactional Memory.)
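The copy-then-swap procedure in the quoted message can be sketched roughly as follows in C11. Everything here is an assumption for illustration: struct config, current and update_add are made-up names, and the sketch deliberately never frees the old structure, since safe reclamation (and the related ABA problem) needs extra machinery that the message does not cover.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared structure; name and field are illustrative. */
struct config { int value; };

/* The single shared pointer that gets swapped atomically. */
static _Atomic(struct config *) current;

/* Build a private updated copy, then publish it with a
 * compare-and-swap. Returns 0 on success, -1 if malloc fails. */
static int update_add(int delta) {
    struct config *old = atomic_load(&current);
    struct config *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    for (;;) {
        /* Private copy derived from the version we last saw. */
        fresh->value = (old ? old->value : 0) + delta;
        /* Publish only if nobody replaced the pointer meanwhile. */
        if (atomic_compare_exchange_weak(&current, &old, fresh))
            return 0;
        /* The swap failed: 'old' now holds the newer pointer, so
         * start over from the newly updated structure. The old
         * structure is intentionally leaked in this sketch. */
    }
}
```

A failed compare-exchange writes the pointer it actually found back into old, which is what makes the "start over" step a simple loop.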
BTR and BTS work at the bit level, whereas CMPXCHG works on a wider data type (generally 32 or 64 bits at once, or 128 bits via CMPXCHG16B). They also function differently; the Intel developer manuals give a good summary of how they work. It may also help to note that certain processors may have implemented BTR and BTS poorly (due to them not being so widely utilised), making CMPXCHG the better option for high-performance locks.