Difference between cmpxchg and btr/bts

The btr and bts instructions are simple, and they can be used to lock a shared resource.

Why does the cmpxchg instruction exist? What's the difference between cmpxchg and these two instructions?


IIRC (it's been a while), lock btr is more expensive than cmpxchg, which was designed as a synchronization primitive and to be as fast as possible. (Specifically, as I recall it, a lock-prefixed instruction holds the bus lock for the entire instruction and does a full invalidation, whereas the microcode for cmpxchg locks and invalidates only when absolutely needed. Note that on a multiprocessor system cmpxchg still needs the lock prefix to be atomic; only xchg with a memory operand locks implicitly.)

Edit: it also enables fancier (user-)lock-free strategies, per this message:

CMPXCHG [memaddr], reg compares a memory location to EAX (or AX, or AL); if they are the same, it writes the source operand to the memory location. This can obviously be used in the same way as XCHG, but it can be used in another very interesting way as well, for lock-free synchronization.
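The "same way as XCHG" use can be sketched in C11, where atomic_compare_exchange_strong is the portable counterpart of a locked CMPXCHG. This is a minimal sketch, not anything from the quoted message: the cas_lock type and its function names are hypothetical, and the lock word is assumed to hold 0 when free and 1 when held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct { atomic_int state; } cas_lock;

/* Try once to take the lock. The store of 1 happens only if the
 * current value equals `expected`, which plays the role of EAX. */
static bool cas_lock_try(cas_lock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Spin until the lock is taken. */
static void cas_lock_acquire(cas_lock *l) {
    while (!cas_lock_try(l)) {
        /* busy-wait; a real lock would back off or yield here */
    }
}

/* Release by storing 0 back into the lock word. */
static void cas_lock_release(cas_lock *l) {
    atomic_store(&l->state, 0);
}
```

On failure, atomic_compare_exchange_strong writes the value it actually found into expected, much as CMPXCHG loads the memory operand into EAX when the comparison fails.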

Suppose you have a process that updates a shared data structure. To ensure atomicity, it generates a private updated copy of the data structure; when it is finished, it atomically updates a single pointer which used to point to the old data structure so that it now points to the new data structure.

The straightforward way of doing this will be useful if there's some possibility of the process failing, and it gives you atomicity. But we can modify this procedure only a little bit to allow multiple simultaneous updates while ensuring correctness.

The process simply atomically compares the pointer to the value it had when the process started its work, and if the two match, makes the pointer point to the new data structure. If some other process has updated the data structure in the meantime, the comparison will fail and the exchange will not happen. In this case, the process must start over from the newly updated data structure.
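The copy, modify, and publish procedure described above might look like this in C11. It is a sketch under stated assumptions: the config structure, the current_config pointer, and the function names are all hypothetical, and old copies are deliberately never freed, because safe memory reclamation (and the related ABA problem) is a separate issue that this example does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared structure; the name and field are illustrative. */
typedef struct { int counter; } config;

/* The single pointer that readers follow to find the current version. */
static _Atomic(config *) current_config;

/* Install the first version. Returns false if allocation fails. */
static bool config_init(int start) {
    config *c = malloc(sizeof *c);
    if (c == NULL) return false;
    c->counter = start;
    atomic_store(&current_config, c);
    return true;
}

/* Build a private updated copy, then publish it with compare-and-swap.
 * If another updater published first, the CAS fails, `old` is refreshed
 * with the newly published pointer, and the work is redone from it. */
static bool config_increment(void) {
    config *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) return false;
    config *old = atomic_load(&current_config);
    do {
        *fresh = *old;                 /* private copy of current version */
        fresh->counter = old->counter + 1;
    } while (!atomic_compare_exchange_weak(&current_config, &old, fresh));
    /* `old` is intentionally leaked: a reader may still be using it. */
    return true;
}

/* Read the counter from whichever version is current right now. */
static int config_read(void) {
    return atomic_load(&current_config)->counter;
}
```

The weak form of compare-exchange is used because it already sits in a retry loop, so a spurious failure only costs one extra iteration.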

(This is essentially a primitive form of Software Transactional Memory.)


BTR and BTS work at the bit level, whereas CMPXCHG works on a wider data type (generally 32, 64 or 128 bits at once). They also function differently; the Intel developer manuals give a good summary of how each one works. It may also help to note that certain processors may have implemented BTR and BTS poorly (because they are not so widely used), making CMPXCHG the better option for high-performance locks.
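To make the bit-level versus word-level contrast concrete, here is a rough C11 sketch. The lock_bits word and the function names are hypothetical, and bit indices are assumed to be below 32. atomic_fetch_or and atomic_fetch_and on a single-bit mask behave like LOCK BTS and LOCK BTR (a compiler may or may not actually emit those instructions), while the compare-exchange version makes a decision about the whole word at once, which a single bit instruction cannot do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One word holding independent lock bits (hypothetical layout).
 * Bit indices passed to the functions below must be less than 32. */
static atomic_uint lock_bits;

/* Atomically set bit n and report whether this call acquired it,
 * i.e. whether the bit was previously clear. Analogous to LOCK BTS. */
static bool bit_try_lock(unsigned n) {
    unsigned mask = 1u << n;
    return (atomic_fetch_or(&lock_bits, mask) & mask) == 0;
}

/* Atomically clear bit n. Analogous to LOCK BTR. */
static void bit_unlock(unsigned n) {
    atomic_fetch_and(&lock_bits, ~(1u << n));
}

/* Take bits a and b together, but only if both are currently free.
 * This needs a compare-and-swap on the whole word. */
static bool pair_try_lock(unsigned a, unsigned b) {
    unsigned want = (1u << a) | (1u << b);
    unsigned seen = atomic_load(&lock_bits);
    while ((seen & want) == 0) {
        if (atomic_compare_exchange_weak(&lock_bits, &seen, seen | want))
            return true;
        /* CAS failed: `seen` now holds the fresh value, so re-check. */
    }
    return false;
}
```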
