Atomically read/write int value w/o additional operation on the int value itself
GCC offers a nice set of built-in functions for atomic operations, and on macOS or iOS Apple offers a nice set of atomic functions as well. However, all these functions perform an operation, e.g. an addition/subtraction, a logical operation (AND/OR/XOR) or a compare-and-set/compare-and-swap. What I am looking for is a way to atomically assign/read an int value, like:
int a;
/* ... */
a = someVariable;
That's all. a will be read by another thread, and it is only important that a has either its old value or its new value. Unfortunately, the C standard does not guarantee that assigning or reading a value is an atomic operation. I remember once reading somewhere that writing or reading a value of type int is guaranteed to be atomic in GCC (regardless of the size of int), but I have searched everywhere on the GCC homepage and cannot find this statement any longer (maybe it was removed).
I cannot use sig_atomic_t, because sig_atomic_t has no guaranteed size and it might also have a different size than int.
Since only one thread will ever "write" a value to a, while both threads will "read" the current value of a, I don't need to perform the operations themselves in an atomic manner, e.g.:
/* thread 1 */
someVariable = atomicRead(a);
/* Do something with someVariable, non-atomic, when done */
atomicWrite(a, someVariable);
/* thread 2 */
someVariable = atomicRead(a);
/* Do something with someVariable, but never write to a */
If both threads were going to write to a, then all operations would have to be atomic, but in this situation that would only waste CPU time, and we are extremely low on CPU resources in our project. So far we use a mutex around the read/write operations on a, and even though the mutex is held for only a tiny amount of time, this already causes problems (one of the threads is a realtime thread, and blocking on a mutex causes it to miss its realtime constraints, which is pretty bad).
Of course, I could use __sync_fetch_and_add to read the variable (simply adding "0" to it so its value is not modified) and __sync_val_compare_and_swap to write it (since I know its old value, passing that in makes sure the value is always exchanged), but won't this add unnecessary overhead?
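For reference, the workaround I have in mind would look roughly like this (just a sketch of the idea, using the GCC __sync built-ins):
#include <stdio.h>

static int a;   /* the shared variable */

/* Atomic read: fetch-and-add of 0 returns the current value without changing it. */
static int atomicRead(int *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Atomic write: compare-and-swap with the known old value exchanges it for the new one
   (only one thread ever writes, so the writer always knows the old value). */
static void atomicWrite(int *p, int oldValue, int newValue)
{
    __sync_val_compare_and_swap(p, oldValue, newValue);
}

int main(void)
{
    atomicWrite(&a, 0, 42);
    printf("%d\n", atomicRead(&a));
    return 0;
}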
A __sync_fetch_and_add with a 0 argument is indeed the best bet if you want your load to be atomic and act as a memory barrier. Similarly, you can use an AND with 0 (__sync_fetch_and_and) or an OR with -1 (__sync_fetch_and_or) to store 0 or -1 atomically with a memory barrier. For writing, you can use __sync_lock_test_and_set (actually an xchg operation) if an "acquire" barrier is enough, or, if using Clang, __sync_swap (which is an xchg operation with a full barrier).
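For concreteness, the options above look something like this (a sketch; the function names are mine, and __sync_swap is available only under Clang):
int load_with_barrier(int *p)
{
    return __sync_fetch_and_add(p, 0);        /* atomic load, full barrier */
}

void store_zero(int *p)
{
    __sync_fetch_and_and(p, 0);               /* atomically store 0, full barrier */
}

void store_minus_one(int *p)
{
    __sync_fetch_and_or(p, -1);               /* atomically store -1, full barrier */
}

int exchange_acquire(int *p, int newValue)
{
    return __sync_lock_test_and_set(p, newValue);  /* xchg, acquire barrier only */
}

#ifdef __clang__
int exchange_full(int *p, int newValue)
{
    return __sync_swap(p, newValue);          /* Clang only: xchg with a full barrier */
}
#endif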
However, in many cases that's overkill and you may prefer to add memory barriers manually. If you do not want the implied memory barrier, you can use a volatile access to atomically read or write a variable that is aligned and no wider than a machine word:
#define __sync_access(x) (*(volatile __typeof__(x) *) &(x))
(This macro is an lvalue, so you can also use it for a store like __sync_access(x) = 0). The macro implements the same semantics as the C++11 memory_order_consume form, but only under two assumptions:
that your machine has coherent caches; if not, you need a memory barrier or global cache flush before the load (or before the first of a group of loads).
that your machine is not a DEC Alpha. The Alpha had very relaxed semantics for reordering memory accesses, so on it you'd need a memory barrier after the load (and after each load in a group of loads). On the Alpha the above macro only provides memory_order_relaxed semantics. BTW, the first versions of the Alpha couldn't even store a byte atomically (only a word, which was 8 bytes).
In either case, __sync_fetch_and_add would still work. As far as I know, no other machine imitated the Alpha, so neither assumption should pose problems on current computers.
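If you do go the manual route, the macro can be paired with an explicit __sync_synchronize() wherever a barrier is actually needed, along these lines (a sketch under the assumptions listed above; writer/reader are hypothetical names):
#define __sync_access(x) (*(volatile __typeof__(x) *) &(x))

int shared;

void writer(int newValue)
{
    __sync_synchronize();               /* make earlier writes visible before publishing */
    __sync_access(shared) = newValue;   /* atomic store: aligned, word-sized volatile access */
}

int reader(void)
{
    int v = __sync_access(shared);      /* atomic load, no implied barrier */
    /* On a machine weaker than memory_order_consume (e.g. the Alpha), a
       __sync_synchronize() would be needed here before using v. */
    return v;
}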
Volatile, aligned, word-sized reads/writes are atomic on most platforms. Checking your assembly would be the best way to find out whether this is true on your platform. Atomic registers cannot produce nearly as many interesting wait-free structures as more complicated mechanisms like compare-and-swap, which is why the latter are provided as well.
See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.5659&rank=3 for the theory.
Regarding __sync_fetch_and_add with a 0 argument: this seems like the safest bet. If you're worried about efficiency, profile the code and see whether you're meeting your performance targets. You may be falling victim to premature optimization.