WinAPI _Interlocked* intrinsic functions for char, short

2023-02-12 18:12 Q&A author:

I need to use the _Interlocked* functions on a char or a short, but they take a long pointer as input. There seems to be a function _InterlockedExchange8, but I can't find any documentation for it, so it looks like an undocumented feature. The compiler also could not find an _InterlockedAdd8 function. I would appreciate any information on these functions, recommendations on whether to use them, and other solutions as well.

update 1

I'll try to simplify the question. How can I make this work?

struct X
{
    char data;
};

X atomic_exchange(X another)
{
    return _InterlockedExchange( ??? );
}

I see two possible solutions:

1. Use _InterlockedExchange8
2. Cast another to long, do the exchange and cast the result back to X

The first one is obviously a bad solution. The second one looks better, but how do I implement it?

update 2

What do you think about something like this?

template <typename T, typename U>
class padded_variable
{
public:
    padded_variable(T v): var(v) {}
    padded_variable(U v): var(*static_cast<T*>(static_cast<void*>(&v))) {}
    U& cast()
    {
        return *static_cast<U*>(static_cast<void*>(&var));
    }
    T& get()
    {
        return var;
    }
private:
    T var;
    char padding[sizeof(U) - sizeof(T)];
};

struct X
{
    char data;
};

template <typename T, int S = sizeof(T)> class var;
template <typename T> class var<T, 1>
{
public:
    var(): data(T()) {}
    T atomic_exchange(T another)
    {
        padded_variable<T, long> xch(another);
        padded_variable<T, long> res(_InterlockedExchange(&data.cast(), xch.cast()));
        return res.get();
    }
private:
    padded_variable<T, long> data;
};

Thanks.

It's pretty easy to write 8-bit and 16-bit interlocked functions, but they are not included in WinAPI because of IA64 portability. If you want to support Win64, the assembler cannot be inline, because MSVC no longer supports inline assembly there. As external functions assembled with MASM64 they will not be as fast as inline code or intrinsics, so you would be wiser to investigate promoting your algorithms to use 32-bit and 64-bit atomic operations instead.

Example interlocked API implementation: intrin.asm
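One way to read "promoting to 32-bit atomic operations" is to keep the small values packed inside an aligned 32-bit word and update that word with a compare-and-swap loop. The sketch below is only an illustration of that idea: it uses std::atomic from the C++ standard library instead of the _Interlocked* intrinsics so that it stays portable, and the class name packed_bytes and its members are hypothetical, not part of WinAPI.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: four byte-sized slots packed into one 32-bit word.
// Every update goes through a 32-bit compare-and-swap on the whole word.
class packed_bytes
{
public:
    // Atomically replace the byte in slot `index` (0..3); returns the old byte.
    std::uint8_t exchange(unsigned index, std::uint8_t value)
    {
        const unsigned shift = (index & 3u) * 8u;
        const std::uint32_t mask = std::uint32_t(0xFF) << shift;
        std::uint32_t old_word = word.load();
        std::uint32_t new_word;
        do
        {
            // Keep the other three bytes, substitute only the selected one.
            new_word = (old_word & ~mask) | (std::uint32_t(value) << shift);
            // On failure compare_exchange_weak reloads old_word, so we retry.
        } while (!word.compare_exchange_weak(old_word, new_word));
        return static_cast<std::uint8_t>((old_word & mask) >> shift);
    }

    // Read the byte currently stored in slot `index` (0..3).
    std::uint8_t load(unsigned index) const
    {
        return static_cast<std::uint8_t>(word.load() >> ((index & 3u) * 8u));
    }

private:
    std::atomic<std::uint32_t> word{0};
};
```

With MSVC intrinsics the same loop could presumably be written around _InterlockedCompareExchange on a long, but that variant is not shown or tested here.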

Why do you want to use smaller data types? So you can fit a bunch of them in a small memory space? That's just going to lead to false sharing and cache line contention.

Whether you use locking or lockless algorithms, it's ideal to have your data in blocks of at least 128 bytes (or whatever the cache line size is on your CPU) that are only used by a single thread at a time.
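As a rough illustration of that layout advice, each thread's data can be aligned and padded to a full cache line so that two threads never write to the same line. This is a minimal sketch assuming C++11 alignas and the 128-byte figure mentioned above; the names cache_line_size, padded_counter and per_thread_counters are made up for the example, and the right size depends on your CPU.

```cpp
#include <atomic>
#include <cstddef>

// Assumed cache line size, taken from the 128-byte figure above.
// Many x86 CPUs use 64 bytes; check the target hardware.
constexpr std::size_t cache_line_size = 128;

// alignas both aligns the struct and rounds its size up to a full line,
// so neighbouring array elements never share a cache line.
struct alignas(cache_line_size) padded_counter
{
    std::atomic<long> value{0};
};

// Hypothetical layout: one counter per worker thread.
padded_counter per_thread_counters[4];
```

The cost is memory: each counter now occupies a whole line even though only a few bytes of it are used.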

Well, you have to make do with the functions available. _InterlockedIncrement and _InterlockedCompareExchange are available in 16-bit and 32-bit variants (the latter in a 64-bit variant as well), and maybe a few other interlocked intrinsics are available in 16-bit versions too, but InterlockedAdd doesn't seem to be, and there seem to be no byte-sized Interlocked intrinsics/functions at all.

So... you need to take a step back and figure out how to solve your problem without an _InterlockedAdd8.

Why are you working with individual bytes in any case? Stick to int-sized objects unless you have a really good reason to use something smaller.

Creating a new answer because your edit changed things a bit:

1. Use _InterlockedExchange8
2. Cast another to long, do exchange and cast result back to X

The first simply won't work. Even if the function existed, it would only let you atomically update one byte at a time, which means that the object as a whole would be updated in a series of steps, and that series would not be atomic.

The second doesn't work either, unless X is a long-sized POD type (that is, unless it has the same size as a long and is aligned on a sizeof(long) boundary).
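One way to sidestep the size and alignment conditions is to never store X itself in shared memory, and instead keep a properly aligned 32-bit slot that X is copied into and out of with memcpy. The following is a sketch under stated assumptions, not a drop-in answer: it uses std::atomic<std::uint32_t> as a stand-in for a long updated with _InterlockedExchange, it requires T to be trivially copyable, default-constructible and no larger than 32 bits, and the wrapper name atomic_small is invented for the example.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: stores a small trivially copyable T inside an
// aligned 32-bit slot so that a 32-bit atomic exchange can be used.
template <typename T>
class atomic_small
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be safe to copy as raw bytes");
    static_assert(sizeof(T) <= sizeof(std::uint32_t),
                  "T must fit into the 32-bit slot");

public:
    explicit atomic_small(T initial = T()) : slot(to_bits(initial)) {}

    // Atomically store `desired` and return the previously stored value.
    T exchange(T desired)
    {
        return from_bits(slot.exchange(to_bits(desired)));
    }

private:
    // Copy the bytes of v into a zeroed 32-bit word (no pointer casts).
    static std::uint32_t to_bits(const T& v)
    {
        std::uint32_t bits = 0;
        std::memcpy(&bits, &v, sizeof(T));
        return bits;
    }

    // Copy the low sizeof(T) bytes of the word back into a T.
    static T from_bits(std::uint32_t bits)
    {
        T v;
        std::memcpy(&v, &bits, sizeof(T));
        return v;
    }

    std::atomic<std::uint32_t> slot;
};

// Same shape as the X from the question.
struct X
{
    char data;
};
```

Using memcpy rather than casting pointers avoids the aliasing and alignment problems of reinterpreting a char-sized object as a long; the price is that every stored value occupies four bytes.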

In order to solve this problem you need to narrow down what types X might be. First, of course, is it guaranteed to be a POD type? If not, you have an entirely different problem, as you can't safely treat non-POD types as raw memory bytes.

Second, what sizes may X have? The Interlocked functions can handle 16, 32 and, depending on circumstances, maybe 64 or even 128 bit widths.

Does that cover all the cases you can encounter?

If not, you may have to abandon these atomic operations and settle for plain old locks. Lock a mutex to ensure that only one thread touches these objects at a time.
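For completeness, the lock-based fallback might look roughly like this. It is a minimal sketch that assumes the C++11 std::mutex rather than a WinAPI mutex or critical section, and the class name locked_value is hypothetical. Unlike the interlocked approach it places no restriction on the size of T, and T does not need to be a POD type, only movable.

```cpp
#include <mutex>
#include <utility>

// Hypothetical wrapper: exchange guarded by a mutex instead of an
// interlocked instruction.
template <typename T>
class locked_value
{
public:
    explicit locked_value(T initial = T()) : value(std::move(initial)) {}

    // Store `desired` and return the previous value; only one thread at a
    // time can be inside the guarded region.
    T exchange(T desired)
    {
        std::lock_guard<std::mutex> guard(lock);
        std::swap(value, desired);
        return desired; // now holds the old value
    }

private:
    std::mutex lock;
    T value;
};
```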
