
What is the difference between copying unsigned int 2 times and unsigned long 1 time in 64 bit systems?

What is the difference between

*(unsigned*)d = *(unsigned*)s; 
d+=4; s+=4; 
*(unsigned*)d = *(unsigned*)s; 
d+=4; s+=4;

and

*(unsigned long*)d = *(unsigned long*)s;
d+=8; s+=8;

on 64bit systems?


Provided that nothing unpleasant happens in respect of padding bits or strict aliasing rules, and assuming the sizes of the types are as you expect, and provided that the memory regions don't overlap, and are correctly aligned, then they each copy 8 bytes from one place to another.

Of course, aside from the practical effect there may be a difference in performance and/or code size.

If you're seeing something break, look at the actual code emitted; that might tell you what has gone wrong. Unless you have a lot of optimization switched on, and maybe even with optimization, I don't immediately see why those wouldn't be equivalent with AMD64, Ubuntu, and gcc.

Things I've mentioned that could go wrong:

  • padding bits - doesn't apply to GCC, but the standard permits unsigned and unsigned long to have padding bits, and if so then there could be bit patterns which are trap representations of one or both, which could explode as soon as you dereference.
  • strict aliasing - unlikely to affect what that code does, but could affect the code you use to check the result. For example, if s and d are the result of casting pointers-to-double to uint8_t*, and you look at the resulting double, then in one or both cases you might not see the effects of the change because you have an illegal type-pun.
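  • sizes of the types - shouldn't apply here since 64-bit Linux is LP64, but obviously if sizeof(long) == 4 then the two aren't equivalent: the first snippet copies 8 bytes, while the second copies only 4 and still advances the pointers by 8. long is 32 bits on 64-bit Windows systems, just not on 64-bit Linux ones.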
  • overlap - if d == s + 4, then the two code snippets have different effect. Because of this, you won't see the first optimized to become the second unless the compiler knows that d and s point to entirely different places (and that's what C99 restrict is for).
  • alignment - I can't remember what the alignment requirements are for x86-64: for x86 you can get away with an unaligned read/write, it's just slower. In general, if s or d is correctly aligned for int but not long then there's a difference. (Edit: apparently you can enable or disable hardware exceptions for unaligned access on x86-64).
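To make the overlap point concrete, here is a small sketch of the d == s + 4 case. The function names are mine, not from the question, and I've used memcpy through a temporary with fixed-width types (uint32_t, uint64_t) so that the demonstration itself doesn't trip over the aliasing, alignment, or type-size issues listed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first snippet: two 4-byte copies.
   The second load happens after the first store has landed. */
void copy_2x32(unsigned char *d, const unsigned char *s)
{
    uint32_t w;
    assert(d != NULL && s != NULL);
    memcpy(&w, s, 4);       /* load bytes 0..3 of the source */
    memcpy(d, &w, 4);       /* store them */
    memcpy(&w, s + 4, 4);   /* load bytes 4..7, possibly just overwritten */
    memcpy(d + 4, &w, 4);
}

/* Hypothetical stand-in for the second snippet: one 8-byte copy.
   All 8 bytes are loaded before anything is stored. */
void copy_1x64(unsigned char *d, const unsigned char *s)
{
    uint64_t w;
    assert(d != NULL && s != NULL);
    memcpy(&w, s, 8);
    memcpy(d, &w, 8);
}

/* Returns 1 if the two strategies leave different bytes behind
   when the destination starts 4 bytes into the source. */
int overlap_results_differ(void)
{
    unsigned char a[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    unsigned char b[12] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

    copy_2x32(a + 4, a);    /* d == s + 4 */
    copy_1x64(b + 4, b);    /* d == s + 4 */
    return memcmp(a, b, sizeof a) != 0;
}
```

With the buffer holding 0..11, the two-step version ends up with 0 1 2 3 repeated three times, while the single 8-byte version ends up with 0 1 2 3 0 1 2 3 4 5 6 7.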


If you need to copy exactly eight bytes, why not use memcpy()?

memcpy(d, s, 8);

With optimization enabled, GCC will typically emit inline code instead of calling the library function, so it should be as fast as your hand-written memory copy.

As an added bonus, your code will work on ILP32 systems, LP64 (most 64-bit Unix systems) and LLP64 (Win64), and even on systems with strict alignment requirements.
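As a rough sketch, the memcpy() version could be wrapped like this. The name copy8 is made up for illustration, and the restrict qualifiers encode the assumption that the two regions don't overlap, which memcpy() requires anyway.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy exactly 8 bytes between non-overlapping
   regions, with no assumption about alignment or sizeof(long). */
void copy8(void *restrict d, const void *restrict s)
{
    assert(d != NULL && s != NULL);
    memcpy(d, s, 8);
}
```

Calling it with a deliberately misaligned destination, such as copy8(buf + 1, src), is still well defined, which is not true of the pointer-cast versions.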


If performance is not critical, you should probably just use memcpy(), as another answer suggests.

If this code occurs soon after a write to *s, match the type of that write; if this code occurs soon before a read from *d, match the type of that read. This will ensure store-to-load forwarding (moving the data from the store directly to the load, without waiting for the store to write the data back into the data cache) will work on as many CPUs as possible. Store-to-load forwarding almost always works if the addresses and sizes of the store and load match and are aligned, and may work more often depending on CPU. If store-to-load forwarding fails, the penalty tends to be on the order of 10 clock cycles.

If you can avoid a store-to-load forwarding problem by adding additional shift/and/or operations, this is often faster.
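One way to read that advice, as a sketch only: if you already have the two 32-bit halves in registers and want a 64-bit value, build it with shift/or rather than storing two 32-bit words and immediately loading 64 bits back from the same address. The function name below is made up, and whether this actually wins depends on the CPU and on what the compiler does with it, so measure before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 32-bit halves in registers,
   avoiding a pair of narrow stores followed by one wide load. */
uint64_t pack_u32_pair(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Unlike the store-then-load approach, the result here does not depend on byte order: hi always ends up in the upper 32 bits.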

If you use C's type system more effectively and avoid casts, many store-to-load forwarding problems will be avoided.


Try casting to (unsigned long long*) instead; unsigned long long is at least 64 bits on every platform, including 64-bit Windows.
