What are real significant cases when memcpy() is faster than memmove()?
The key difference between memcpy() and memmove() is that memmove() will work fine when source and destination overlap. When the buffers are known not to overlap, memcpy() is preferable since it's potentially faster.
What bothers me is this "potentially". Is it a micro-optimization, or are there real, significant cases where memcpy() is faster, so that we really need to use memcpy() and not stick to memmove() everywhere?
memmove() has at least an implicit branch to choose between copying forwards and backwards, unless the compiler is able to deduce that an overlap is not possible. This means that where the compiler cannot optimize the call in favor of memcpy(), memmove() is slower by at least one branch, plus any additional space occupied by inlined instructions to handle each case (if inlining is possible).
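To make the branch concrete, here is a minimal sketch of a byte-wise memmove. The name sketch_memmove is hypothetical, and the pointer comparison through uintptr_t assumes a flat address space; real library implementations are far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memmove, for illustration only. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Assumption: a flat address space, where comparing the pointers as
       integers reflects their order in memory. */
    if ((uintptr_t)d <= (uintptr_t)s) {
        /* Destination starts at or before the source: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination starts after the source: copy backwards so that
           source bytes are read before they are overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

A memcpy written the same way would consist of the forward loop alone, which is where the saved comparison and branch come from.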
Reading the eglibc-2.11.1 code for both memcpy() and memmove() confirms this as suspected. Furthermore, there's no possibility of page copying during backward copying, a significant speedup only available if there's no chance of overlap.
In summary: if you can guarantee that the regions do not overlap, then selecting memcpy() over memmove() avoids a branch. If the source and destination contain corresponding page-aligned and page-sized regions, and don't overlap, some architectures can employ hardware-accelerated copies for those regions, regardless of whether you called memmove() or memcpy().
Update0
There is actually one more difference beyond the assumptions and observations I've listed above. As of C99, the following prototypes exist for the 2 functions:
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
void *memmove(void * s1, const void * s2, size_t n);
Because the 2 pointers s1 and s2 can be assumed not to point at overlapping memory, straightforward C implementations of memcpy are able to leverage this to generate more efficient code without resorting to assembler; see here for more. I'm sure that memmove can do this too, but it would need additional checks beyond those I saw present in eglibc, meaning the performance cost may be slightly more than a single branch for C implementations of these functions.
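As a rough illustration of what restrict buys, consider two otherwise identical forward-copy loops (copy_restrict and copy_may_alias are hypothetical names). Without restrict, the loop has defined behaviour even when the buffers overlap, so the compiler has to preserve that byte-by-byte result or add its own runtime check; with restrict it is free to assume the stores never feed later loads.

```c
#include <assert.h>
#include <stddef.h>

/* The caller promises the two regions do not overlap. */
void copy_restrict(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *restrict d = dst;
    const unsigned char *restrict s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* No promise: a store through d may change a byte later read through s,
   and the generated code must honour that. Note this is a forward-only
   copy, not a memmove. */
void copy_may_alias(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}
```

Calling copy_may_alias(buf + 1, buf, 3) on a 4-byte buffer is well defined and smears the first byte across the buffer; the same call to copy_restrict would be undefined behaviour.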
At best, calling memcpy rather than memmove will save a pointer comparison and a conditional branch. For a large copy, this is completely insignificant. If you are doing many small copies, then it might be worth measuring the difference; that is the only way you can tell whether it's significant or not.
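A measurement along those lines could be sketched as follows. The helper time_copies and the 256-byte buffers are assumptions made for this example; calling through a function pointer also prevents the compiler from inlining the copy, so this times only the library call itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Returns the CPU time, in seconds, spent on `iterations` copies of
   `size` bytes between two non-overlapping static buffers. */
double time_copies(copy_fn fn, size_t size, long iterations)
{
    static unsigned char src[256], dst[256];
    volatile unsigned char sink = 0;   /* keeps the copies observable */

    assert(size <= sizeof src);

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i;
        fn(dst, src, size);
        sink ^= dst[0];
    }
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copies(memcpy, 16, 100000000L) with time_copies(memmove, 16, 100000000L) on the target platform gives a first impression; the numbers depend heavily on the C library, the compiler flags, and the copy size.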
It is definitely a microoptimisation, but that doesn't mean you shouldn't use memcpy when you can easily prove that it is safe. Premature pessimisation is the root of much evil.
Well, memmove has to copy backwards when the source and destination overlap, and the source is before the destination. So, some implementations of memmove simply copy backwards when the source is before the destination, without regard for whether the two regions overlap.
A quality implementation of memmove can detect whether the regions overlap, and do a forward copy when they don't. In such a case, the only extra overhead compared to memcpy is the overlap check itself.
Simplistically, memmove needs to test for overlap and then do the appropriate thing; with memcpy, one asserts that there is no overlap, so there is no need for additional tests.
Having said that, I have seen platforms that have exactly the same code for memcpy and memmove.
It's certainly possible that memcpy is merely a call to memmove, in which case there'd be no benefit to using memcpy. At the other extreme, it's possible that an implementor assumed memmove would rarely be used, and implemented it with the simplest possible byte-at-a-time loops in C, in which case it could be ten times slower than an optimized memcpy. As others have said, the likeliest case is that memmove uses memcpy when it detects that a forward copy is possible, but some implementations may simply compare the source and destination addresses without looking for overlap.
With that said, I would recommend never using memmove unless you're shifting data within a single buffer. It might not be slower, but then again, it might be, so why risk it when you know there's no need for memmove?
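Shifting data within a single buffer is the case where memmove is actually required. A minimal sketch, assuming a plain int array (remove_at is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes the element at `index` from an array holding `count` ints by
   sliding the tail down one slot. Source and destination overlap, so
   memcpy would be undefined behaviour here. Returns the new count. */
size_t remove_at(int *arr, size_t count, size_t index)
{
    assert(arr != NULL && index < count);
    memmove(&arr[index], &arr[index + 1],
            (count - index - 1) * sizeof arr[0]);
    return count - 1;
}
```

For example, removing index 1 from {1, 2, 3, 4, 5} leaves 1, 3, 4, 5 in the first four slots.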
Just simplify and always use memmove. A function that's right all the time is better than a function that's only right half the time.
It is entirely possible that in most implementations, the cost of a memmove() function call will not be significantly greater than memcpy() in any scenario in which the behavior of both is defined. There are two points not yet mentioned, though:
- In some implementations, the determination of address overlap may be expensive. There is no way in standard C to determine whether the source and destination objects point to the same allocated area of memory, and thus no way that the greater-than or less-than operators can be used upon them without spontaneously causing cats and dogs to get along with each other (or invoking other Undefined Behavior). It is likely that any practical implementation will have some efficient means of determining whether or not the pointers overlap, but the standard doesn't require that such a means exist. A memmove() function written entirely in portable C would on many platforms probably take at least twice as long to execute as would a memcpy() also written entirely in portable C.
- Implementations are allowed to expand functions in-line when doing so would not alter their semantics. On an 80x86 compiler, if the ESI and EDI registers don't happen to hold anything important, a memcpy(dest, src, 1234) could generate code:
      mov esi,[src]
      mov edi,[dest]
      mov ecx,1234/4 ; Compiler could notice it's a constant
      cld
      rep movsl
      ; 1234 = 4*308 + 2, so the last two bytes would still need copying (e.g. with a movsw)
This would take the same amount of in-line code, but run much faster than:
      push [src]
      push [dest]
      push dword 1234
      call _memcpy
      ...
    _memcpy:
      push ebp
      mov ebp,esp
      mov ecx,[ebp+numbytes]
      test ecx,3 ; See if it's a multiple of four
      jz multiple_of_four
    multiple_of_four:
      push esi ; Can't know if caller needs this value preserved
      push edi ; Can't know if caller needs this value preserved
      mov esi,[ebp+src]
      mov edi,[ebp+dest]
      rep movsl
      pop edi
      pop esi
      ret
Quite a number of compilers will perform such optimizations with memcpy(). I don't know of any that will do it with memmove, although in some cases an optimized version of memcpy may offer the same semantics as memmove. For example, if numbytes was 20:
      ; Assuming values in eax, ebx, ecx, edx, esi, and edi are not needed
      mov esi,[src]
      mov eax,[esi]
      mov ebx,[esi+4]
      mov ecx,[esi+8]
      mov edx,[esi+12]
      mov edi,[esi+16]
      mov esi,[dest]
      mov [esi],eax
      mov [esi+4],ebx
      mov [esi+8],ecx
      mov [esi+12],edx
      mov [esi+16],edi
This will work correctly even if the address ranges overlap, since it effectively makes a copy (in registers) of the entire region to be moved before any of it is written. In theory, a compiler could process memmove() by seeing if treating it as memcpy() would yield an implementation that would be safe even if the address ranges overlap, and call _memmove in those cases where substituting the memcpy() implementation would not be safe. I don't know of any that do such optimization, though.
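On the first point, one way to write memmove in strictly portable C, without ever comparing the two pointers, is to stage the data through a temporary buffer. The sketch below is an illustration only: portable_memmove is a hypothetical name, and returning NULL on allocation failure is a deviation from the real function, which cannot fail. Every byte is copied twice, which fits the "at least twice as long" estimate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memmove that never compares dst with src. */
void *portable_memmove(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0)
        return dst;

    unsigned char *tmp = malloc(n);
    if (tmp == NULL)
        return NULL;        /* deviation: the real memmove cannot fail */

    memcpy(tmp, src, n);    /* tmp is fresh storage, so no overlap */
    memcpy(dst, tmp, n);    /* likewise */
    free(tmp);
    return dst;
}
```

This is the same idea as the 20-byte register example, with the heap standing in for the registers: the whole source is read before any of the destination is written.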