开发者

How is Array.Copy implemented in C#?

I tried to look at the implementation of Array.Copy in C# with ILSpy but it didn't show me the implementation itself.

I wrote a simple benchmark, Array.Copy vs a simple for loop to copy the data. Array.Copy was faster.

How is it implemente开发者_如何转开发d faster?

Thanks, Shay


Disassembling the Array class will land you on this declaration:

[MethodImpl(MethodImplOptions.InternalCall), SecurityCritical, ReliabilityContract(Consistency.MayCorruptInstance, Cer.MayFail)]
internal static extern void Copy(Array sourceArray, int sourceIndex, Array destinationArray, int destinationIndex, int length, bool reliable);

The [MethodImpl] attribute tells the JIT compiler that the method is actually implemented in the CLR, written in C++ instead of a managed language. It looks in a table of method names and retrieves a pointer to the C++ function that implements the method and compiles it to a simple CALL instruction.

Getting the source code for the CLR is a bit tricky, but the SSCLI20 version is pretty accurate for methods that have been around for a long time and didn't require tweaking. Array.Copy() certainly qualifies. The table I mentioned is defined in clr\src\vm\ecall.cpp, the section that's relevant to your question looks like this:

FCFuncStart(gArrayFuncs)
    FCFuncElement("Copy", SystemNative::ArrayCopy)
    FCFuncElement("Clear", SystemNative::ArrayClear)
    FCFuncElement("get_Rank", Array_Rank)
    //  etc...

The SystemNative::ArrayCopy() function pointer takes you to clr\src\vm\comsystem.cpp. The actual function is too big to copy here without making your eyes glaze over, there is a lot of error checking going on. It looks for a way to optimize the copy, the happy case is where the elements of the array can simply be copied without being converted. That is done by a function named m_memmove(). You'll find that function in the same file, it is used in the 32-bit version of the CLR.

Which first copies a single byte at a time until the destination address is aligned on a multiple of 4 bytes. Then it copies 16 bytes at a time, 4 times 4, these copies are fast because they are aligned. Then it copies what's left one byte at a time.

You can perhaps now see why it can be faster than your own loop. It can move 4 bytes at a time even if the array element size is not 4 bytes wide. And it can do so while ensuring the copy address is aligned, you can't since the physical address of the array element isn't discoverable.


Same techniques used to write a fast memcpy function:

  • loop unrolling
  • transfer of aligned data in large chunks (often using SIMD)
  • CPU caching hints (SIMD helps here as well)

See also:

  • Very fast memcpy for image processing?
  • How to increase performance of memcpy
  • does rewriting memcpy/memcmp/... with SIMD instructions make sense
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜