What does this C code do [Duff's device]? [duplicate]
void Send(int * to, const int* from, const int count)
{
int n = (count+7) / 8;
switch(count%8)
{
case 0: do { *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (--n>0);
}
}
This is Duff's Device. It's a method of unrolling loops that avoids having to add a secondary fix-up loop to deal with times when the number of loop iteration isn't know to be an exact multiple of the unrolling factor.
Since most answers here seem to be generally positive about it I'm going to speak out against it.
With this code a compiler is going to struggle to apply any optimization to the loop body. If you just wrote the code as a simple loop a modern compiler should be able to handle the unrolling for you. This way you maintain readability and performance and have some hope of other optimizations being applied to the loop body.
The Wikipedia article referenced by others even says when this 'pattern' was removed from the Xfree86 source code performance actually improved.
This outcome is typical of blindly hand optimizing any code you happen to think might need it. It prevents the compiler from doing its job properly, makes your code less readable and more prone to bugs and typically slows it down. If you were doing things the right way in the first place, i.e. writing simple code, then profiling for bottlenecks, then optimizing, you'd never even think to use something like this. Not with a modern CPU and compiler anyway.
It's fine to understand it, but I'd be surprised if you ever actually use it.
This mingling of a switch statement and a while loop is called "Duff's Device". It is a way to unroll loops, which was an optimization often used in earlier times.
So this code still copies the memory contents from one place to the other, but it might be more efficient. Beware, on today's architectures you should always measure that, because with cache locality and blindingly fast CPUs loop unrolling is often a bad idea.
This is functionally identical to the code below:
for(int i=0;i<n;i++)
{
*to++=*from++;
}
The difference is that your code unrolls the loop so that only 1 loop iteration is required for each 8 integers copied. Since there are no breaks for any of the cases, execution falls through from each case label to the next.
When count%8==0, 8 copies are executed inside of the loop for the first iteration
when count%8==7, 7 copies are executed for the first iteration
and so forth. After the first iteration with %8 copies, exactly 8 copies happen per iteration.
By unrolling the loop in this manner, the loop overhead is significantly reduced. It's important to note the order of the case values (0,7,6,5,4,3,2,1) which lend themselves to being translated into a jump table by the compiler.
update
An issue with the example code posted by OP is that a count value of 0 will cause 8 copies to take place, potentially resulting in a buffer overflow.
Duff's device
In computer science, Duff's device is an optimized implementation of a serial copy that uses a technique widely applied in assembly language for loop unwinding. Its discovery is credited to Tom Duff in November of 1983, who at the time was working for Lucasfilm. It is perhaps the most dramatic use of case label fall-through in the C programming language to date. Duff does not claim credit for discovering the concept of loop unrolling, just this particular expression of it in C.
精彩评论