string[x] vs *string++
Which of the 2 methods is theoretically faster and why ? (The pointer to string must be constant.)
What is the exact difference between destination[count] and *destination++ ? Does destination[count] keep moving from 开发者_如何学Go0 to count on every call ? Does *destination++ just add 1 on each call ?
char *const string = "Hello world!";
char *destination = malloc(strlen(string) + 1);
int count = 0;
while(string[count] != '\0')
{
destination[count] = string[count];
count++;
}
char *const string = "Hello world!";
char *destination = malloc(strlen(string) + 1);
char *ptr = string;
while(*ptr != '\0')
{
*destination++ = *ptr++;
}
It depends on the compiler. With modern compilers it's really not possible to predict anything about optimization on the single-instruction level without actually looking at the generated code. Both probably compile to equivalent instructions.
That said, destination[count]
does not loop from 0 to count
on each call. It takes the memory location of destination[0]
, adds count
times the size of the type of *destination
, and looks there. (This is all naively speaking, of course - the compiler probably does something faster.)
On the other hand, *destination++
takes the memory location of destination[0]
(which is not the beginning of the array anymore, since you've changed destination
), looks there, and adds the size of the type of *destination
so that destination[0]
refers to what used to be destination[1]
.
Why should we speculate? We can try it and find out. I compiled the code with gcc -O3 -g
(on x86) and disassembled the result. There were more changes than I expected, so I'll focus in on the bits in the middle where we'd expect the bulk of the differences between the two to be.
The core of the loop in the first case:
0x00000030 <foo+48>: mov %dl,(%edi,%esi,1)
0x00000033 <foo+51>: movzbl 0x1(%ecx),%edx
0x00000037 <foo+55>: inc %eax
0x00000038 <foo+56>: inc %ecx
0x00000039 <foo+57>: mov %eax,%esi
0x0000003b <foo+59>: test %dl,%dl
0x0000003d <foo+61>: jne 0x30 <foo+48>
The core of the loop in the second case:
0x00000080 <foo2+48>: mov %dl,(%eax)
0x00000082 <foo2+50>: movzbl 0x1(%ecx),%edx
0x00000086 <foo2+54>: inc %eax
0x00000087 <foo2+55>: inc %ecx
0x00000088 <foo2+56>: test %dl,%dl
0x0000008a <foo2+58>: jne 0x80 <foo2+48>
On that basis, the second is perhaps a little faster. But really, it won't make much difference in practice. The L1 cache holds both loops just fine and the target memory is uncached so the differences will be moot. Good luck with ever actually measuring a difference between the two.
In the good old days, the second method is faster. But with the modern compiler optimization, they are just the same.
This is because:
destination[count] = string[count];
would be doing
*(destination+count) = *(string+count);
It take two more add
operation in each loop. Modern compiler would just do this for you.
Depends on the CPU. On a x86, the first version is slightly faster (or, at least, it will result in a slightly shorter code), because it only takes one instruction to load from string[count] and one instruction to write into destination[count]. So the pseudocode would look sort of like this (each line = one assembly instruction)
register = *(string+count)
*(destination+count) = register
count += 1
compare register, 0
jump to line 1 if nonzero
and the second version would be
register = *ptr
*destination = register
ptr += 1
destination += 1
compare register, 0
jump to line 1 if nonzero
In practice, any optimizing compiler will optimize the heck out of this code, and, depending on its sophistication, the compiler may even be able to turn the first version into the second or vice versa. So there's no telling which one is going to be faster.
If "string" arives as an argument in a function, you could use that as an iterator instead of "ptr". Then you'll have one less variable to hold on the stack. That could possibly mean that the compiler can keep all local variables in registers, and that would improve performance.
That of course depends, if the compiler manages to do away with "count" you wont get any performance increments.
destination[count]
is the count-th element (0 based) in destination[]
. count++
ensures that in each time the loop is entered, destination[count]
is one past the last element accessed in previous iteration (except, of course, the first entrance).
*destination++ = *ptr++;
is same as the following:
*destination = *ptr;
destination++;
ptr++;
Note that destination++
increments the pointer value by sizeof(*destination)
, which will point to next element in array, not the data pointed by destination
(or *destination
).
For modern compilers, both will be optimized to almost same binary, so it doesn't matter which one you choose with respect to speed.
Looking at it from a high level, both operations involve a dereference and an addition. The expression destination[count]
is evaluated as *(destination + count)
. The expression *destination++
is (most likely) evaluated as *destination; destination += 1
(bearing in mind that the side effect only needs to be applied before the next sequence point, which is not necessarily immediately after the expression is evaluated).
So theoretically there shouldn't be that much difference between the two versions. The only way to know for your particular platform is to code up both versions and compare their performance.
*destination++
actually adds sizeof(*destination)
at each call (it depends on the pointer size) while destination[count]
should be doing *(destination + count * sizeof(*destination))
but it's probably optimized by the compiler anyway.
精彩评论