Delphi Performance: Case Versus If
I guess there might be some overlapping with previous SO questions, but I could not find a Delphi-specific question on this topic.
Suppose that you want to check whether an unsigned 32-bit integer variable "MyAction" is equal to any of the constants ACTION1, ACTION2, ..., ACTIONn, where n is, say, 1000. I guess that, besides being more elegant,
case MyAction of
ACTION1: {code};
ACTION2: {code};
...
ACTIONn: {code};
end;
is much faster than
if MyAction = ACTION1 then
// code
else if MyAction = ACTION2 then
// code
...
else if MyAction = ACTIONn then
// code;
I guess that the if variant takes time O(n) to complete (i.e. to find the right action) if the right action ACTIONi has a high value of i, whereas the case variant takes a lot less time (O(1)?).
1. Am I correct that the case statement is much faster?
2. Am I correct that the time required to find the right action in the case statement is actually independent of n? I.e. is it true that it does not really take any longer to check a million cases than to check 10 cases?
3. How, exactly, does this work?
The compiler will translate a case statement into one of:
1. Two-level table, using one table to map the value to an index, and using the index to select an address from a jump table
2. Indirect jump through table
3. Sequential jumps
4. Binary search - this is recursive, so leaves of the binary search may use any of 2, 3 or 4.
It uses heuristics such as the number of cases, the range of cases, the number of distinct alternatives (each alternative may implement a range of different values), etc.
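To picture the binary search strategy from the list above, here is a hand-written sketch of roughly what such a search amounts to. It is not compiler output: the function name Lookup and the label values 10, 200, 3000 and 40000 are made up for illustration, and the leaves simply use sequential comparisons.

```pascal
program BinarySearchDispatchSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hand-written equivalent of a binary search over the sparse labels
// 10, 200, 3000 and 40000 (hypothetical values chosen for illustration).
function Lookup(Value: Cardinal): Integer;
begin
  if Value < 3000 then
  begin
    // Lower half: the leaf falls back to sequential comparisons.
    if Value = 10 then
      Result := 1
    else if Value = 200 then
      Result := 2
    else
      Result := 0;
  end
  else
  begin
    // Upper half: again sequential comparisons at the leaf.
    if Value = 3000 then
      Result := 3
    else if Value = 40000 then
      Result := 4
    else
      Result := 0;
  end;
end;

begin
  Writeln(Lookup(200));    // prints 2
  Writeln(Lookup(40000));  // prints 4
  Writeln(Lookup(5));      // prints 0 (no matching label)
end.
```

With labels this sparse a direct jump table would be mostly empty, which is presumably the kind of situation the heuristics are meant to detect. The number of comparisons grows with log n rather than n.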
The intuition for the case statement is that it is an O(1) operation.
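The O(1) intuition matches the "indirect jump through table" strategy. A rough way to see why is to build such a table by hand. The sketch below is only an emulation in source code, not what the compiler emits; the names TActionProc, Handlers, Dispatch and DoAction0..DoAction2 are invented for the example.

```pascal
program DispatchTableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TActionProc = procedure;

procedure DoAction0;
begin
  Writeln('action 0');
end;

procedure DoAction1;
begin
  Writeln('action 1');
end;

procedure DoAction2;
begin
  Writeln('action 2');
end;

const
  // Hypothetical table: slot i holds the handler for case label i.
  Handlers: array[0..2] of TActionProc = (DoAction0, DoAction1, DoAction2);

procedure Dispatch(Value: Cardinal);
begin
  // One range check followed by one indexed indirect call,
  // no matter how many entries the table has.
  if Value <= Cardinal(High(Handlers)) then
    Handlers[Value]()
  else
    Writeln('no matching label');
end;

begin
  Dispatch(1);  // prints "action 1"
  Dispatch(7);  // prints "no matching label"
end.
```

The cost of Dispatch does not depend on the size of Handlers, but the table only works when the labels are dense enough to index directly.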
It is always good to check the reality first ...
The Delphi 2010 compiler seems to like test-and-branch a lot. For example, the following simple code is not compiled into a branch table.
var
c: (aaa, bbb, ccc);
begin
case c of
aaa: sleep(0);
bbb: sleep(0);
ccc: sleep(0);
end;
end.
The compiler generates the following code:
Project56.dpr.24: case c of
0040A1C4 0FB6053C0E4100 movzx eax,[$00410e3c]
0040A1CB 2C01 sub al,$01
0040A1CD 7208 jb $0040a1d7
0040A1CF 740F jz $0040a1e0
0040A1D1 FEC8 dec al
0040A1D3 7414 jz $0040a1e9
0040A1D5 EB19 jmp $0040a1f0
Project56.dpr.25: aaa: sleep(0);
0040A1D7 6A00 push $00
0040A1D9 E86EDAFFFF call Sleep
0040A1DE EB10 jmp $0040a1f0
Project56.dpr.26: bbb: sleep(0);
0040A1E0 6A00 push $00
0040A1E2 E865DAFFFF call Sleep
0040A1E7 EB07 jmp $0040a1f0
Project56.dpr.27: ccc: sleep(0);
0040A1E9 6A00 push $00
0040A1EB E85CDAFFFF call Sleep
Even more complicated cases will get compiled into a test-and-jump series. For example ...
var
c: (aaa, bbb, ccc, eee, fff, ggg, hhh);
begin
case c of
aaa: sleep(0);
bbb: sleep(0);
ccc: sleep(0);
hhh: sleep(0);
end;
end.
... is compiled into ...
Project56.dpr.24: case c of
0040A1C4 0FB6053C0E4100 movzx eax,[$00410e3c]
0040A1CB 2C01 sub al,$01
0040A1CD 720C jb $0040a1db
0040A1CF 7413 jz $0040a1e4
0040A1D1 FEC8 dec al
0040A1D3 7418 jz $0040a1ed
0040A1D5 2C04 sub al,$04
0040A1D7 741D jz $0040a1f6
0040A1D9 EB22 jmp $0040a1fd
Project56.dpr.25: aaa: sleep(0);
0040A1DB 6A00 push $00
0040A1DD E86ADAFFFF call Sleep
0040A1E2 EB19 jmp $0040a1fd
Project56.dpr.26: bbb: sleep(0);
0040A1E4 6A00 push $00
0040A1E6 E861DAFFFF call Sleep
0040A1EB EB10 jmp $0040a1fd
Project56.dpr.27: ccc: sleep(0);
0040A1ED 6A00 push $00
0040A1EF E858DAFFFF call Sleep
0040A1F4 EB07 jmp $0040a1fd
Project56.dpr.28: hhh: sleep(0);
0040A1F6 6A00 push $00
0040A1F8 E84FDAFFFF call Sleep
The most probable cause for such code is that jump tables don't play very well with L1 caches, so the test-and-jump version may be faster when there is only a small number of case labels.
A "proof" for that reasoning is the following program which does get translated into a jump table.
var
b: byte;
begin
case b of
0: sleep(0);
1: sleep(0);
2: sleep(0);
3: sleep(0);
4: sleep(0);
5: sleep(0);
6: sleep(0);
7: sleep(0);
8: sleep(0);
9: sleep(0);
10: sleep(0);
11: sleep(0);
12: sleep(0);
13: sleep(0);
14: sleep(0);
15: sleep(0);
16: sleep(0);
17: sleep(0);
18: sleep(0);
19: sleep(0);
20: sleep(0);
end;
end.
Project56.dpr.12: case b of
0040A178 0FB6C0 movzx eax,al
0040A17B 83F814 cmp eax,$14
0040A17E 0F8728010000 jnbe $0040a2ac
0040A184 FF24858BA14000 jmp dword ptr [eax*4+$40a18b]
...
Project56.dpr.14: 1: sleep(0);
0040A1EB 6A00 push $00
0040A1ED E85ADAFFFF call Sleep
0040A1F2 E9B5000000 jmp $0040a2ac
Project56.dpr.15: 2: sleep(0);
0040A1F7 6A00 push $00
0040A1F9 E84EDAFFFF call Sleep
0040A1FE E9A9000000 jmp $0040a2ac
Project56.dpr.16: 3: sleep(0);
0040A203 6A00 push $00
0040A205 E842DAFFFF call Sleep
0040A20A E99D000000 jmp $0040a2ac
...
Barry could give us a definite answer to that question. I'm just testing and rambling.
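Another way to check the reality is to time both forms directly. The following is a minimal sketch and not a rigorous benchmark: the iteration count, the eight labels and the function names ViaCase and ViaIf are arbitrary choices, and results will depend on compiler version, optimization settings and CPU.

```pascal
program CaseVersusIfTiming;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

const
  Iterations = 50000000;  // arbitrary; adjust for your machine

function ViaCase(Value: Cardinal): Integer;
begin
  case Value of
    0: Result := 10;
    1: Result := 11;
    2: Result := 12;
    3: Result := 13;
    4: Result := 14;
    5: Result := 15;
    6: Result := 16;
    7: Result := 17;
  else
    Result := -1;
  end;
end;

function ViaIf(Value: Cardinal): Integer;
begin
  if Value = 0 then Result := 10
  else if Value = 1 then Result := 11
  else if Value = 2 then Result := 12
  else if Value = 3 then Result := 13
  else if Value = 4 then Result := 14
  else if Value = 5 then Result := 15
  else if Value = 6 then Result := 16
  else if Value = 7 then Result := 17
  else Result := -1;
end;

var
  I: Integer;
  Sum: Int64;
  Started: TDateTime;
begin
  // The checksum keeps the calls from being trivially removable.
  Sum := 0;
  Started := Now;
  for I := 1 to Iterations do
    Inc(Sum, ViaCase(Cardinal(I) and 7));
  Writeln('case: ', MilliSecondsBetween(Now, Started), ' ms (checksum ', Sum, ')');

  Sum := 0;
  Started := Now;
  for I := 1 to Iterations do
    Inc(Sum, ViaIf(Cardinal(I) and 7));
  Writeln('if:   ', MilliSecondsBetween(Now, Started), ' ms (checksum ', Sum, ')');
end.
```

Both loops should print the same checksum. Only the timings are of interest, and they are worth repeating several times before drawing any conclusion.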
1. Yes, the case statement is O(1), whereas the cascading ifs are O(n).
2. Yes, see (1).
3. With a branch table.
Note that if the value of MyAction is weighted, good performance can be obtained with the cascading if..else, where you put the most likely cases near the top. I'm not saying that it will compete with the case/switch statement in terms of performance when you're dealing with integers. But if a case isn't suitable (suppose you've got strings, for example), put your high percentage tests up at the top.
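As a small illustration of that ordering advice, here is a sketch of a string dispatch where case cannot be used. The command names and the assumption that 'read' is by far the most frequent input are hypothetical.

```pascal
program WeightedIfChain;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical command names; the assumed frequencies are illustrative only.
function Classify(const Command: string): Integer;
begin
  if Command = 'read' then         // assumed most common, so tested first
    Result := 1
  else if Command = 'write' then   // assumed second most common
    Result := 2
  else if Command = 'delete' then  // assumed rare, so tested last
    Result := 3
  else
    Result := 0;
end;

begin
  Writeln(Classify('read'));     // prints 1
  Writeln(Classify('delete'));   // prints 3
  Writeln(Classify('unknown'));  // prints 0
end.
```

A call with 'read' costs one string comparison, while an unknown command costs three, so the average cost depends on how well the ordering matches the real input distribution.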