
measuring real running time of an algorithm

Approximately how many physical MIPS instructions does an abstract algorithm operation amortize to? By an abstract algorithm operation, I mean a basic operation such as add, divide, etc.

I see this is not a strict measuring technique :-)

Kejia


There is a list of the basic MIPS instructions here. Most of the "basic operations" that you mentioned are a single MIPS instruction or perhaps two, which probably holds true on most current CPU families.

However, this does not account for the architecture and performance characteristics of modern CPUs at all. Different instructions often have different completion times. Current CPUs usually implement branch prediction, instruction pipelines, memory caching, parallelisation and a whole list of other techniques to make code execution faster.

Therefore just having the assembly code implementation of an algorithm says nothing about its execution speed. You would have to measure and profile the code on the actual hardware to obtain comparable results. In fact, some algorithms may be far more effective on certain CPUs, even within the same CPU family.
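For example, a minimal timing sketch in C might look like the following. The function `algorithm_under_test`, its workload, and the repetition count are placeholders, not part of the original answer; the point is simply that you time the real code on the real machine rather than counting instructions:

```c
#include <stdio.h>
#include <time.h>

/* Placeholder for whatever algorithm you actually want to measure. */
static long algorithm_under_test(long n) {
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

int main(void) {
    struct timespec start, end;
    volatile long result = 0;   /* volatile keeps the compiler from optimising the calls away */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int rep = 0; rep < 1000; rep++)   /* repeat to average out timer noise */
        result = algorithm_under_test(1000000);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("avg time per run: %.9f s (result %ld)\n", seconds / 1000, result);
    return 0;
}
```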

A common and rather understandable example is the effect of the instruction cache. Unrolling a loop will eliminate a number of branch operations, which intuitively makes code faster. If you run that code on a CPU of the same family with very little instruction cache memory, though, the added accesses to the main memory can make it far slower than the simple branch-based loop.
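As a rough illustration (not taken from the original answer), manual unrolling might look like the sketch below. Whether the unrolled version is actually faster depends entirely on the target CPU's instruction cache and pipeline:

```c
/* Simple loop: one branch per element. */
long sum_simple(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by four: fewer branches, but more code competing for the
   instruction cache. Assumes n is a multiple of 4 to keep the sketch short. */
long sum_unrolled(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    return s;
}
```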


Computers are complicated. If you want to get down to this level you need to start considering what kind of CPU you are using, how well your compiler can use that CPU's instruction set, which variables are kept in which registers, what their bit-level representations are, etc. Even then, the number of instructions does not always map easily to the actual running time. Different instructions can take different numbers of clock cycles to execute, and that is before you even consider OS threading and your program's cache miss rate.

In the end, there is a good reason we use big-O notation in the first place :)


BTW, most simple operations (add, subtract) on integers should map to a single machine instruction, in case you are worried.
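For instance, a plain integer addition in C typically compiles down to a single MIPS instruction. The exact output varies by compiler and optimisation level, so treat the assembly comment below as an approximation rather than a guaranteed result:

```c
int add(int a, int b) {
    return a + b;   /* typically one MIPS instruction, e.g. addu $v0, $a0, $a1 */
}
```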


It depends on the CPU architecture. Some processors require several cycles for a single instruction such as divide, while others manage to execute every machine code instruction in a single cycle each.

It is sometimes relevant to measure an algorithm by how many floating point operations it requires. However, this does not take I/O (such as reading memory) into consideration.

The speed of a CPU is sometimes provided in FLOPS (Floating Point OPerations per Second) which could help to give you a time estimate. Again, not taking I/O into consideration - and not multi-threading issues (also a very important measuring factor).
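As a back-of-the-envelope illustration (the operation count and FLOPS rating below are made-up numbers, not measurements), such an estimate is just the operation count divided by the rate:

```c
#include <stdio.h>

int main(void) {
    double flop_count = 2.0e9;   /* hypothetical: algorithm needs 2 billion floating point ops */
    double cpu_flops  = 1.0e10;  /* hypothetical: CPU sustains 10 GFLOPS */

    /* Ignores memory I/O, cache misses and threading, as noted above. */
    printf("estimated time: %.3f s\n", flop_count / cpu_flops);
    return 0;
}
```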


Donald Knuth addressed this very problem when writing Volume 1 of "The Art of Computer Programming". In the preface he gives a lengthy justification for presenting algorithms in the assembly code of an imaginary machine:

... To avoid this dilemma, I have attempted to design an "ideal" computer called "MIX," with very simple rules of operation ...

That way, one can talk sensibly about how many "cycles" an algorithm would take, without having to care about differences between machines, caching, latency, pipelines, or any of the other ways computers have been optimized to save time, at the expense of knowing how long they will take.
