Can the big-O of an algorithm be found programmatically by analyzing its performance?

Note that I don't have a "problem" and I'm not looking for "another way to find the big O of my algorithm".

What I'd like to know is whether it would be possible to write a program to which you'd pass data points that are all performance measurements of an algorithm for various input sizes: (n, time taken to solve the problem for n), and that would then determine the complexity of your algorithm.

For example, here's what the input could be (it could be much larger; it's really just an example, that's not the point of the question):

    36 000 took 16 ms
   109 000 took 21 ms
   327 000 took 68 ms
   984 000 took 224 ms
 2 952 000 took 760 ms
 8 857 000 took 2305 ms
26 571 000 took 7379 ms
79 716 000 took 23336 ms

Using this kind of data, is it possible to write a program that would tell whether we have, say, an O(n), O(log n), O(n log n), or O(n!) algorithm?


What you are looking for is curve fitting. All the simple algorithms for this problem that I know of try to fit the data points to some sort of polynomial, but I suspect there are methods that can differentiate between polynomials and non-polynomials as well.
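
To make that concrete, here is a minimal sketch of the curve-fitting approach, using the measurements from the question. The candidate set and the error metric (a least-squares fit of time ≈ c·f(n), ranked by mean relative error) are my own illustrative choices, not the only way to do it:

    import math

    # The measurements from the question: (n, time in ms).
    samples = [
        (36_000, 16), (109_000, 21), (327_000, 68), (984_000, 224),
        (2_952_000, 760), (8_857_000, 2305), (26_571_000, 7379),
        (79_716_000, 23336),
    ]

    # Candidate complexity classes to test the data against.
    candidates = {
        "O(n)":       lambda n: n,
        "O(log n)":   lambda n: math.log(n),
        "O(n log n)": lambda n: n * math.log(n),
        "O(n^2)":     lambda n: n * n,
    }

    def mean_relative_error(f):
        # Least-squares estimate of c in  time ~ c * f(n),
        # then the average relative deviation from the data.
        c = sum(t * f(n) for n, t in samples) / sum(f(n) ** 2 for n, _ in samples)
        return sum(abs(t - c * f(n)) / t for n, t in samples) / len(samples)

    # The best-fitting class prints first.
    for name in sorted(candidates, key=lambda k: mean_relative_error(candidates[k])):
        print(f"{name:12s} error = {mean_relative_error(candidates[name]):.3f}")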


You can use curve fitting (see @Max S.) to determine the formula that describes your data. However, this is only half of the story, as there is no way to know whether the data describes your algorithm to its full extent.

For instance, your algorithm may exhibit linear behavior for n < 1,000,000,000 and then start behaving quadratically. If you don't have data points where n > 1,000,000,000, your analysis program will not be able to give you a correct answer.

Thus, to conclude: you can do it programmatically, but the results will be limited to the data points in your sample, and there is no algorithmic way to determine whether the sample sufficiently covers all the "interesting" points.
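
A synthetic illustration of that pitfall (the cost function below is made up, not a real measurement): a fit done only on the small samples sees a perfectly linear algorithm and has no way to anticipate the quadratic regime past the threshold:

    THRESHOLD = 1_000_000

    def simulated_cost(n):
        # Toy cost model: linear below the threshold, quadratic above it.
        if n <= THRESHOLD:
            return n
        return THRESHOLD + (n - THRESHOLD) ** 2 / 1_000

    small = [(n, simulated_cost(n)) for n in (10_000, 50_000, 200_000, 800_000)]
    large = [(n, simulated_cost(n)) for n in (2_000_000, 8_000_000)]

    # cost/n is flat on the small samples, so a fit sees a clean O(n)...
    print([round(t / n, 2) for n, t in small])   # [1.0, 1.0, 1.0, 1.0]
    # ...but it explodes past the threshold the fit never sampled.
    print([round(t / n, 2) for n, t in large])   # roughly [500, 6125]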


If you're trying to estimate big-O empirically, you must be very careful to ensure that you're testing on a wide range of instances at each size. Remember that big-O is a worst-case notion. It is not uncommon to find algorithms that perform well in all but a few pathological cases, but it is exactly those pathological cases that determine the big-O time. That is, if you miss the pathological cases in your sampling, you could come away with the idea that an O(2^n) algorithm is O(n) instead.

If what you really need is the big-O time, and not just an idea of average performance, then I recommend proving it analytically. Without doing that, you cannot be certain that you haven't missed some pathological input.
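
The textbook instance of this is a naive quicksort that always pivots on the first element: random inputs look like O(n log n), while already-sorted inputs hit the O(n^2) worst case. A quick sketch to see it (timings are illustrative):

    import random
    import sys
    import time

    sys.setrecursionlimit(10_000)  # sorted input recurses ~n deep

    def quicksort(xs):
        # Deliberately naive: always pivots on the first element.
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        return (quicksort([x for x in rest if x < pivot])
                + [pivot]
                + quicksort([x for x in rest if x >= pivot]))

    for n in (500, 1000, 2000):
        random_input = [random.random() for _ in range(n)]
        sorted_input = sorted(random_input)
        for label, data in (("random", random_input), ("sorted", sorted_input)):
            start = time.perf_counter()
            quicksort(data)
            ms = (time.perf_counter() - start) * 1000
            print(f"n={n:5d} {label}: {ms:8.1f} ms")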


I think you could approximate it using regression, but not get exact results. That's because most algorithms have different performance depending on what the input is (not just its size). So to understand this fully, you would need the source.


Most big-O analyses assume an idealized machine with infinite memory, uniform access time, no interference from other applications, and so on. In particular, crossing thresholds such as cache sizes or main memory size (paging in and out of the swap file) can have a significant influence on performance. So what you determine is how the algorithm performs in the real world, not its idealized runtime.
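
A rough way to observe this (the exact numbers depend entirely on your hardware, and here partly on NumPy's call overhead): summing an array is O(n) either way, yet the measured time per element climbs once the array no longer fits in cache:

    import time
    import numpy as np

    for size in (2**14, 2**17, 2**20, 2**23, 2**26):
        a = np.ones(size, dtype=np.float64)
        start = time.perf_counter()
        for _ in range(5):          # repeat to smooth out timer noise
            a.sum()
        per_element = (time.perf_counter() - start) / 5 / size
        print(f"{size:>12,} elements: {per_element * 1e9:6.3f} ns/element")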
