Unit Testing Approximation Algorithms

I'm working on an open-source approximation algorithms library for graphs and networks, using some popular Python packages as a base. The main goal is to encompass up-to-date approximation algorithms for NP-Complete problems over graphs and networks. The reason for this is 1) I haven't seen a nice (modern) consolidated package that covers this and 2) it would be a nice pedagogical tool for learning about approximation algorithms on NP-Hard optimization problems.

In building this library I am using unit tests to sanity-check my implementations (as any proper developer would). I am somewhat cautious about these tests because, by their very nature, approximation algorithms may not return the optimal solution. Currently I am solving some small instances by hand and then checking that the returned result matches, but this is neither desirable nor scalable as the implementation grows.

What would be the best way to unit test approximation algorithms? Generate random instances and ensure that the returned results are within the bound guaranteed by the algorithm? That seems prone to false positives: the test may just have gotten lucky on those instances, which does not guarantee that the bound holds for all instances.


You need to separate two concerns here: the quality of your approximation algorithms, and the correctness of your implementation of those algorithms.

Testing the quality of an approximation algorithm usually will not lend itself to the unit testing methods used in software development. For example, you would need to generate random problems that are representative of realistic problem sizes. You might need to do mathematical work to derive upper/lower bounds in order to judge the quality of your algorithms on large instances that cannot be solved exactly. Or you can use problem test sets with known or best-known solutions and compare your results against them. In any case, unit testing will not help you much in improving the quality of the approximation algorithms; this is where your domain knowledge in optimization and math will help.

The correctness of your implementation is where unit tests will be really useful. You can use toy-sized problems here and compare known results (solved by hand, or verified through careful step-by-step debugging in code) with what your code generates. Having small problems is not only sufficient but also desirable, so that tests run fast and can be run many times during the development cycle. These types of tests make sure that the overall algorithm arrives at the correct result. They sit somewhere between a unit test and an integration test, since you are testing a large portion of the code as a black box, but I have found them to be extremely useful in the optimization domain.

One thing I recommend for this type of testing is removing all randomness in your algorithms through fixed seeds for the random number generators. These tests should always run deterministically and give exactly the same result 100% of the time. I also recommend unit testing the lower-level modules of your algorithms: isolate the method that assigns weights to arcs on the graph and check that the correct weights are assigned; isolate your objective function calculation and unit test that. You get my point.
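
For concreteness, here is a minimal sketch of what such a black-box, toy-instance test could look like. It assumes the library is built on top of networkx and uses networkx's own min_weighted_vertex_cover approximation purely as a stand-in for one of your algorithms; swap in your own function and hand-solved instances.

    import random
    import unittest

    import networkx as nx
    from networkx.algorithms.approximation import min_weighted_vertex_cover


    class TestVertexCoverToyInstances(unittest.TestCase):
        def setUp(self):
            # Fix the global RNG so any randomized components behave
            # deterministically (the stand-in used here happens to be
            # deterministic already).
            random.seed(42)

        def test_single_edge(self):
            # Hand-solved instance: one edge, the optimum cover has size 1 and
            # a 2-approximation may return up to both endpoints.
            G = nx.Graph([(0, 1)])
            cover = min_weighted_vertex_cover(G)
            self.assertTrue(cover <= {0, 1})
            self.assertGreaterEqual(len(cover), 1)

        def test_star_graph(self):
            # Hand-solved instance: star K_{1,4}; the optimum is {center}, so a
            # 2-approximation must return at most 2 nodes and cover every edge.
            G = nx.star_graph(4)
            cover = min_weighted_vertex_cover(G)
            self.assertTrue(all(u in cover or v in cover for u, v in G.edges()))
            self.assertLessEqual(len(cover), 2)


    if __name__ == "__main__":
        unittest.main()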

One other concern that cuts across both of these is performance. You cannot reliably test performance with small toy problems, and quickly noticing a change that significantly degrades the performance of a working algorithm is very desirable. Once you have a running version of your algorithms, you can create larger test problems on which you measure performance and automate these as your performance/integration tests. You can run them less frequently, as they take more time, but they will at least notify you early of performance bottlenecks newly introduced during refactoring or when adding new features to the algorithms.
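
As a rough sketch of such a performance smoke test (again using networkx's approximation as a stand-in): the instance size and the time budget below are illustrative and machine dependent, so you would tune both to your own hardware and CI before trusting the threshold.

    import time

    import networkx as nx
    from networkx.algorithms.approximation import min_weighted_vertex_cover


    def test_vertex_cover_large_instance_smoke():
        # Deterministic "large" instance via a fixed seed so runs are comparable.
        G = nx.gnm_random_graph(5000, 20000, seed=7)

        start = time.perf_counter()
        cover = min_weighted_vertex_cover(G)
        elapsed = time.perf_counter() - start

        # Still check correctness: every edge must be covered.
        assert all(u in cover or v in cover for u, v in G.edges())
        # Coarse budget intended to catch order-of-magnitude regressions
        # introduced during refactoring, not to benchmark precisely.
        assert elapsed < 5.0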


Checking the validity of the produced solutions is the obvious first step.
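
As a small illustration, a validity check for a tour-construction heuristic could look like the sketch below. The greedy_tsp call from recent networkx versions stands in for your own algorithm, and the assert_valid_tour helper name is purely illustrative.

    import networkx as nx
    from networkx.algorithms.approximation import greedy_tsp


    def assert_valid_tour(G, tour):
        # A closed tour on n nodes has n + 1 entries, starts where it ends,
        # visits every node, and only uses edges that exist in the graph.
        assert tour[0] == tour[-1]
        assert len(tour) == G.number_of_nodes() + 1
        assert set(tour) == set(G.nodes())
        assert all(G.has_edge(u, v) for u, v in zip(tour, tour[1:]))


    def test_greedy_tsp_returns_a_valid_tour():
        G = nx.complete_graph(8)
        for u, v in G.edges():
            G[u][v]["weight"] = abs(u - v)  # arbitrary but deterministic weights
        tour = greedy_tsp(G, weight="weight")
        assert_valid_tour(G, tour)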

Additionally, one angle of attack could be regression testing using instances for which the expected approximate solution is known (e.g. obtained by executing the algorithm by hand or by using somebody else's implementation of the same algorithm).

There also exist repositories of problem instances with known (optimal) solutions, such as TSPLIB for TSP-like problems. Perhaps these could be put to some use.

If there are known upper bounds for the algorithm in question, then generating many random instances and verifying the heuristic solutions against those bounds may prove fruitful. If you do go this route, I'd urge you to make the runs reproducible (e.g. by always using the same random number generator and seed).
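
One way to make this concrete on small instances is to compute the optimum by brute force and assert the proven factor, with seeds fixed so any failure can be replayed exactly. The sketch below assumes networkx and its 2-approximate min_weighted_vertex_cover as a stand-in for one of your own algorithms.

    from itertools import combinations

    import networkx as nx
    from networkx.algorithms.approximation import min_weighted_vertex_cover


    def brute_force_vertex_cover_size(G):
        # Smallest k such that some k-subset of nodes covers every edge;
        # only feasible for the tiny instances used in tests.
        nodes = list(G.nodes())
        for k in range(len(nodes) + 1):
            for subset in combinations(nodes, k):
                chosen = set(subset)
                if all(u in chosen or v in chosen for u, v in G.edges()):
                    return k
        return len(nodes)


    def test_vertex_cover_respects_its_2_approximation_bound():
        base_seed = 1234  # fixed so a failing run can be reproduced exactly
        for i in range(50):
            G = nx.gnp_random_graph(10, 0.3, seed=base_seed + i)
            cover = min_weighted_vertex_cover(G)
            assert all(u in cover or v in cover for u, v in G.edges())
            assert len(cover) <= 2 * brute_force_vertex_cover_size(G)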

One final note: for some problems, fully random instances are on average pretty easy to find good approximate solutions for. Asymmetric TSP with uniformly and independently chosen arc weights is one such example. I am mentioning this since it may affect your testing strategy.


There is usually something you can check - for instance, that your algorithm always returns solutions that satisfy their constraints, even if they are not optimal. You should also put in assertion checks at every possible opportunity - these will be specific to your program, but might check that some quantity is conserved, or that something that should increase or at worst stay the same does not decrease, or that some supposed local optimum really is a local optimum.
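
For example, a "supposed local optimum really is a local optimum" check for a heuristic that finishes with 2-opt local search might look like the following sketch. The function names and the symmetric distance matrix dist (indexed as dist[u][v]) are illustrative assumptions, not part of any particular library.

    def tour_length(tour, dist):
        # tour is a closed tour: tour[0] == tour[-1].
        return sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))


    def assert_two_opt_optimal(tour, dist, eps=1e-9):
        # No single 2-opt move (reversing an interior segment) should be able
        # to shorten a tour that the algorithm claims is 2-opt locally optimal.
        base = tour_length(tour, dist)
        n = len(tour) - 1  # number of distinct cities
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                assert tour_length(candidate, dist) >= base - eps, (
                    f"2-opt move ({i}, {j}) improves the tour; "
                    "not a local optimum"
                )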

Given these sorts of checks, and the checks on bounds that you have already mentioned, I favour running tests on a very large number of randomly generated small problems, with random seeds chosen in such a way that if it fails on problem 102324 you can repeat that failure for debugging without running through the 102323 problems before it. With a large number of problems, you increase the chance that an underlying bug will cause an error obvious enough to fail your checks. With small problems, you increase the chance that you will be able to find and fix the bug.
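
A sketch of that idea: derive each instance's seed from its index alone, so that a failure reported for, say, instance 102324 can be replayed directly without regenerating everything before it. The check_one_instance helper and the networkx stand-in are illustrative assumptions.

    import networkx as nx
    from networkx.algorithms.approximation import min_weighted_vertex_cover


    def check_one_instance(index, base_seed=0):
        # The instance is fully determined by its index, so any failure can be
        # replayed in isolation, e.g. check_one_instance(102324).
        G = nx.gnp_random_graph(12, 0.25, seed=base_seed + index)
        cover = min_weighted_vertex_cover(G)
        assert all(u in cover or v in cover for u, v in G.edges()), (
            f"uncovered edge in instance {index}"
        )


    def test_many_small_random_instances():
        # The count is illustrative; tune it to how long you can afford the
        # test suite to run.
        for index in range(10_000):
            check_one_instance(index)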
