How to debug a Monte Carlo simulation faster?
I'm writing a simulated annealing program, and having some trouble debugging it. Any advice would be welcome.
First of all, the output is not deterministic, so I've taken to running it a hundred times and looking at the mean and standard deviation.
But then it takes ages and ages (> 30 minutes) to finish a single test case.
Usually, I'd try to cut the input down, but reducing the number of iterations directly decreases the accuracy of the result in ways which are not entirely predictable. For example, the cooling schedule is an exponential decay scaled to the number of iterations. Reducing the number of separate runs makes the output very unreliable (one of the bugs I'm trying to hunt down is an enormous variance between runs).
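For concreteness, here's a stripped-down sketch of the kind of harness I'm running (the `anneal` body is a toy stand-in, not my actual simulation); it shows the per-seed runs and the exponential cooling scaled to the iteration count:

```python
import math
import random
import statistics

def anneal(rng, n_iter=10_000, t0=1.0, t_final=1e-3):
    """Toy stand-in for the real simulation (a 1-D minimisation)."""
    state = rng.uniform(-10, 10)
    best = abs(state)
    for i in range(n_iter):
        # Exponential decay scaled to the number of iterations:
        # T(i) = t0 * (t_final / t0) ** (i / n_iter)
        temp = t0 * (t_final / t0) ** (i / n_iter)
        candidate = state + rng.gauss(0, 1)
        delta = abs(candidate) - abs(state)
        # Metropolis acceptance: always take improvements, sometimes
        # take worse moves while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state = candidate
        best = min(best, abs(state))
    return best

# One fixed seed per run: the statistics are honest, but any single run
# can be replayed exactly when it produces a suspicious result.
results = [anneal(random.Random(seed)) for seed in range(100)]
print(f"mean={statistics.mean(results):.4f} stdev={statistics.stdev(results):.4f}")
```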
I know premature optimization is the root of all evil, and surely it must be premature to optimize before the program is even correct, but I'm seriously thinking of rewriting this in a faster language (Cython or C), knowing I'd have to port it back to Python in the end for submission.
So, is there any way to test a simulated annealing algorithm better than what I have now? Or should I just work on something else between tests?
Disclosure: this is an assignment for coursework, but I'm not asking you to help me with the actual simulation.
Here are a couple of debugging tricks I learned from implementing meta-heuristics (like Simulated Annealing and Tabu Search) in Drools Planner (Java, open source):
- It supports different `environmentMode`s (`DEBUG`, `REPRODUCIBLE` (default) and `PRODUCTION`). In `DEBUG` and `REPRODUCIBLE` mode, all code uses the same `Random` instance and its seed is fixed. Therefore, running a tabu search implementation twice gives exactly the same moves, steps and resulting score. The simulated annealing implementation relies on the time gradient, so depending on the CPU at the time, there might be a slight difference. Note: this doesn't excuse you from statistical runs, but it does make a single run reproducible (see the first sketch after this list).
- Good, readable logging. Use the logging levels, and don't log too verbosely (see the second sketch after this list).
- Our build server (Hudson) keeps performance statistics.
- There is a `Benchmarker` tool which outputs diagrams to make it easier to see what the algorithm did. So don't just look at the result, but also look at how it got there. It might have done great at first, but then got stuck in a local optimum later (see the last sketch below).
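In Python, the equivalent of the `REPRODUCIBLE` mode is to thread a single seeded `random.Random` instance through the entire run and never touch the module-level `random.*` functions. And if your temperature is keyed off wall-clock time, switching it to an iteration counter removes the CPU-dependent wobble mentioned above. A minimal sketch (the loop body is a toy stand-in):

```python
import random

def run_once(seed, n_steps=1_000):
    # Every random draw in the run flows through this one seeded
    # instance, so the same seed replays exactly the same run.
    rng = random.Random(seed)
    x = 0.0
    for step in range(n_steps):
        # Iteration-based "time" instead of wall-clock time keeps the
        # schedule deterministic across machines.
        x += rng.gauss(0, 1.0 - step / n_steps)
    return x

assert run_once(42) == run_once(42)  # identical seeds, identical runs
```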
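For the logging advice, Python's standard `logging` module maps onto it naturally; a small sketch (the `anneal_step` hook is hypothetical):

```python
import logging

log = logging.getLogger("annealer")

def anneal_step(step, temp, score):
    # DEBUG for per-step detail, INFO for per-run summaries: flip the
    # level to control verbosity instead of deleting log statements.
    log.debug("step %d: temp=%.4f score=%.4f", step, temp, score)

logging.basicConfig(level=logging.INFO)  # raise to DEBUG when hunting a bug
log.info("run finished")
anneal_step(1, 0.9, 12.5)  # emits nothing at INFO level
```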
It's things like that which give you insight into what's actually going on in the algorithm.
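If you don't have a benchmarker at hand, you can improvise one in a few lines: record the best-so-far score at every step and plot the trajectories of several seeded runs together. A sketch assuming matplotlib is available (the inner loop is a toy stand-in for one annealing step):

```python
import random
import matplotlib.pyplot as plt

# A curve that flattens early suggests the search stalled in a local
# optimum; comparing several seeds on one plot makes outliers obvious.
for seed in range(5):
    rng = random.Random(seed)
    best, trajectory = float("inf"), []
    for _ in range(1000):
        best = min(best, rng.random())  # stand-in for one SA step's score
        trajectory.append(best)
    plt.plot(trajectory, label=f"seed {seed}")

plt.xlabel("step")
plt.ylabel("best score so far")
plt.legend()
plt.show()
```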
Also, see my blog article on Simulated Annealing.