Game of Life memory optimization
Hi, I have written a simple Game of Life program that uses two arrays: one maintains the current state and the other maintains the next state. Can anyone suggest how to optimize this program for memory consumption? Ideally, I would like its space complexity to be less than RC (rows times columns).
It depends on how sparse your playing field is. If your playing field is dense, then the space complexity of any storage algorithm will trend towards RC. (Specifically, RC/2, since once you get more active cells than inactive cells, you can just store the inactive cells instead if you really cared that much about optimal space usage.)
If the playing field is sparse, you can instead get something that scales with the number of active cells, by storing a coordinate pair per active cell, or by using some other sparse matrix structure.
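A minimal sketch of that representation in C (the types and names here are mine, purely for illustration; a hash set would make the membership test O(1), the linear scan just keeps the sketch short):

#include <stddef.h>

/* Sparse representation: store only the live cells, so memory scales
   with the population rather than with R*C. */
typedef struct { int x, y; } cell;

typedef struct {
    cell  *live;    /* coordinates of every live cell */
    size_t count;   /* number of live cells */
    size_t cap;     /* allocated capacity of the live array */
} sparse_field;

/* O(count) membership test; a hash set, or a sorted array with binary
   search, would bring this down to O(1) or O(log count). */
static int is_alive(const sparse_field *f, int x, int y)
{
    for (size_t i = 0; i < f->count; i++)
        if (f->live[i].x == x && f->live[i].y == y)
            return 1;
    return 0;
}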
I don't know too much about GoL, but assuming that there are many empty squares, have you looked at sparse matrices?
Late answer, but still.
Check out the source code for Golly: http://golly.sourceforge.net/
But before you do that, go to: http://www.ibiblio.org/lifepatterns/lifeapplet.html
And have a look at the code there. It's in Java, but it's very, very fast.
This quote from Alan Hensel's website should answer your question; after it, I'll sketch a few of his ideas in C:
How did you make it go so fast? Okay, the casual observer might not notice that my applet is blazingly fast. You might not have found the "Warp Speed" button, or you might not have used it, or you might not have been impressed with it. You can skip to the next section.
However, some of you have asked, how the heck did you make it go so fast?! To the curious, or to those thinking of writing their own super-fast cellular automata program, I will explain.
I tend to think of cellular automata optimization as being related to data compression. This is also a simple concept with no simple solution, and what solutions are best depends on the type of data being processed. In Conway's Life, patterns tend to be blobby.
For blobby universes, one should probably consider dividing the universe up into blocks approximately the size of the blobs. For Life, 4x4 to 8x8 seem reasonable. I chose the upper bound, 8x8, for reasons of convenience: There happen to be 8 bits in a byte. I strongly considered 4x4, but it didn't work out as nice....
You should put the blocks in some kind of list, so that you waste zero time in the empty parts of the universe.
Already, note a complication: New elements in the list must be introduced if the pattern grows over a block's boundaries, but we have to know if the block's neighbor already exists. You can either do a simple linear search of the list, or binary search, or keep some kind of map. I chose to make a hash table. This is solely used for finding the neighbors of a new block; each existing block already keeps a pointer to its neighbors, as they will be referenced often.
There must also be an efficient algorithm within the blocks. I chose to primarily blaze straight thru each block. There are no inner loops until all cells in a block are processed. Also, fast-lookup tables are employed. I look up 4x4 blocks to determine the inner 2x2.
Note: CA programs typically consist of 2 main loops (plus a display loop), because CA rules operate on the cells in parallel, while the microprocessor is conceptually serial. This means that there must be two copies of the universe, effectively, so that no important info is destroyed in the process of creating the next generation. Often these 2 copies are not symmetrical. It was a great struggle for me, since almost every time I took something out of one loop to make it faster, I had to add something else to the other loop! Almost every time, that is; the exceptions to that rule lead to the best optimizations. In particular, there are good tradeoffs to be considered in bit-manipulations: shifting, masking, recombining to form an address in the lookup table....
It can also be considered that sometimes the contents of a block may stabilize, requiring no further processing. You could take the block out of the list, putting it in a "hibernation" state, only to be re-activated if a neighboring block has some activity spilling into it. These blocks would take zero processing time, just like a blank region of the universe.
Period-2 oscillators might also not be very difficult to detect, and remove from the processing time. This might be worthwhile in Life, because the blinker is the most common kind of random debris. Higher period oscillators are much more rare. It is also possible that gliders could be detected and simulated. You will get diminishing returns from this kind of optimization, unless you take it to an extreme (cf. HashLife).
Also, a block of cells that's completely empty might not be worth deallocating and removing from the hash table for a while. That takes some processing time, which could be significant in the case of an oscillator moving in and out of its space repeatedly. Only when memory gets low should the oldest blocks from the "morgue" be recycled.
When the program is fast enough, it should be considered that it isn't worth displaying generations any faster than the eye can see, or at least not much faster than the refresh rate of the monitor. Especially in windowed environments, display time can be a real bottleneck.
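As promised, here is a rough C sketch of the block bookkeeping described above. The layout and names are my own guesses for illustration, not Hensel's actual code: each 8x8 block packs one row per byte, caches pointers to its neighbors, and sits in a hash table that is consulted only when a pattern grows past a block edge:

#include <stddef.h>
#include <stdint.h>

/* An 8x8 block: one byte per row, one bit per cell, so the cell state
   of a whole block is just 8 bytes. */
typedef struct block {
    int bx, by;              /* block coordinates (cell coordinates / 8) */
    uint8_t rows[8];         /* bit x of rows[y] is cell (x, y) */
    struct block *nbr[8];    /* cached neighbor pointers, NULL if absent */
    struct block *hnext;     /* next block in the same hash bucket */
    struct block *active;    /* next block in the list of live blocks */
    int hibernating;         /* stable blocks can be parked (see below) */
} block;

#define NBUCKETS 4096
static block *buckets[NBUCKETS];

/* Only used when a new block's neighbor must be found (or created);
   existing blocks use the cached nbr[] pointers instead. */
static unsigned bhash(int bx, int by)
{
    return ((unsigned)bx * 0x9E3779B1u ^ (unsigned)by * 0x85EBCA77u)
           % NBUCKETS;
}

static block *find_block(int bx, int by)
{
    for (block *b = buckets[bhash(bx, by)]; b != NULL; b = b->hnext)
        if (b->bx == bx && b->by == by)
            return b;
    return NULL;             /* caller allocates and links a new block */
}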
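The 4x4-to-2x2 lookup table can be sketched the same way. The bit layout below is an assumption; the point is that the table is built once (64 KB) and then each inner 2x2 of the next generation is a single lookup:

#include <stdint.h>

/* next2x2[p] holds the next generation of the inner 2x2 of a 4x4 tile
   whose cells are packed into the 16 bits of p (bit y*4+x = cell (x,y)). */
static uint8_t next2x2[1 << 16];

static void build_table(void)
{
    for (int pat = 0; pat < (1 << 16); pat++) {
        uint8_t out = 0;
        for (int y = 1; y <= 2; y++) {
            for (int x = 1; x <= 2; x++) {
                int self = (pat >> (y * 4 + x)) & 1;
                int n = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        if (dx != 0 || dy != 0)
                            n += (pat >> ((y + dy) * 4 + (x + dx))) & 1;
                if (n == 3 || (self && n == 2))   /* B3/S23 rule */
                    out |= (uint8_t)(1 << ((y - 1) * 2 + (x - 1)));
            }
        }
        next2x2[pat] = out;
    }
}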
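And hibernation is just a cheap per-block test after each generation (should_hibernate is a hypothetical helper, reusing the block struct above):

#include <string.h>

/* A block whose cells did not change this generation, and whose
   neighbors are all quiet, can be unlinked from the active list;
   it then costs zero time until activity spills back into it. */
static int should_hibernate(const block *b, const uint8_t prev_rows[8],
                            int neighbors_active)
{
    return !neighbors_active
        && memcmp(b->rows, prev_rows, sizeof b->rows) == 0;
}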
Then read this piece of source code: http://www.ibiblio.org/lifepatterns/src.41d/LifeCell.java
It contains the data structures that hold Alan's Life universe.
Forget about HashLife; it is so complex it will make your head spin.
The other answerers have a good point about looking for other data structures for your states. But regardless of that, a simple optimisation might be working with two pointers to states (you might do this already). So instead of doing two array copies per generation, you do one copy and three pointer assignments. Change this:
state curr, next;
[...]
for (;;) {
    next = advance(curr);   /* copy #1: compute the next generation */
    curr = copy(next);      /* copy #2: copy it back into curr */
}
To this:
state s1, s2;
state *curr = &s1, *next = &s2;   /* start the pointers at the two buffers */
[...]
for (;;) {
    *next = advance(*curr);   /* the one remaining copy */
    /* swap pointers */
    state *tmp = curr;
    curr = next;
    next = tmp;
}
I recommend sparse matrices, as nonnb and Amber suggested. You could also look into encoding if you have processor power to burn, or if you want to serialize to disk; RLE or token-based compression might get you somewhere.
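For instance, a bare-bones run-length encoder over a row-major 0/1 grid could look like this (a sketch, untuned; the caller must size out for the worst case of two bytes per run):

#include <stddef.h>
#include <stdint.h>

/* Encode n cells (each 0 or 1) as (value, run-length) byte pairs.
   Returns the number of bytes written to out. */
static size_t rle_encode(const uint8_t *cells, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        uint8_t v = cells[i];
        size_t run = 1;
        while (i + run < n && cells[i + run] == v && run < 255)
            run++;
        out[o++] = v;              /* the cell value */
        out[o++] = (uint8_t)run;   /* how many times it repeats (1..255) */
        i += run;
    }
    return o;
}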