Cache-oblivious data structures and dynamic languages - effective?
I've been reading recently about cache-oblivious data structures like auxiliary buffer heaps. These data structures work by keeping their most-recently-accessed elements in cache memory so any subsequent access is also quicker.
Most of these data structures are implemented with a low-level language like C/C++. Is it worth it to 开发者_如何学编程try to port these data structures over to a dynamic language like Python, or does the overhead of running on a virtual machine ruin all the performance benefits of these data structures? It seems like the latter, but I thought I would ask to see if someone actually has some experience with it.
In C (or C++) you have fine-grained control over the exact size of each data structure. You also have the possibility of fine-grained control over storage allocation. You can, after all, extend the new
method, use malloc
directly and otherwise structure memory to create spatial locality.
In most dynamic languages (like Python) you have no control at all over the exact size of anything, much less it's location.
In Python you may have some temporal locality, but you have little or no spatial locality.
Temporal locality might be enhanced through simple memoization of results. This is such a common speed-up that people often include a memoization decorator to uncouple the memoization (temporal locality) from the core algorithm.
I don't think that C or C++ cache-oblivious implementations translate to dynamic languages because I don't think you have enough control. Instead, just exploit memoization for speed-ups.
http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
One of the major points of cache oblivious algorithms is that the size of the object does not really matter (because you don't know the size of the next level of memory paging anyway), so the inability to know the exact size is not important. At some point, the size of more than 1 object "fits" into the next memory block size. (Note: not knowing the size of the object is a significant problem for cache aware implementations).
Additionally, most VM memory allocators will allocate at the end of a generation space so as long as you allocate all of your objects at the same time, you could start out ok. Unfortunately, some cache oblivious algorithms assume that you can change the memory layout which is usually impossible with a VM.
Another big problem, is that most VM based implementations use references for objects. So an object with three strings in it is actually 4 objects (the object itself, and 3 string objects). Unless those four objects are allocated next to each other, the analysis of the algorithm breaks down.
Add in compacting garbage collectors and other "optimizations" that VMs are free to do and you have a significant gap between the control you need and the current state of research on these algorithms.
精彩评论