Persistent Hash Table Implementation

In a program I'm working on, I build up a large "thread tree" (at most k children per node), where each thread makes some modifications to a hash table inherited from its parent. Is there a way to implement a hash table that is somewhat "persistent" (in the sense of http://en.wikipedia.org/wiki/Persistent_data_structure)?

Namely, is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?


"As space-efficient as an ordinary hash-table" is a rather vague specification, since "ordinary" may mean linked or probed depending on who you ask. I don't think anyone has designed easy-to-understand persistent hash tables yet.

The easiest way to get a persistent key-value map with the complexity you want is to use a persistent binary search tree. Lookup is the familiar algorithm from ephemeral (non-persistent) BSTs. Insertion changes, however, and becomes something like this (pseudo-Java):

// Returns a new tree containing the mapping k -> v; the original tree is left untouched.
// Overwrites the value for k if it's already in the tree.
Node insert(Node node, Key k, Value v)
{
    if (node == null)   // ran off the bottom of the tree: make a fresh leaf
        return new Node(k, v, null, null);
    if (k < node.key)
        return new Node(node.key, node.val, insert(node.left, k, v), node.right);
    else if (k > node.key)
        return new Node(node.key, node.val, node.left, insert(node.right, k, v));
    else
        return new Node(k, v, node.left, node.right);
}

Note that the insert routine returns a new tree, which may seem inefficient, but it only copies the nodes on the path it traverses; every other node is shared with the old tree. That path has expected length O(lg n), so insertion does O(lg n) allocations on average. This is about as space-efficient as it gets.

To get worst-case O(lg n) behavior, use a red-black tree. See also the literature on purely functional data structures, e.g. Chris Okasaki's Purely Functional Data Structures.
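
For a flavour of how this looks in a functional language, here is a rough Haskell sketch of the same path-copying idea (the Tree type and function names are mine, not from any particular library):

data Tree k v = Leaf | Node k v (Tree k v) (Tree k v)

insert :: Ord k => k -> v -> Tree k v -> Tree k v
insert k v Leaf = Node k v Leaf Leaf
insert k v (Node k' v' l r)
  | k < k'    = Node k' v' (insert k v l) r
  | k > k'    = Node k' v' l (insert k v r)
  | otherwise = Node k v l r

lookup' :: Ord k => k -> Tree k v -> Maybe v
lookup' _ Leaf = Nothing
lookup' k (Node k' v l r)
  | k < k'    = lookup' k l
  | k > k'    = lookup' k r
  | otherwise = Just v

-- t0 stays valid and unchanged even though t1 was built from it.
t0, t1 :: Tree Int String
t0 = insert 1 "one" Leaf
t1 = insert 2 "two" t0

Both versions of the tree remain usable afterwards, which is exactly the persistence property described above.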


Is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?

Indeed there is. Many ways.

For example, in Haskell the plain Data.Map is a size-balanced binary search tree (a "tree of bounded balance"), as described in:

  • Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993, http://www.swiss.ai.mit.edu/~adams/BB/.
  • J. Nievergelt and E. M. Reingold, "Binary search trees of bounded balance", SIAM Journal on Computing 2(1), March 1973.

It provides the following API, which satisfies your criteria:

insert :: Ord k => k -> a -> Map k a -> Map k a   -- O(log n)
lookup :: Ord k => k -> Map k a -> Maybe a        -- O(log n)
delete :: Ord k => k -> Map k a -> Map k a        -- O(log n)

while being fully persistent. Space use is O(n).
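
A quick sketch of what that persistence looks like in use (assuming only the standard containers package; the names m0, m1, m2 are mine):

import qualified Data.Map as M

-- Each update returns a new Map and leaves the old one intact,
-- which is the "fully persistent" behaviour asked about.
demo :: (Maybe Int, Maybe Int)
demo =
  let m0 = M.fromList [("a", 1), ("b", 2)]
      m1 = M.insert "c" 3 m0     -- O(log n); m0 is untouched
      m2 = M.delete "a" m1       -- O(log n); m1 is untouched
  in (M.lookup "a" m0, M.lookup "a" m2)   -- evaluates to (Just 1, Nothing)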

For better constant factors, try e.g. the Data.HashMap data structure, which has the same overall complexity.

Alternative structures include:

  • persistent tries, which have better space use than hash tables, since key storage is dense (shared key prefixes are stored only once); see the sketch below.
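
As a rough illustration of that last point (not any particular library; the Trie type and function names are mine), a persistent trie over String keys could look like:

import qualified Data.Map as M

-- Children are themselves persistent maps, so every insert shares
-- all untouched branches with the previous version of the trie.
data Trie a = Trie (Maybe a) (M.Map Char (Trie a))

emptyT :: Trie a
emptyT = Trie Nothing M.empty

insertT :: String -> a -> Trie a -> Trie a
insertT []      v (Trie _  cs) = Trie (Just v) cs
insertT (c : k) v (Trie mv cs) =
  Trie mv (M.insert c (insertT k v (M.findWithDefault emptyT c cs)) cs)

lookupT :: String -> Trie a -> Maybe a
lookupT []      (Trie mv _)  = mv
lookupT (c : k) (Trie _  cs) = M.lookup c cs >>= lookupT k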


Clojure has implemented a whole set of persistent data structures, such as hash maps. It's open source, so maybe you should take a look?

http://clojure.org/data_structures

http://code.google.com/p/clojure/source/browse/trunk/src/jvm/clojure/lang/PersistentHashMap.java


Namely, is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?

Yes. Section 5 of Driscoll et al.'s "Making Data Structures Persistent" shows a technique for making fully persistent red-black trees with O(lg n) time and O(1) space complexity for insertion, deletion, and lookup.

Their data structure is not confluently persistent. For more on persistence, see Kaplan's survey on the topic.
