开发者

I need a container that supports efficient random access and O(k) insertion and removal

I have tried again to ask the same question, but I ended up asking a different question by not providing essent开发者_如何学Goial information about my problem.

I implement a data structure which is a tree. Each node of this tree has an array/vector/(random access structure) with all its children which can be arbitrary many. The insertion/removal of elements is easy since we always double/divide-by-two the number of the elements in this array.

That is what the O(k) insertion/removal means in this context. We have k elements and we append k more or delete k/2. Rebuilding the whole data structure is fine so far. A dynamic array (or a vector) works.

The operation that raises the question is the following. Sometimes we have to "split" a node having n children, which means we "divide" the children among different nodes. The way we dive is in continuous groups. The most efficient way for this split is, I guess, to have one pointer for each new node at the position where its children are and how many there are (let's say each node takes k children). But then, the number of these children may change and that should not affect its siblings (or even worse, the whole tree) i.e the execution time of an insertion should be O(k) and not O(n). How do we do it?

An easy but inefficient work-around would be that every time we split a node, we replace the "big" array of children with many (as many as the split parts) "small" dynamic arrays.

Each of the following "boxes" is a random access structure.

I need a container that supports efficient random access and O(k) insertion and removal


How about a hashmap?

It's got amortized O(1) access, insertion and removal average case; O(n) worst case.


From the description you gave of the tree structure you are implementing, it might be best to create a new data structure to mimic your tree. Especially if you're already tracking pointers between nodes.

If I understand your statement, each node in your tree would contain a vector of children node pointers. When you need to split the node, you could create new nodes with each one receiving a segment of the vector of children pointers, and the newly created nodes would be inserted into the node vector of the parent node.

for example:

N1->N2->(n3,n4,n5,n6,n7,n8) split N2 into two nodes: N1->(N2_1, N2_2) with N2_1->(n3,n4,n5) and N2_2->(n6,n7,n8)

(Sorry, I don't know how to easily draw trees...)

This way, you are only relinking memory rather than copying, and access will generally be log n. Furthermore, this gives a proper representation of the tree structure in code.

edit adding example:

Suppose again, we have N1->N2->(n3,n4,n5,n6,n7,n8). If N1 should need to have new nodes added, the only affect is on the N1 node: N1->(N2,N9)->(n3,n4,n5,n6,n7,n8)

The structure of a node might be like this (very simplified):

class Node {
  vector<Node *> children;
  Node * parent;
};

The larger tree structure would be many of these nodes all linked together much like a binary tree. To add nodes to any one node on a tree would only add items to the children member of that node. Nothing else is affected.


It looks to me like you're trying to reinvent the B+ Tree...

There is a variation of the B+ Tree which consists in storing the number of sub elements instead of a proper key, this allows efficient random access (log n) which factor you actually control.

For example, a common factor for databases is around 1,000, meaning that at the first level of the tree is the root, the second can contain up to 1,000 children, the third up to 1,000,000 etc...

If you have less than 1,000,000,000 objects, it means at most 3 dereferences, which is quite efficient.

There has been a python module written using this technic, called blist, which aimed at replacing the traditional list class (implemented as a C++ vector) with a B+ Tree to get more efficiency for large items. The performances measurements can be found here.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜