A particular problem with btree insertion
I have been playing with the very cool btree applet at slady.net. I'm having trouble understanding a particular behavior. Take a look at this starting state:
alt text http://www.freeimagehosting.net/uploads/db2931c7da.jpg
This particular state was arrived at by inser开发者_StackOverflowting the following sequence: 10, 15, 30, 16, 70, 1, 9, 27, 45, 50, 55.
My question is regarding what happens to the [45, ] node when I insert the next value in the sequence, 65.
alt text http://www.freeimagehosting.net/uploads/3b70c1d302.jpg
The [55,70] node will be split by the 65, and being the middle value, the 65 will travel back up and then split the [30,50] node as well. My question is: Why does the [45, ] node end up a child of the [30, ] node? It's parent originally had 3 children, the left most and the right most became new seperate nodes. The 45 was between those values and it seems like it could have ended up under the [65, ] node just as well... Why?
A picture is worth a thousand words; an animation is worth a million:
http://constellationmedia.com/~funsite/static/btree-insert-animation.gif
The key thing to notice here is that when the center node 50 is pulled up, it has to throw away its right child because it's too far down. However, 65 needs a new left child, so 50 hands 45 over to 65. 50 now needs a new right child, and the node containing 65 needs to be childed, so it takes the newly formed 65 in its place.
Here are illustrated B-tree insertion rules (where maximum node size is 4 items, 5 child nodes):
http://constellationmedia.com/~funsite/static/btree-insertion-rules.png
The xr
won't exist and won't matter if you're inserting into a leaf (which you do first). However, if you have to split a node in half, the new x
is the center item you pulled out, and the new xr
is the right child of x
.
It makes no sense for the 45 node to be a child of the 65 node in the second diagram as the rightmost branch is for values > 50 (based on the root node's right-most value) - so 45 has to go into the middle branch somewhere.
Each node always has n+1 children, where n is the number of keys in that node.
Before splitting, the [30, 50] node has two keys and three children, as expected. When you split that, you end up with a [30, -] node and a [65, -] node (and push the 50 up a level).
At the next level down, you have the (previously existing) [16, 27] and [45, -], and the newly-split [55, -] and [70, -] nodes.
You have two parent nodes and four children. Each parent must have two children because it has a single key. Therefore, given the ordering rules, [45, -] must be a child of [30, -] because otherwise (1) [30, -] wouldn't have enough children, and (2) [65, -] would have too many children. [EDIT - not illegally too many for a node, but the split is supposed to be balanced].
Will A is also correct. This is a consequence of choosing to push up the 50 key when splitting the middle-layer node, but this wasn't really a choice. Splitting to either produce [-, -] and [50, 65] pushing up 30, or [30, 50] and [-, -] pushing up 65 would violate the rule that every node must be at least half full.
精彩评论