Do btrees and b+trees only store data at the leafs?
Do b trees and b+ t开发者_如何学Gorees only store data at their leafs? I am assuming that they use their internal nodes to search the required data.
Is that the case or do they store data in every node?
Non-leaf nodes "records" contain
- a pointer (a node "address" of sorts) to a node in the next level down the tree
- the value of the key of the first (or the last, depending on implementation) record in that node
Such non-leaf "records" are listed in key order so that by scanning (or binary searching within) a non-leaf node, one knows which node in the next level down may contain the searched value.
Leaf nodes records contain complete data records: the key value and whatever else.
Therefore "real" data is only contained in the leaf nodes, the non-leaf nodes only contain [a copy of] the key values. for a very small proportion of the data (this proportion depends on the average number of data records founds in a leaf node).
This is illustrated in the following image from the Wikipedia article on B+ Trees
The non-leaf node, at the top, (the only one in this simplistic tree) only contains two non-leaf node records, each with a copy of a key value (bluish color) and a pointer to the corresponding node (gray color). This tree happens to only have two levels, therefore the "records" in root node point to leaf nodes. One can imagine that there are additional levels (above the topmost tree shown below, call it the "3-5 node"); if that were the case the node above would contain (along with other similar records), a record with key value 3 with a pointer to the "3-5" node.
Also note that only the key values 3 and 5 are contained in non-leaf nodes (i.e. not even all key values are reproduced in the non leaf-nodes).
BTW in this example the non-leaf nodes contain the key of the last record in the next node (would also work if the first record were used instead, slight difference in the way the search logic is then implemented).
The leaf nodes contain the key value (in bluish color too) and the corresponding data record (d1, d2... shown in grey). The red-ish pointer shown at the end of each leaf node point to the next leaf node, i.e. the one containing the very next data record in key order; these pointers are useful to "scan" a range of data records.
All data is in the leaves.
Wiki on B+.
There is some confusion on BTrees and B+Trees. B+Trees only store data on the leaf nodes as pointers. This means the data must be stored elsewhere. BTrees may store data on every node. There are advantages and disadvantages to each. I've noticed that some sites show BTrees exactly the same as B+Trees. In general, BTrees are better at holding the actual data, and B+Trees are much more efficient as indexes.
精彩评论