C/C++: How to store data in a file in B tree
It appears to me that one way of storing data in a B-tree as a file can be done efficiently with C using binary file with a sequence (array) of structs, with each struct representing a node. One can thus connect the individual nodes with approach that will be s开发者_如何学Goimilar to creating linked lists using arrays. But then the problem that props up would be deletion of a node, as erasing only a few bytes in the middle in a huge file is not possible.
One way of deleting could be to keep track of 'empty' nodes until a threshold cutoff is reached and then make another file that will discard the empty nodes. But this is tedious.
Is there a better approach from the simplicity/efficiency point of view for deleting, or even representing a B-tree in a file?
TIA, -Sviiya
For implementing B-Trees in a file, you can use the file offset instead of pointers. Also, you can implement a "file memory manager", so that you can re-use deleted items in the file.
In order to fully recover the deleted blocks in a B-Tree file, you will have to recreate the B-Tree in a new file. Also remember the most OSes have no methods for truncating files. A portable method for truncating a file is to write a new file and destroy the old.
Another suggestion is to partition the file into B-Tree partition and data (item) partition. A B-Tree partition will contain the pages. The leaf pages will contain offsets to the data items. The data partition will be a section in the file containing data items. You may end up creating more than one of each partition and the partitions may be interleaved.
I spent much time playing with a file based B-Tree, until I gave up and decided to let a database program (or server) handle the data for me.
I did a very quick search and dug up this: http://people.csail.mit.edu/jaffer/WB C source: http://cvs.savannah.gnu.org/viewvc/wb/wb/c/ - it seems to offer disk-based B-tree style databases - although taking a look at "delete.c" it seemed to imply if you delete a node everything down from it would be taken out - if that's the correct behaviour then it sounds like something that might help?
Also - B-trees are often used in filesystems - could you not take a look at some filesystem code?
My own inclination is that of a file-system - if you have a B-tree of fixed-size, whenever you "delete" a node rather than attempting to remove the reference, just set the value to whatever means nothing in your code. Then, have a clean-up thread running that checks if anyone has the file open for reading and if all's quiet blocks the file and tidies up.
You can use Berkley DB as well. It works well with C programs and implements B+ tree.
精彩评论