Persisting a trie to a file - C
I have a trie
which I am using to do some string processing. I have a simple compiler which generates trie
from some data. Once generated, my trie
won't change at run time.
I am looking f开发者_Go百科or an approach where I can persist the trie in a file and load it effectively. I have looked at sqllite
to understand how they are persisting b-tree
but their file format looks bit advanced and I may not need all of those.
It'd be helpful if someone can provide some ideas to persist and read the trie
. I am programming using C.
I did some research and found the following little gems online:
trie.h
trie.c
A working trie with serialization and deserialization. It was originally written for use in Python (there's a corresponding triemodule.c
for tying it to Python), but it's pure C; you could mine it for ideas or use it as you wish.
Update:
It appears the links are no longer working. I'll keep the originals up, but here are the links in the wayback machine:
trie.h
trie.c
Assuming your entire data structure fits in memory, a recursive serialization approach is simplest. Sqllite works with data structures that won't fit in memory, so it is probably overkill to try copying their methods.
Here is example pseudocode for reading/writing a node. It works by recursively reading/writing the child nodes. It has nothing trie-specific, and should work for other tree data structures as well.
void writeNode(Node *node)
write node data to file
write node.numOfChildren to file
for each child:
writeNode(child)
Node *readNode()
Node *node = allocateNewNode()
read node data from file
read node.numOfChildren from file
for (i=0; i<node.numOfChildren; i++)
Node *child = readNode()
node.addChild(child)
If all of your nodes are the same size then you can just enumerate your nodes (root = 0)
and write each of them to a file at their index. While writing them you will have to change their references to other nodes to those nodes' indexes, though. You will probably also need a NULL
value. You could use -1
or you could use (root = 1)
and (NULL = 0).
You will probably also be able to compress these nodes somewhat by changing their pointer fields to be smaller types.
If your nodes are different sizes then it's more complicated.
Once you generated a trie in memory, it's not complex to write it into a file, with mmap, it's like that we re-generate the trie in a continous memory block, or you can write each node into a file in a recursive function.
精彩评论