开发者

Space efficient trie

I'm trying to implement a space efficient trie in C. This is my struct:

struct node {
char val; //character stored in node
int key; //key value if this character is an end of word
struct node* children[256];
};

When I add a node, it's index is the unsigned char cast of the chara开发者_开发问答cter. For example, if I want to add "c", then

children[(unsigned char)'c']

is the pointer to the newly added node. However, this implementation requires me to declare a node* array of 256 elements. What I want to do is:

struct node** children;

and then when adding a node, just malloc space for the node and have

children[(unsigned char)'c']

point to the new node. The issue is that if I don't malloc space for children first, then I obviously can't reference any index or else that's a big error.

So my question is: how do I implement a trie such that it only stores the non-null pointers to its children?


You could try using a de la Briandais trie, where you only have one child pointer for each node, and every node also has a pointer to a "sibling", so that all siblings are effectively stored as a linked list rather than directly pointed to by the parent.


You can't really have it both ways and be both space efficient and have O(1) lookup in the children nodes.

When you only allocate space for the entries that's actually added, and not the null pointers, you can no longer do

children[(unsigned char)'c']

As you can no longer index directly into the array.

One alternative is to simply do a linear search through the children. and store an additional count of how many entries the children array has i.e.

children[(unsigned char)'c'] = ...;

Have to become

for(i = 0; i < len; i++) {
  if(children[i] == 'c')
     break;
} 
if(i == len) {
  //...reallocate and add space for one item in children
}
children[i] = ...;

If your tree ends up with a lot of non-empty entries at one level, you might insert the children in sorted order and do a binary search. Or you might add the childrens as a linked list instead of an array.


If you just want to do an English keyword search, I think you can minimize the size of your children, from 256 to just 26 - just enough to cover the 26 letters a-z.

Furthermore, you can use a linked list to keep the number of children even smaller so we can have more efficient iteration.

I haven't gone through the libraries yet but I think trie implementation will help.


You can be both space efficient and keep the constant lookup time by making child nodes of every node a hash table of nodes. Especially when Unicode characters are involved and the set of characters you can have in your dictionary is not limited to 52 + some, this becomes more of a requirement than a nicety. This way you can keep the advantages of using a trie and be time and space efficient at the same time.

I must also add that if the character set you are using is approaching unbounded, chances are having a linked list of nodes may just do fine. If you like an unmanageable nightmare, you can opt for a hybrid approach where first few levels keep their children in hash tables while the lower levels have a linked list of them. For a true bug farm, opt for a dynamic one where as each linked list passes a threshold, you convert it to a hash table on the fly. You could easily amortize the cost.

Possibilities are endless!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜