
Asymptotically Fast Associative Array with Low Memory Requirements

Ok, tries have been around for a while. A typical implementation should give you O(m) lookup, insert and delete operations independently of the size n of the data set, where m is the message length. However, this same implementation takes up 256 words per input byte, in the worst case.
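
To make the space cost concrete, here is a minimal sketch of such a naive 256-way trie (Python, byte-string keys assumed; all names are illustrative). Lookup and insert each touch at most m nodes, but every node reserves all 256 child slots up front:

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = [None] * 256  # one slot per possible byte
        self.value = None             # None means "no key ends here"

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: bytes, value):
        node = self.root
        for b in key:                 # O(m) steps
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.value = value

    def lookup(self, key: bytes):
        node = self.root
        for b in key:                 # O(m) steps
            node = node.children[b]
            if node is None:
                return None
        return node.value

t = Trie()
t.insert(b"cat", 1)
assert t.lookup(b"cat") == 1 and t.lookup(b"car") is None
```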

Other data structures, notably hashing, give you expected O(m) lookup, insertion and deletion, with some implementations even providing constant time lookup. Nevertheless, in the worst case the routines either do not halt or take O(nm) time.

The question is, is there a data structure that provides O(m) lookup, insertion and deletion time while keeping a memory footprint comparable to hashing or search trees?

I should add that I am only interested in worst-case behaviour, in both time and space.


Did you try Patricia tries (also known as crit-bit or radix tries)? I think they solve the worst-case space issue.
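
For illustration, a rough sketch of the path-compression idea behind Patricia/radix tries, assuming string keys and membership-only queries (all names here are made up). Since every edge split produces an internal node with at least two children, n keys need only O(n) nodes regardless of key length:

```python
class RadixNode:
    def __init__(self):
        self.edges = {}        # first char -> (edge label, child)
        self.is_key = False

def insert(node, key):
    if not key:
        node.is_key = True
        return
    if key[0] not in node.edges:
        leaf = RadixNode()
        leaf.is_key = True
        node.edges[key[0]] = (key, leaf)   # whole remainder on one edge
        return
    label, child = node.edges[key[0]]
    i = 0                                  # length of common prefix
    while i < min(len(label), len(key)) and label[i] == key[i]:
        i += 1
    if i == len(label):                    # edge fully matched: descend
        insert(child, key[i:])
        return
    mid = RadixNode()                      # split the edge at the mismatch
    mid.edges[label[i]] = (label[i:], child)
    node.edges[key[0]] = (label[:i], mid)
    insert(mid, key[i:])

def contains(node, key):
    if not key:
        return node.is_key
    entry = node.edges.get(key[0])
    if entry is None:
        return False
    label, child = entry
    return key.startswith(label) and contains(child, key[len(label):])

root = RadixNode()
for w in ["romane", "romanus", "romulus"]:
    insert(root, w)
assert contains(root, "romanus") and not contains(root, "roman")
```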


There is a structure known as a suffix array. I can't remember the research in this area, but I think they've gotten pretty darn close to O(m) lookup time with this structure, and it is much more compact than your typical tree-based indexing methods.
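
As a rough sketch of how suffix-array lookup works (the near-O(m) bounds mentioned above need suffix trees or LCP-enhanced suffix arrays, which this does not implement), here is the naive version with O(m log n) queries:

```python
def build_suffix_array(text: str):
    # naive construction: sort all suffix start positions lexicographically
    # (O(n^2 log n) here; linear-time constructions exist)
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text: str, sa, pattern: str):
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                         # leftmost suffix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                         # leftmost suffix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])            # all positions where pattern occurs

text = "banana"
sa = build_suffix_array(text)
assert occurrences(text, sa, "ana") == [1, 3]
```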

Dan Gusfield's book is the Bible of string algorithms.


I don't think you need to worry about the worst case, for two reasons:

  1. Summed across all trie nodes, you will never have more active branches than the total size of the stored data.
  2. The only time the node size becomes an issue is if there is huge fan-out in the data you're sorting/storing. Mnemonics would be an example of that. If you're relying on the trie as a compression mechanism, then a hash table would do no better for you.

If you need to compress and you have few/no common subsequences, then you need to design a compression algorithm based on the specific shape of the data rather than based on generic assumptions about strings. For example, in the case of a fully/highly populated mnemonic data set, a data structure that tracked the "holes" in the data rather than the populated data might be more efficient.
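
A hedged sketch of that "track the holes" idea, assuming a dense, bounded integer key space (the DenseSet name and the PIN example are invented for illustration). When almost every key is present, storing the few absent keys and negating membership costs O(number of holes) rather than O(n):

```python
class DenseSet:
    """Membership over a dense bounded key space, storing only the holes."""

    def __init__(self, universe_size: int, missing):
        self.universe_size = universe_size
        self.holes = set(missing)          # the few absent keys

    def __contains__(self, key: int):
        return 0 <= key < self.universe_size and key not in self.holes

# e.g. "every 4-digit PIN except two": space is O(#holes), not O(n)
pins = DenseSet(10_000, missing=[42, 9999])
assert 1234 in pins and 42 not in pins and 9999 not in pins
```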

That said, it might pay to avoid a fixed trie node size if you have moderate fan-out. You could make each node of the trie a hash table: start with a small size and grow it as elements are inserted. Worst-case insertion would then be c * m, hit when every hash table along the path has to be reorganized as it grows, where c is the number of possible characters (unique atomic elements).
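
A minimal sketch of that suggestion, using Python dicts as the per-node hash tables, since they already start small and grow on demand (the insert/lookup helpers and the "value" sentinel key are illustrative, not a tuned implementation):

```python
def insert(root: dict, key: bytes, value):
    node = root
    for b in key:
        node = node.setdefault(b, {})  # per-node hash table grows on demand
    node["value"] = value              # sentinel slot marking end of key

def lookup(root: dict, key: bytes):
    node = root
    for b in key:
        node = node.get(b)
        if node is None:
            return None
    return node.get("value")

root = {}
insert(root, b"cat", 1)
assert lookup(root, b"cat") == 1 and lookup(root, b"ca") is None
```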


In my experience, there are three implementations that I think could meet your requirements:

  • HAT-Trie (combination between trie and hashtable)
  • JudyArray (compressed n-ary tree)
  • Double-Array Trie

You can see the benchmark here. They are as fast as hash tables, but with a lower memory requirement and better worst-case behaviour.
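
For a feel of why such hybrids beat both plain tries and plain hash tables, here is a hedged sketch of the bursting mechanism that burst tries and HAT-tries are built on. Real HAT-tries use cache-conscious array hashes for the leaf containers; this toy uses plain Python sets, and the BURST_LIMIT value is arbitrary:

```python
BURST_LIMIT = 4  # illustrative threshold, not a tuned value

class Node:
    def __init__(self):
        self.container = set()   # whole suffixes, while this is a leaf bucket
        self.children = None     # byte -> Node, after bursting
        self.terminal = False    # does some key end exactly here?

    def insert(self, key: bytes):
        if self.children is None:          # still a leaf bucket
            self.container.add(key)
            if len(self.container) > BURST_LIMIT:
                self._burst()
            return
        if not key:
            self.terminal = True
            return
        self.children.setdefault(key[0], Node()).insert(key[1:])

    def _burst(self):
        # turn this bucket into a trie node, redistributing suffixes
        self.children, pending = {}, self.container
        self.container = set()
        for key in pending:
            self.insert(key)

    def contains(self, key: bytes):
        if self.children is None:
            return key in self.container
        if not key:
            return self.terminal
        child = self.children.get(key[0])
        return child is not None and child.contains(key[1:])

root = Node()
for w in [b"he", b"hi", b"ha", b"ho", b"hey", b"she"]:
    root.insert(w)
assert root.contains(b"hey") and not root.contains(b"h")
```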
