What is the best autocomplete/suggest algorithm,datastructure [C++/C]

2022-12-12 01:36 问答作者：

We see Google, Fire开发者_JS百科fox some AJAX pages show up a list of probable items while user types characters.

Can someone give a good algorithm, data structure for implementing autocomplete?

A trie is a data structure that can be used to quickly find words that match a prefix.

Edit: Here's an example showing how to use one to implement autocomplete http://rmandvikar.blogspot.com/2008/10/trie-examples.html

Here's a comparison of 3 different auto-complete implementations (though it's in Java not C++).

* In-Memory Trie
* In-Memory Relational Database
* Java Set

When looking up keys, the trie is marginally faster than the Set implementation. Both the trie and the set are a good bit faster than the relational database solution.

The setup cost of the Set is lower than the Trie or DB solution. You'd have to decide whether you'd be constructing new "wordsets" frequently or whether lookup speed is the higher priority.

These results are in Java, your mileage may vary with a C++ solution.

For large datasets, a good candidate for the backend would be Ternary search trees. They combine the best of two worlds: the low space overhead of binary search trees and the character-based time efficiency of digital search tries.

See in Dr. Dobbs Journal: http://www.ddj.com/windows/184410528

The goal is the fast retrieval of a finite resultset as the user types in. Lets first consider that to search "computer science" you can start typing from "computer" or "science" but not "omputer". So, given a phrase, generate the sub-phrases starting with a word. Now for each of the phrases, feed them into the TST (ternary search tree). Each node in the TST will represent a prefix of a phrase that has been typed so far. We will store the best 10 (say) results for that prefix in that node. If there are many more candidates than the finite amount of results (10 here) for a node, there should be a ranking function to resolve competition between two results.

The tree can be built once every few hours, depending on the dynamism of the data. If the data is in real time, then I guess some other algorithm will give a better balance. In this case, the absolute requirement is the lightning-fast retrieval of results for every keystroke typed which it does very well.

More complications will arise if the suggestion of spelling corrections is involved. In that case, the edit distance algorithms will have to be considered as well.

For small datasets like a list of countries, a simple implementation of Trie will do. If you are going to implement such an autocomplete drop-down in a web application, the autocomplete widget of YUI3 will do everything for you after you provide the data in a list. If you use YUI3 as just the frontend for an autocomplete backed by large data, make the TST based web services in C++, and then use script node data source of the autocomplete widget to fetch data from the web service instead of a simple list.

Segment trees can be used for efficiently implementing auto complete

If you want to suggest the most popular completions, a "Suggest Tree" may be a good choice: Suggest Tree

For a simple solution : you generate a 'candidate' with a minimum edit (Levenshtein) distance (1 or 2) then you test the existence of the candidate with a hash container (set will suffice for a simple soltion, then use unordered_set from the tr1 or boost).

Example: You wrote carr and you want car. arr is generated by 1 deletion. Is arr in your unordered_set ? No. crr is generated by 1 deletion. Is crr in your unordered_set ? No. car is generated by 1 deletion. Is car in your unordered_set ? Yes, you win.

Of course there's insertion, deletion, transposition etc...

You see that your algorithm for generating candidates is really where you’re wasting time, especially if you have a very little unordered_set.

继续阅读：algorithm autocomplete c search

What is the best autocomplete/suggest algorithm,datastructure [C++/C]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？