Are there some better ways to implement find as you type in Java with a fairly small data set?

2022-12-22 21:44 问答作者：

I've got about 2500 short phrases in a file. I want to be able to find phrases as I type possible substrings of them. My app has a text box and a list of phrases. The text box is initially empty and the list contains all 2500 phrases, since the empty string is a substring of all of them. As I type in the text box, the list updates so that it always only contains phrases which contain the text box's value as a substring.

At the moment I have one of Google's Multimaps, specifically:

LinkedHashMultimap<String, String>开发者_运维知识库;

with every single possible substring mapped to its possible matches. This takes a while to load (about a second) and I think it must be taking up quite a bit of space (which may be a concern in the future.) It's very fast with the lookups though.

Is there a way I could do this with some other data structure or strategy that would be quicker to load and take less space (possibly at the expense of the speed of the lookups)?

If your list only contains 2500 elements, a simple loop and checking contains() on all of them should be fast enough.

If it grows bigger and/or is too slow, you can apply some easy optimizations:

Don't search immediately as the user types each character, but introduce some delay. So if he types "foobar" really fast, you only search for "foobar", not first "f" then "fo" then "foo",...
Reuse your previous results: if the user first types "foo" and then extends that to "foobar", don't search in the whole original list again, but search inside the results for "foo" (because everything that contains "foobar" must contain "foo").

In my experience, these these basic optimizations already get you quite far.

Now, if the list grows so big that even that is too slow, some "smarter" optimizations as proposed in other answers here (tries, suffix trees,...) would be needed.

You'll want to look into using the Trie data structure.

Try simply looping over the entire list and calling contains() - doing that 2500 times is probably completely unnoticeable.

You definetely need a Suffix Tree.. (wiki)

(i think this implementation could be ok: link)

EDIT:

I've read your comment, you shouldn't blindly check if the string is a substring somewhere in you phrase, you usually start with a word, not with a space. So maybe it's better to tokenize words inside your phrase?

Are you allowed to do it? Otherwise the best way is to build an automata for every phrase or using similar algorithms (for example the Karp-Rabin string search algorithm).

Wouter Coekaerts has a good approach, but I would go a bit further.

Don't bring up anything when the textbox contains a single character. The results won't be useful. You may find that this is true for two characters as well.
Precompute the results for two characters. When there are two characters bring up the precomputed list.
When a third character is added do the 'contains' search on the list you have currently displayed (anything that doesn't contain c1c2 can't contain c1c2c3). By now the list should be small enough that 'contains' has perfectly adequate performance.
Similarly for four characters etc.
As said above, put in a little delay before starting the search. Or better still arrange for a search to be killed if another character is typed before it finishes.

继续阅读：algorithm data-structures

Are there some better ways to implement find as you type in Java with a fairly small data set?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？