Best way to support wildcard search on a large dictionary?
I am working on a project to search in a large dictionary (100k~1m words). The dictionary items look like {key,value,freq}. Myy task is the development of an i开发者_JAVA技巧ncremental search algoritm to support exact match, prefix match and wildcard match. The results should be ordered by freq.
For example: the dictionary looks like
key1=a,value1=v1,freq1=4
key2=ab,value2=v2,freq2=2
key3=abc,value3=v3 freq3=1
key4=abcd,value4=v4,freq4=3
when a user types 'a', return v1,v4,v2,v3
when a user types 'a?c', return v4,v3Now my best choice is a suffix tree represented by DAWG data struct, but this method does not support wildcard matches effectively.
Any suggestion?
You need to look at n-grams for indexing your content. If you want to something Out-of-the box, you might want to look at Apache Solr which does a lot of the hard work for you. It also supports prefix, wildcard queries etc.
精彩评论