开发者

Algorithm for matching partially filled words

I am writing a game which when given a partially filled word, searches a dictionary and returns all the matching words. To that effect, I am trying to find an algorithm that can 开发者_如何学Cbe used for the said purpose. For example, given - - a -, the algorithm will search a dictionary for all the words which have length 4 and have 'a' as the third letter.

Is there such an algorithm already? If not, can somebody given a rough idea of how to design such an algorithm?

Thanks in Advance.


Well, it does not already exists, but it's been researched on SO already, for crosswords problems.

The gist of the solution I proposed was to index by letter and indexes, which is Python gives:

class Index:
  def __init__(self, list):
    self.data = defaultdict(set)
    for word in list: self.add(word)

  def add(self, word):
    for l in range(0, len(word)):
      self.data[(l, word[l])].insert(word)

  def look(self, letters):
    """letters is a list of tuples (position, letter)"""

    result = None
    for (p,l) in letters:
      set = self.data[(p,l)]
      if result == None: result = set
      else: result = result.insersection(set)

    return result

The idea is simple: you have a large index which has a set of words for each couple (position,letter). It could be extended, in your case, to have one index per word length, which would dimish the size of the sets of words and thus be faster.

For retrieval, you simply intersect all the sets to have the common set of word that matches all the known letters.


another solution could be to structure your dictionary as a prefix tree. Then your algorithm will just have to go trough that tree. For each node you know which letter is associated and the position in the word so you know if it matches the letter you are looking for. If it doesn't you stop and don't go through its children. You also know when you go over the length of your query. Each leaf you reach can be added to the results list.

This solution may be quite efficient as far as the memory consumption is concerned.


test = '--a-';

for each (words as word)
{
    if ((word.length == test.length)
        && (test.index(0) == '-' || (word.index(0) == test.index(0)))
        && (test.index(1) == '-' || (word.index(1) == test.index(1)))
        && (test.index(2) == '-' || (word.index(2) == test.index(2)))
        && (test.index(3) == '-' || (word.index(3) == test.index(3))))
    {
        // match
    }
}

Is that what you need? Obviously it needs modified a little to work for different lengths.


From what I understood,can't you use a regex query? in the example above the pattern is something like ??a?

Then you need to loop through all the words and check if there is a match.


If you're running on a reasonably powerful computer (compared to the load), then PierrOz has a good answer: store the dictionary as a prefix tree. Then you can do a breadth-first search, pruning only when you reach a level where you actually know the letter.

If you need an even faster solution, you need a way of constraining your search depth. One possibility is to bin the answers. For example, you can start by grouping the words by length; then you only have to look through lists of words of a certain length. Then you could subgroup by words containing specific letters--all letter pairs is probably enough. This gives you an array of something like 13000 elements that you can rapidly index into: count the number of letters in your word, then pick the rarest letter or two in the word and use that to index into a mini-prefix-tree that only holds words of that length with those letter(s). This strategy should get you down to a couple hundred words per bin in most cases, and the prefix tree search should be fast even if you pick out most of the breadth of the tree.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜