Check if a string containing 2 words is in a dictionary

2023-03-08 20:59 问答作者：

I have an array of dictionary words and a random string like "twowords"

What is the fastest way to check if the entire string is m开发者_开发问答ade up of dictionary words? so "twojwords" would return false and "twowords" returns true

I was using a binary search before but it couldn't handle two word strings

(I'm using objective c)

One alternative to testing every substring is to build a trie from your dictionary and traverse it, one letter at a time, a la a BFS. Instead of just keeping track of the current node, we would keep track of the node as well as how many words we had seen. At each node, one would enqueue the next letter if available and also the start node (plus 1 word) if the current node is a terminal node. If we reach the end of the string and are at the start node with 2 words, then there is a solution.

This neatly generalizes to more than two words. I'm not sure it is asymptotically faster in the worse case than string spitting, but it will likely be faster because a) there is no word generation b) we know the nth string is in the dictionary before testing the n+1th

Here's a naive approach, but I don't know if there is a faster way.

for i1 from 0 to 6 do
  substring1 = string[0..i1];
  if (inDictionary(substring1)) {
    for i2 from i1+1 to 6 do
      substring2 = string[i1+1..i2];
      if (inDictionary(substring2)) {
        for i3 from i2+1 to 6 do 
... (up to i6)

For this, you need to be able to form a substring of letters a to b of the string. E.g. if string = "thisisastring" then string[4..7] = "isas". You also need a boolean function inDictionary, which should do a binary search for the substring in the dictionary.

This method will work, but depending on the size of the dictionary, may take a bit of time. I believe the English language is currently around 200000 words, in which case any reasonable programming language should have reasonable performance.

First, it would be to use some kind of hash table data structure for your dictionary, giving you ~O(1) for a one word lookup. That's a lot better than binary search, especially when your dictionary is of size 200,000

For two words, split your string into halfs, like t-wowords,tw-owords,two-words,twow-ords etc. and make a lookup for each half.

If you want to allow an arbitrary number of parts, you should try a recursive approach. In pseudocode:

bool checkWord(string word)
{
    if(length(word)==1)
        return isWordInDict(word);
    for each pair w1, w2 of word halfs
    {
        if(isWordInDict(w1) && checkWord(w2))
            return true;
    }
    return false;
}

继续阅读：algorithm optimization

Check if a string containing 2 words is in a dictionary

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？