Check if a string containing 2 words is in a dictionary
I have an array of dictionary words and a random string like "twowords"
What is the fastest way to check if the entire string is m开发者_开发问答ade up of dictionary words? so "twojwords" would return false and "twowords" returns true
I was using a binary search before but it couldn't handle two word strings
(I'm using objective c)
One alternative to testing every substring is to build a trie from your dictionary and traverse it, one letter at a time, a la a BFS. Instead of just keeping track of the current node, we would keep track of the node as well as how many words we had seen. At each node, one would enqueue the next letter if available and also the start node (plus 1 word) if the current node is a terminal node. If we reach the end of the string and are at the start node with 2 words, then there is a solution.
This neatly generalizes to more than two words. I'm not sure it is asymptotically faster in the worse case than string spitting, but it will likely be faster because a) there is no word generation b) we know the nth string is in the dictionary before testing the n+1th
Here's a naive approach, but I don't know if there is a faster way.
for i1 from 0 to 6 do
substring1 = string[0..i1];
if (inDictionary(substring1)) {
for i2 from i1+1 to 6 do
substring2 = string[i1+1..i2];
if (inDictionary(substring2)) {
for i3 from i2+1 to 6 do
... (up to i6)
For this, you need to be able to form a substring of letters a to b of the string. E.g. if string = "thisisastring"
then string[4..7] = "isas"
. You also need a boolean function inDictionary, which should do a binary search for the substring in the dictionary.
This method will work, but depending on the size of the dictionary, may take a bit of time. I believe the English language is currently around 200000 words, in which case any reasonable programming language should have reasonable performance.
First, it would be to use some kind of hash table data structure for your dictionary, giving you ~O(1) for a one word lookup. That's a lot better than binary search, especially when your dictionary is of size 200,000
For two words, split your string into halfs, like t-wowords,tw-owords,two-words,twow-ords etc. and make a lookup for each half.
If you want to allow an arbitrary number of parts, you should try a recursive approach. In pseudocode:
bool checkWord(string word)
{
if(length(word)==1)
return isWordInDict(word);
for each pair w1, w2 of word halfs
{
if(isWordInDict(w1) && checkWord(w2))
return true;
}
return false;
}
精彩评论