string handling

I have an English dictionary in a text file. What is the best way to check whether a given string is a valid English word? My dictionary contains about 100,000 English words, and I have to check about 60,000 words on average in one go. I am just looking for the most efficient way. Also, should I store all the strings first, or should I just process them as they are generated?

Thanks


100k words is not a large number, so you can just put everything in a HashSet<string>.

HashSet lookup is hash-based, so each Contains call takes constant time on average, regardless of how many words the dictionary holds.

An example of how this might look in code:

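using System;
using System.Collections.Generic;
using System.IO;

// Load the dictionary file (one word per line) into a hash set.
string[] lines = File.ReadAllLines(@"C:\MyDictionary.txt");
HashSet<string> myDictionary = new HashSet<string>();
foreach (string line in lines)
{
  myDictionary.Add(line);
}

// Check a single candidate word against the set.
string word = "aardvark";
if (myDictionary.Contains(word))
{
  Console.WriteLine("There is an aardvark");
}
else
{
  Console.WriteLine("The aardvark is a lie");
}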


You should probably use HashSet<string> if you're using .NET 3.5 or higher.

Just load the dictionary of valid words into a HashSet<string> and then either use Contains on each candidate string, or use some of the set operators to find all words which aren't valid.

For example:

// There are loads of ways of loading words from a file, of course
var valid = new HashSet<string>(File.ReadAllLines("dictionary.txt"));
var candidates = new HashSet<string>(File.ReadAllLines("candidate.txt"));

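// Intersect and Except here are the LINQ extension methods (using System.Linq);
// they are evaluated lazily, when the results are enumerated.
var validCandidates = candidates.Intersect(valid);
var invalidCandidates = candidates.Except(valid);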

You may also wish to use case-insensitive comparisons or something similar - use the StringComparer static properties to get at appropriate instances of StringComparer which you can pass to the HashSet constructor.
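
As a rough sketch of the case-insensitive variant: the comparer is passed to the HashSet constructor along with the words. The class name, the BuildDictionary helper and the in-memory word list are made up for illustration, and StringComparer.OrdinalIgnoreCase is only one possible choice; a culture-aware comparer may suit some dictionaries better.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveLookupDemo
{
    // Hypothetical helper: builds a set whose Contains check ignores case.
    public static HashSet<string> BuildDictionary(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // In-memory word list standing in for the dictionary file (assumption).
        var valid = BuildDictionary(new[] { "aardvark", "Zebra" });

        Console.WriteLine(valid.Contains("AARDVARK")); // True
        Console.WriteLine(valid.Contains("zebra"));    // True
        Console.WriteLine(valid.Contains("aadvark"));  // False (misspelled)
    }
}
```

With a file-backed dictionary, the same constructor should accept File.ReadAllLines("dictionary.txt") as its first argument.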

If you're using .NET 2.0, you can use a Dictionary<string, whatever> as a poor man's set: use whatever you like as the value type, and only ever check for keys.
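
A minimal sketch of that fallback, written without var or LINQ so that it should compile under C# 2. The class name, the BuildWordSet helper and the choice of bool as the throwaway value type are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryAsSetDemo
{
    // Hypothetical helper: stores each word as a key; the bool value is a
    // placeholder that is never read.
    public static Dictionary<string, bool> BuildWordSet(IEnumerable<string> words)
    {
        Dictionary<string, bool> set = new Dictionary<string, bool>();
        foreach (string word in words)
        {
            // Indexer assignment tolerates duplicate lines; Add would throw.
            set[word] = true;
        }
        return set;
    }

    public static void Main()
    {
        // In-memory word list standing in for the dictionary file (assumption).
        string[] words = new string[] { "aardvark", "zebra", "zebra" };
        Dictionary<string, bool> valid = BuildWordSet(words);

        Console.WriteLine(valid.ContainsKey("zebra"));   // True
        Console.WriteLine(valid.ContainsKey("aadvark")); // False (misspelled)
    }
}
```

ContainsKey plays the role that Contains plays on a HashSet<string>, with the same average constant-time lookup.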
