Populate my data structure with thousands of real English words
I need to test my data structure (in Java), which is like a dictionary: it holds a key/value map. How do you test your data structures? I would like to insert real words into my data structure and then find them. Is there a way to download all the English words, so that I can read that file and populate my structure? Once it is populated, I can run many searches and produce some real statistics on how long a search takes.
There are indeed several open-source dictionaries for the English language, e.g. the WordNet file.
That said, I must insist that the English language is not a “closed” language, nor does it have one true official definition. As such, there is no dictionary that contains “all the English words,” and such a dictionary can never exist: English words are made up all the time, and once enough people use them, they become part of the English language. Case in point: “to google.”
Perhaps Project Gutenberg would be helpful. I've used them on past CS projects. They provide plain text files (e.g. The Valley of Fear), which should be easy to process. You may want to skip over the headers to avoid skewing the results.
This will let you test your dictionary by keeping a word->count mapping (e.g. Map<String, Integer>) of the words in the file.
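As a rough illustration of the word->count idea, here is a minimal sketch. The class name WordCounter, the default file name book.txt, and the tokenizing rule (lowercase, split on anything that is not a letter or an apostrophe) are all assumptions made for the example; java.util.HashMap stands in for the structure under test and would be swapped for your own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordCounter {

    // Split each line into lowercase words and count how often each word occurs.
    // HashMap is only a stand-in here; replace it with the structure being tested.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Assumed tokenizing rule: any run of characters other than
            // a-z or an apostrophe separates two words.
            for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the downloaded text file as the first argument.
        String path = args.length > 0 ? args[0] : "book.txt";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        Map<String, Integer> counts = countWords(lines);
        System.out.println(counts.size() + " distinct words");
    }
}
```

Counting occurrences exercises both insertion and update of existing keys, so it checks a little more than inserting each word once.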
If you're on Linux, you could use the contents of /usr/share/dict/words; there's also WordNet, an English word database.
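A sketch of how such a word list could be loaded and searched, assuming one word per line in the file. The class name WordListBenchmark and the word-length value are placeholders invented for the example, and java.util.HashMap again stands in for the structure under test. The single timed pass gives only a rough number, since JIT warm-up and garbage collection can distort it; a harness such as JMH would be more trustworthy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordListBenchmark {

    // Insert every non-blank word as a key. The value (the word's length)
    // is only a placeholder so that the map has something to store.
    static Map<String, Integer> populate(List<String> words) {
        Map<String, Integer> map = new HashMap<>();
        for (String word : words) {
            String key = word.trim();
            if (!key.isEmpty()) {
                map.put(key, key.length());
            }
        }
        return map;
    }

    // Look up every word once and return how many lookups succeeded.
    static int countHits(Map<String, Integer> map, List<String> words) {
        int hits = 0;
        for (String word : words) {
            if (map.containsKey(word.trim())) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the Linux word list; another file can be passed as the first argument.
        String path = args.length > 0 ? args[0] : "/usr/share/dict/words";
        List<String> words = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        Map<String, Integer> map = populate(words);

        // Time one pass of lookups over all words (a rough measurement only).
        long start = System.nanoTime();
        int hits = countHits(map, words);
        long elapsed = System.nanoTime() - start;

        double nsPerLookup = (double) elapsed / Math.max(1, words.size());
        System.out.printf("%d keys, %d hits, %.1f ns per lookup%n", map.size(), hits, nsPerLookup);
    }
}
```

Repeating the timed pass several times and discarding the first few runs should give steadier figures than a single measurement.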
If your structure stores key/value pairs, you probably don't want a simple list of words; you want a mapping from words to definitions, or to words in other languages.
If you don't mind parsing a text file, IDP has a bunch of files for download royalty free.