
Is there a dictionary I can download for Java?

Is there a dictionary I can download for Java? I want to have a program that takes a few random letters and sees if they can be rearranged into a real word by checking them against the dictionary.


Is there a dictionary I can download for Java?

Others have already answered this... Maybe you weren't simply talking about a dictionary file but about a spellchecker?

I want to have a program that takes a few random letters and sees if they can be rearranged into a real word by checking them against the dictionary

That is different. How fast do you want this to be? How many words are in the dictionary, and how many words, up to what length, do you want to check?

In case you want a spellchecker (which is not entirely clear from your question), Jazzy is a spellchecker for Java that has links to a lot of dictionaries. It's not bad, but the various implementations are horribly inefficient (fine for small dictionaries, but an amazing waste when you have several hundred thousand words).

Now if you just want to solve the specific problem you describe, you can:

  • parse the dictionary file and create a map : (letters in sorted order, set of matching words)

  • then, for any set of random letters: sort them and see if the map has an entry for them (if it does, the entry's value contains all the words you can make with these letters).

    abracadabra : (aaaaabbcdrr, (abracadabra))

    carthorse : (acehorrst, (carthorse) )

    orchestra : (acehorrst, (carthorse,orchestra) )

etc...

Now you take, say, nine random letters and get "hsotrerca"; you sort them to get "acehorrst" and, using that as a key, you get all the (valid) anagrams...

This works because what you described is a special (easy) case: all you need is to sort your letters and then do an O(1) (average-case) hash-map lookup.
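A minimal Java sketch of the sorted-letters map described above. The class name `AnagramIndex` and its methods are hypothetical, the word list is just the three examples from this answer rather than a real dictionary file, and it assumes words are lower-cased before indexing. A `HashMap` provides the average-case constant-time lookup.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AnagramIndex {
    // Key: a word's letters in sorted order (e.g. "acehorrst"); value: all words with exactly those letters.
    private final Map<String, Set<String>> index = new HashMap<>();

    AnagramIndex(Collection<String> words) {
        for (String w : words) {
            String word = w.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                index.computeIfAbsent(key(word), k -> new TreeSet<>()).add(word);
            }
        }
    }

    // Sorting the characters produces the same key for every anagram of a word.
    static String key(String letters) {
        char[] chars = letters.trim().toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // All dictionary words that use exactly these letters; empty if there are none.
    Set<String> anagramsOf(String letters) {
        return index.getOrDefault(key(letters), Collections.emptySet());
    }

    public static void main(String[] args) {
        AnagramIndex idx = new AnagramIndex(List.of("abracadabra", "carthorse", "orchestra"));
        System.out.println(idx.anagramsOf("hsotrerca")); // [carthorse, orchestra]
        System.out.println(idx.anagramsOf("xyz"));       // []
    }
}
```

For a real dictionary you would pass in the lines read from the file. Building the index is one pass over the words, and each query costs one sort of the input letters plus one map lookup.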

For more complicated spell checking, where the input may contain errors, you need something that comes up with "candidates" (words that may be correct but misspelled), for example using the Soundex, Metaphone or Double Metaphone algorithms, and then something like the Levenshtein edit-distance algorithm to check the candidates against known good words (or the much more complicated tree built on Levenshtein edit distance that Google uses for its "find as you type"):

http://en.wikipedia.org/wiki/Levenshtein_distance
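To make the edit-distance step concrete, here is a small Java sketch of the standard dynamic-programming Levenshtein distance, plus a hypothetical `closeMatches` helper that simply scans the whole word list. It does not implement candidate generation (Soundex/Metaphone) or any tree structure; the linear scan is an assumption chosen for clarity and will be slow on large dictionaries.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class Levenshtein {
    // Minimum number of single-character insertions, deletions and substitutions
    // needed to turn a into b. Only two rows of the DP table are kept in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: linear scan returning the known words within maxDistance edits.
    static List<String> closeMatches(String candidate, Collection<String> dictionary, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (distance(candidate, word) <= maxDistance) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(closeMatches("orchesrta", List.of("orchestra", "carthorse", "abracadabra"), 2)); // [orchestra]
    }
}
```

Note that plain Levenshtein counts a swap of two adjacent letters ("rt" vs "tr") as two edits.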

As a funny side note, an optimized dictionary representation can store hundreds of thousands or even millions of words in less than 10 bits per word (yup, you read that correctly: less than 10 bits per word) and still allow very fast lookup.
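The answer doesn't say which representation it has in mind, so the following is only an illustration of one simple way to shrink a sorted word list: front coding, where each word stores the length of the prefix it shares with the previous word plus the remaining suffix. This sketch keeps entries as readable `prefixLength:suffix` strings (a hypothetical format); it shows the idea but does not by itself get anywhere near 10 bits per word, which requires more elaborate structures.

```java
import java.util.ArrayList;
import java.util.List;

class FrontCoding {
    // Each output entry is "p:suffix", where p is the length of the prefix shared with the previous word.
    // Assumes the input is sorted, so neighbouring words tend to share long prefixes.
    static List<String> encode(List<String> sortedWords) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String word : sortedWords) {
            int p = 0;
            while (p < prev.length() && p < word.length() && prev.charAt(p) == word.charAt(p)) {
                p++;
            }
            out.add(p + ":" + word.substring(p));
            prev = word;
        }
        return out;
    }

    // Rebuilds each word from the previous decoded word's prefix plus the stored suffix.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int colon = entry.indexOf(':');
            int p = Integer.parseInt(entry.substring(0, colon));
            String word = prev.substring(0, p) + entry.substring(colon + 1);
            out.add(word);
            prev = word;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("card", "care", "cart", "carthorse");
        List<String> encoded = encode(words);
        System.out.println(encoded);         // [0:card, 3:e, 3:t, 4:horse]
        System.out.println(decode(encoded)); // [card, care, cart, carthorse]
    }
}
```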


Dictionaries are usually programming-language agnostic. If you search without the keyword "java", you may get better results. For example, a search for "free dictionary download" turns up dicts.info.


OpenOffice dictionaries are easy to parse line-by-line.

You can read it into memory (keep in mind that it takes a lot of memory):

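List<String> words = IOUtils.readLines(new FileInputStream("dicfile.txt"), StandardCharsets.UTF_8); (IOUtils is from commons-io; close the stream when you're done, e.g. with try-with-resources)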

This gives you a List of all the words. Alternatively, if you run into memory problems, you can use LineIterator (IOUtils.lineIterator) to process one line at a time.
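If you'd rather not depend on commons-io, the JDK alone can stream a file line by line, which keeps memory use low in the same way LineIterator does. A sketch, assuming a UTF-8 file with one word per line; `wordsOfLength` is a hypothetical helper that keeps only words of a given length, as an example of processing the dictionary without holding all of it in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class StreamWords {
    // Reads the file one line at a time, so only the current line and the matching
    // words are held in memory (similar in spirit to Commons IO's LineIterator).
    static List<String> wordsOfLength(Path file, int length) throws IOException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (word.length() == length) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "dicfile.txt" is the file name used above; pass a different path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "dicfile.txt");
        System.out.println(wordsOfLength(file, 9));
    }
}
```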


If you are on a Unix-like OS, look in /usr/share/dict (the list is usually /usr/share/dict/words).
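A minimal sketch of loading the system word list into a `HashSet` for quick membership checks. It assumes the list lives at `/usr/share/dict/words` and is UTF-8 encoded; both the path and the encoding vary between systems, so treat them as assumptions and adjust as needed. The class name `SystemDictionary` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SystemDictionary {
    // Common location on many Linux and macOS systems; it may be missing or named differently on yours.
    static final Path DEFAULT_WORDS = Paths.get("/usr/share/dict/words");

    // Loads one word per line, lower-cased, into a HashSet for average O(1) contains() checks.
    // Assumes UTF-8; if the file uses another encoding, pass that charset instead.
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String w = line.trim().toLowerCase(Locale.ROOT);
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        if (!Files.exists(DEFAULT_WORDS)) {
            System.out.println("No word list at " + DEFAULT_WORDS);
            return;
        }
        Set<String> words = load(DEFAULT_WORDS);
        System.out.println(words.size() + " words; contains \"orchestra\": " + words.contains("orchestra"));
    }
}
```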


Here's one:

http://java.sun.com/docs/books/tutorial/collections/interfaces/examples/dictionary.txt

You can use standard Java file handling to read one word per line:

http://www.java-tips.org/java-se-tips/java.io/how-to-read-file-in-java.html


Check out http://sourceforge.net/projects/test-dictionary/; it might give you some clues.

I am not sure whether any such libraries are available for download, but you can certainly dig through sourceforge.net to see if there are any, or how people have used dictionaries: http://sourceforge.net/search/?type_of_search=soft&words=java+dictionary
