How to get synonyms ordered by their occurrence probability from WordNet
I am searching WordNet for synonyms of a big list of words. The way I have done it, when a word has more than one synonym, the results are returned in alphabetical order. What I need is to have them ordered by their probability of occurrence, so that I can take just the top synonym.
I have used the Prolog WordNet database and Syns2Index to convert it into a Lucene index for querying synonyms. Is there a way to get them ordered by their probabilities with this setup, or should I use another approach?
Speed is not important; this synonym lookup will not be done online.
In case someone stumbles upon this thread, this was the way to go (at least for what I needed):
http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/impl/file/ReferenceSynset.html#getTagCount%28java.lang.String%29
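The getTagCount method indicates how frequently a word form is used in the sense of a given synset, which lets you pick the most likely synset for every word. The remaining problem is that the synset with the highest probability can itself contain several words, but I guess there is no way to avoid this.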
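One possible way to narrow the top synset down to a single word is to compare the tag counts of its other word forms and keep the highest one. The sketch below does not call JAWS; `SynsetCounts` and `TopSynonymPicker` are hypothetical stand-ins that hold tag counts you would first read with getTagCount, so treat it as an outline under that assumption rather than a tested recipe.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one synset: each word form mapped to its tag count.
class SynsetCounts {
    final Map<String, Integer> tagCounts = new LinkedHashMap<>();

    SynsetCounts add(String wordForm, int tagCount) {
        tagCounts.put(wordForm, tagCount);
        return this;
    }
}

class TopSynonymPicker {
    // Returns the chosen synonym for `word`, or null if none is available.
    static String pick(String word, List<SynsetCounts> synsets) {
        // Step 1: find the synset in which `word` has the highest tag count.
        SynsetCounts best = null;
        int bestCount = -1;
        for (SynsetCounts s : synsets) {
            Integer count = s.tagCounts.get(word);
            if (count != null && count > bestCount) {
                best = s;
                bestCount = count;
            }
        }
        if (best == null) {
            return null; // the word is in none of the synsets
        }
        // Step 2: among the other word forms of that synset, keep the one
        // with the highest tag count (the first one wins on ties).
        String synonym = null;
        int synonymCount = -1;
        for (Map.Entry<String, Integer> e : best.tagCounts.entrySet()) {
            if (!e.getKey().equals(word) && e.getValue() > synonymCount) {
                synonym = e.getKey();
                synonymCount = e.getValue();
            }
        }
        return synonym;
    }
}
```

If the top synset contains only the word itself, `pick` returns null, and you may want to fall back to the next-best synset.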
I think that you should do another step (provided that speed is not important).
From the Lucene index, you should build a second dictionary in which each word is mapped to a small object that contains only the synonym whose meaning has the highest probability of appearance, together with that meaning and its probability. For example, given this code:
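import java.util.HashMap;
import java.util.Map;

// Holds the single most probable synonym for a word.
class Synonym {
    String name;        // the synonym itself
    double probability; // probability of appearance of its meaning
    String meaning;     // the meaning (sense) this synonym belongs to
}

Map<String, Synonym> m = new HashMap<String, Synonym>();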
... you just have to fill it from the Lucene index.
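As a rough sketch of that filling step, the code below keeps, for each word, only the candidate with the highest probability. It assumes the (word, synonym, probability, meaning) records have already been read from the index or from WordNet; `Candidate` and `BestSynonymIndex` are hypothetical names, and no Lucene calls are shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record: one candidate synonym for a word, with its probability.
class Candidate {
    final String word;
    final String synonym;
    final double probability;
    final String meaning;

    Candidate(String word, String synonym, double probability, String meaning) {
        this.word = word;
        this.synonym = synonym;
        this.probability = probability;
        this.meaning = meaning;
    }
}

class BestSynonymIndex {
    // Builds a word -> best candidate map, keeping the highest probability per word.
    static Map<String, Candidate> build(List<Candidate> candidates) {
        Map<String, Candidate> best = new HashMap<>();
        for (Candidate c : candidates) {
            // merge() stores c if the word is new; otherwise it keeps whichever
            // of the stored and the new candidate has the higher probability.
            best.merge(c.word, c,
                    (stored, incoming) -> incoming.probability > stored.probability ? incoming : stored);
        }
        return best;
    }
}
```

Once built, a lookup such as `best.get(word)` returns the top synonym directly, so the ordering work is paid only once.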