Wordnet Synset Offset? How to compare words
I am using the Chinese Wordnet from Academia Sinica. It is a translation of WordNet 1.6. Unfortunately it is not freely available and has to be purchased, and the manual basically says to refer to WordNet's manual. What I am trying to figure out is how to compare the similarity between two words. I imagine it is done with the WordNetSynsetOffset, but I could not find anything on the WordNet website or in its documentation on how to use this to compare two words. As for the actual algorithms, I suppose this is a good start: http://marimba.d.umn.edu/similarity/measures.html
<Record Conut="65">
<EnglishLemma>exercise</EnglishLemma>
<POS>Noun</POS>
<WordNetSynsetOffset Version="1.6">00469856</WordNetSynsetOffset>
<EnglishFrequancyRank>通用詞彙</EnglishFrequancyRank>
<ChineseTransList>
<ChineseTrans>
<ChineseLemma>例題</ChineseLemma>
<ChineseFrequancyRank>通用詞彙</ChineseFrequancyRank>
</ChineseTrans>
</ChineseTransList>
</Record>
Based on the comments, I think what you are looking for is the WordNet API.
If the Chinese format is the same, you might be able to use the WordNet API that shipped with your installation. It's a C library; you can find the documentation here:
http://wordnet.princeton.edu/wordnet/documentation/
Basically, here's how it works. A synset is a group of synonymous terms that share one meaning, and it is identified by its synset offset (the 00469856 in your record). In the WordNet database files the offset is a byte position within the data file for one part of speech, so it is only unique together with the part of speech. Synsets are connected to other synsets through various kinds of semantic relations. Most similarity metrics work by looking up one synset (by the offset you quoted in your question; the API should support this) and then measuring how far away another synset is along those relations.
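To make the "how far away" idea concrete, here is a minimal sketch of a path-based measure. It assumes you have already loaded the hypernym links into a dictionary; the offsets and links in `HYPERNYMS` are made up for illustration and do not come from WordNet, and the function names are my own, not part of any WordNet API.

```python
from collections import deque

# Hypothetical toy data: each synset offset maps to its hypernym offsets.
# These offsets and links are invented for illustration only.
HYPERNYMS = {
    "00000003": ["00000002"],
    "00000004": ["00000002"],
    "00000002": ["00000001"],
    "00000005": ["00000001"],
    "00000001": [],
}


def build_undirected(hypernyms):
    """Turn child -> parents links into an undirected adjacency map."""
    graph = {node: set() for node in hypernyms}
    for child, parents in hypernyms.items():
        for parent in parents:
            graph[child].add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Score in (0, 1]: 1 / (1 + shortest path length between the synsets)."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1.0 / (1.0 + dist)


GRAPH = build_undirected(HYPERNYMS)
print(path_similarity(GRAPH, "00000003", "00000004"))
```

Two synsets that share a direct hypernym are two edges apart, so they score 1/3 here. The measures on the page linked in the question refine this basic idea in different ways, for example by taking depth in the hierarchy or corpus statistics into account.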
A synset also contains a textual description of its meaning (the gloss), which is the standard dictionary definition we are used to. Some similarity metrics, such as the Lesk algorithm, use this textual description to compare how "similar" two synsets are to each other.
There are other APIs available that let you search and access WordNet from various languages.
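As a rough illustration of the gloss-based approach, the sketch below counts the content words two definitions share. This is a simplified overlap count in the spirit of Lesk, not the original algorithm or any of its extended variants; the stopword list and the two glosses are made up for the example.

```python
import re

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "to", "or", "in", "for", "and", "that", "is"}


def gloss_tokens(gloss):
    """Lowercase the gloss and keep the set of non-stopword tokens."""
    words = re.findall(r"[a-z]+", gloss.lower())
    return {word for word in words if word not in STOPWORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Number of distinct content words the two glosses have in common."""
    return len(gloss_tokens(gloss_a) & gloss_tokens(gloss_b))


# Invented glosses, for illustration only.
print(gloss_overlap("a task performed to develop skill",
                    "an activity that develops skill or understanding"))
```

Note that "develop" and "develops" do not match here because there is no stemming, which is one reason practical implementations do more preprocessing than this.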
http://wordnet.princeton.edu/wordnet/related-projects/
For instance, here is an example Synset definition from the WordNet 3.0 dictionary files:
00020671 29 v 04 hypnotize 0 hypnotise 0 mesmerize 0 mesmerise 0 (... more left out)...
The unique identifier 00020671 identifies this synset. There are four synonyms here for hypnotize.
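If you end up reading the data files directly, the leading fields can be split out as in the sketch below. It relies on the documented data file layout (synset offset, lexicographer file number, part of speech, a two-digit hexadecimal word count, then word and lex_id pairs) and only parses the part of the line quoted above; the function name is my own, and I am assuming the Chinese distribution keeps the same offsets rather than the same file layout.

```python
def parse_synset_prefix(line):
    """Parse the leading fields of a WordNet data file line.

    Only the offset, part of speech and word list are extracted;
    pointers, verb frames and the gloss are ignored.
    """
    fields = line.split()
    offset, _lex_filenum, ss_type, w_cnt = fields[:4]
    count = int(w_cnt, 16)  # the word count is written in hexadecimal
    # Words and their lex_ids alternate, starting at the fifth field.
    words = [fields[4 + 2 * i] for i in range(count)]
    return {"offset": offset, "pos": ss_type, "words": words}


# The portion of the line quoted above, without the elided remainder.
SAMPLE = "00020671 29 v 04 hypnotize 0 hypnotise 0 mesmerize 0 mesmerise 0"
print(parse_synset_prefix(SAMPLE))
```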
A word could have many possible senses (synsets). If you want to compare similarity between two senses, you'll first have to disambiguate each word. Once you know which two senses you're comparing, you can use what @bwalenz has suggested.
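If you have no way to disambiguate, one common fallback is to score every pair of senses and keep the best-scoring pair. The sketch below assumes you already have a list of synset identifiers for each word and some `similarity` function of your choice; both are placeholders here, and this is a fallback rather than real disambiguation.

```python
from itertools import product


def best_sense_pair(senses_a, senses_b, similarity):
    """Return (score, sense_a, sense_b) for the highest-scoring pair.

    `similarity` is any callable taking two senses and returning a number,
    or None when the pair cannot be compared. Returns None if no pair
    can be scored.
    """
    best = None
    for sense_a, sense_b in product(senses_a, senses_b):
        score = similarity(sense_a, sense_b)
        if score is not None and (best is None or score > best[0]):
            best = (score, sense_a, sense_b)
    return best


# Hypothetical sense labels and scores, for illustration only.
TOY_SCORES = {("s1", "t1"): 0.2, ("s1", "t2"): 0.5, ("s2", "t1"): 0.9}


def toy_similarity(a, b):
    return TOY_SCORES.get((a, b))


print(best_sense_pair(["s1", "s2"], ["t1", "t2"], toy_similarity))
```

Keep in mind that the maximum tends to overestimate similarity when the words are used in unrelated senses, so disambiguating first is still preferable when the context allows it.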