Word disambiguation algorithm (Lesk algorithm)
Hi, can anybody help me find an algorithm, in Java, for finding synonyms of a search word based on its context? I want to implement the algorithm with the WordNet database.
For example, take "I am running a Java program". I want to find synonyms for the word "running", but they must be suitable for the context.
Let me illustrate a possible approach:
- Let your sentence be
A B C
- Let each word have synsets i.e.
{A:(a1, a2, a3), B:(b1), C:(c1, c2)}
- Now form possible synset sets:
(a1, b1, c1), (a1, b1, c2), (a2, b1, c1) ... (a3, b1, c2)
- Define function
F(a, b, c)
which returns a score for the combination (a, b, c): the closer the synsets are to each other, the higher the score.
- Call F on each synset set.
- Pick the set with the maximum score.
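For starters, F can simply return the product of the inverse distances between every pair of chosen synsets, where D is the number of nodes between the two synsets in the WordNet graph. Each pair is counted once (i < j), so a synset is never compared with itself:
Maximize(Product[0 <= i < j < len(sentence)] (1/D(node_i, node_j)))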
Later on, you can increase its complexity.
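To make the steps above concrete, here is a rough Java sketch of the brute-force search. It does not touch WordNet: the class name `SynsetCombinationSearch`, the sense labels, and the `toyDistance` table are all made up for illustration, and in a real program D would have to come from path lengths in the WordNet graph. Distances below 1 are clamped to 1, which is my own assumption to avoid dividing by zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

public class SynsetCombinationSearch {

    // Hypothetical pairwise distances; any pair not listed is treated as far apart (5).
    private static final Map<String, Integer> TOY_DISTANCES =
            Map.of("a2|b1", 1, "b1|c2", 1, "a2|c2", 2);

    static int toyDistance(String x, String y) {
        return TOY_DISTANCES.getOrDefault(x + "|" + y, 5);
    }

    /** F: product of 1/D over every unordered pair (i < j) of chosen senses. */
    static double score(List<String> senses, ToIntBiFunction<String, String> distance) {
        double product = 1.0;
        for (int i = 0; i < senses.size(); i++) {
            for (int j = i + 1; j < senses.size(); j++) {
                // Assumption: clamp to 1 so a distance of 0 cannot divide by zero.
                int d = Math.max(1, distance.applyAsInt(senses.get(i), senses.get(j)));
                product *= 1.0 / d;
            }
        }
        return product;
    }

    /** Tries every combination (one sense per word) and returns the highest-scoring one. */
    static List<String> best(List<List<String>> candidates,
                             ToIntBiFunction<String, String> distance) {
        List<String> best = new ArrayList<>();
        double[] bestScore = {-1.0};
        search(candidates, 0, new ArrayList<>(), distance, best, bestScore);
        return best;
    }

    private static void search(List<List<String>> candidates, int word, List<String> current,
                               ToIntBiFunction<String, String> distance,
                               List<String> best, double[] bestScore) {
        if (word == candidates.size()) {
            // One sense has been chosen for every word: score this combination.
            double s = score(current, distance);
            if (s > bestScore[0]) {
                bestScore[0] = s;
                best.clear();
                best.addAll(current);
            }
            return;
        }
        for (String sense : candidates.get(word)) {
            current.add(sense);
            search(candidates, word + 1, current, distance, best, bestScore);
            current.remove(current.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        // {A:(a1, a2, a3), B:(b1), C:(c1, c2)} from the example above.
        List<List<String>> candidates = List.of(
                List.of("a1", "a2", "a3"), List.of("b1"), List.of("c1", "c2"));
        System.out.println(best(candidates, SynsetCombinationSearch::toyDistance));
        // prints [a2, b1, c2]
    }
}
```

Keep in mind that the number of combinations is the product of the synset counts per word, so this exhaustive search is only practical for short sentences or a small context window around the target word.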
This is the perfect document for your problem. The accuracy of the algorithm is not high, but I think it will be enough.
At this link you can find a Java API for WordNet Searching (JAWS).
Hi, I came across this page while searching for Lesk algorithm implementations. I think it comes as part of the JAWS package. I haven't used it yet, but I guess it will help.