Extract different POS words for a given word in Python NLTK
Is there any package in Python NLTK that can produce all the different part-of-speech forms of a given word? For example, if I give add (verb), it should produce addition (noun), additive (adjective), and so on. Can anyone let me know?
There are two options I can think of off the top of my head:
Option one is to iterate over the sample POS-tagged corpora and simply build this mapping yourself. This gives you the POS tags that are associated with a particular word in the corpora.
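Here is a minimal sketch of option one. The function name build_pos_index and the tiny hand-made sample are just for illustration; with NLTK and the Brown corpus data installed, you could pass something like nltk.corpus.brown.tagged_words(tagset='universal') instead of the sample.

```python
from collections import defaultdict


def build_pos_index(tagged_words):
    """Map each lower-cased word to a dict of {tag: count}."""
    index = defaultdict(lambda: defaultdict(int))
    for word, tag in tagged_words:
        index[word.lower()][tag] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(tags) for word, tags in index.items()}


# Hand-made sample standing in for a real POS-tagged corpus (an assumption).
sample = [("They", "PRON"), ("add", "VERB"), ("salt", "NOUN"),
          ("Salt", "VERB"), ("the", "DET"), ("salt", "NOUN")]

pos_index = build_pos_index(sample)
print(pos_index["salt"])  # {'NOUN': 2, 'VERB': 1}
```

Note that this only tells you which tags the same surface word appears with in the corpus. It does not find derived words such as "addition" for "add".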
Option two is to build a hidden Markov model POS tagger on the corpora, then inspect the values of the model. This gives you the POS tags that are associated with a particular word in the corpora plus their a priori probabilities, as well as some other statistical data.
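To give a rough idea of the kind of values you would be inspecting, here is a sketch that estimates tag priors and per-tag word probabilities by simple relative-frequency counting. This is not a full HMM (it leaves out the tag-to-tag transition probabilities) and it does not use NLTK's own tagger classes; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def emission_estimates(tagged_words):
    """Return (tag_prior, emission) as relative-frequency estimates.

    tag_prior[tag]        approximates P(tag)
    emission[(tag, word)] approximates P(word | tag)
    """
    pairs = [(word.lower(), tag) for word, tag in tagged_words]
    tag_counts = Counter(tag for _, tag in pairs)
    pair_counts = Counter((tag, word) for word, tag in pairs)
    total = sum(tag_counts.values())
    tag_prior = {tag: n / total for tag, n in tag_counts.items()}
    emission = {(tag, word): n / tag_counts[tag]
                for (tag, word), n in pair_counts.items()}
    return tag_prior, emission


# Hand-made sample standing in for a real POS-tagged corpus (an assumption).
sample = [("add", "VERB"), ("salt", "NOUN"), ("salt", "VERB"), ("salt", "NOUN")]

tag_prior, emission = emission_estimates(sample)
print(tag_prior["VERB"])           # 0.5
print(emission[("VERB", "salt")])  # 0.5
print(emission[("NOUN", "salt")])  # 1.0
```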
Depending on what your use-case is, one may be better than the other. I would start with option one, since it's fast and easy.
NLTK has a lot of clever things hiding away, so there might be a direct way of doing it. However, I think you may have to write your own code to work with the WordNet database.
This might be what you are looking for:
>>> from nltk.corpus import wordnet
>>> add = wordnet.synsets('add', 'v')
>>> add
[Synset('add.v.01'),
 Synset('add.v.02'),
 Synset('lend.v.01'),
 Synset('add.v.04'),
 Synset('total.v.02'),
 Synset('add.v.06')]
>>> lemma = add[0].lemmas()[0]  # lemmas() is a method in NLTK 3.x
>>> lemma
Lemma('add.v.01.add')
>>> lemma.derivationally_related_forms()
[Lemma('addition.n.02.addition'), Lemma('linear.a.01.additive')]
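If you want the related forms for every sense of the word rather than just the first lemma, something along these lines might work. It is only a sketch: the helper names group_related_forms and related_forms are my own, it assumes the NLTK 3.x API (where lemmas(), name(), synset() and pos() are methods), and the WordNet data must be installed for related_forms to run.

```python
from collections import defaultdict


def group_related_forms(synsets):
    """Collect derivationally related lemma names, grouped by POS tag.

    `synsets` is any iterable of WordNet Synset objects, for example the
    result of wordnet.synsets('add', pos='v').
    """
    forms = defaultdict(set)
    for synset in synsets:
        for lemma in synset.lemmas():
            for related in lemma.derivationally_related_forms():
                # Key on the POS of the synset the related lemma belongs to.
                forms[related.synset().pos()].add(related.name())
    return {pos: sorted(names) for pos, names in forms.items()}


def related_forms(word, pos=None):
    """Look up `word` in WordNet and group its derivationally related forms."""
    # Imported lazily so the grouping helper can be used without NLTK installed.
    from nltk.corpus import wordnet
    return group_related_forms(wordnet.synsets(word, pos=pos))
```

Calling related_forms('add', 'v') should then return a dict keyed by POS tag (such as 'n' for nouns and 'a' for adjectives), with a sorted list of related words for each. The exact contents depend on the WordNet version you have installed.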