Named Entity Recognition from a personal Gazetteer using Python

I am trying to do named entity recognition in Python using NLTK. I want to extract skills from my own list: I have the list of skills and would like to search for them in job requisitions and tag them. I noticed that NLTK's NER only covers predefined tags such as Person, Location, etc. Is there an external gazetteer tagger in Python I can use? Any idea how to do this in a more sophisticated way than a plain search for terms (some of which are multi-word terms)?

Thanks, Assaf


I haven't used NLTK much recently, but if you already have the words that you know are skills, you don't need NER, just a text search.

Maybe use Lucene or some other search library to find the text, and then annotate it? That's a lot of work, but if you are working with a lot of data it might be worth it. Alternatively, you could hack together a regex search, which will be slower but will probably work fine for smaller amounts of data and will be much easier to implement.
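To make the regex option concrete, here is a minimal sketch using only the standard library. The skill list, the sample text, and the helper names `build_skill_pattern` and `tag_skills` are made up for illustration. It assumes skills are matched case-insensitively and that multi-word skills are separated by a single space in the text.

```python
import re

def build_skill_pattern(skills):
    """Compile one regex that matches any skill in the list."""
    # Longest terms first, so a multi-word skill wins over a shorter
    # skill that happens to be its prefix.
    ordered = sorted(set(skills), key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # Lookarounds instead of \b, so that skills ending in
    # non-word characters (e.g. "C++") still match.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def tag_skills(text, pattern):
    """Return (start, end, matched_text) for every skill found."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

# Hypothetical gazetteer and requisition text.
skills = ["Python", "machine learning", "C++", "SQL"]
pattern = build_skill_pattern(skills)
text = "Looking for Python and C++ developers with machine learning and MySQL experience."
matches = tag_skills(text, pattern)

for start, end, skill in matches:
    print(start, end, skill)
```

Note that "MySQL" is not reported as "SQL", because the lookbehind requires the match to start at a word boundary. The character offsets can be used to annotate the original text.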


Have a look at NLTK's RegexpTagger and possibly RegexpParser; I think that's exactly what you are looking for.

You can create your own POS tags, i.e. map skills to a tag, and then easily define a grammar over those tags.
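A rough sketch of that idea, assuming NLTK is installed (no corpus downloads are needed for these two classes). The tag names `SKILL`, `SKILL_HEAD`, `SKILL_TAIL`, `O`, the chunk label `SKILL_CHUNK`, and the sample skills are all invented for this example. Single-word skills get one tag, the two halves of a multi-word skill get separate tags, and the grammar only accepts the halves when they appear together.

```python
from nltk import RegexpParser
from nltk.tag import RegexpTagger

# Map tokens to custom tags. Patterns are tried in order, so the
# catch-all must come last; it keeps every token tagged with a string.
tagger = RegexpTagger([
    (r"(?i)(python|sql|java)$", "SKILL"),   # single-word skills
    (r"(?i)machine$", "SKILL_HEAD"),        # first word of "machine learning"
    (r"(?i)learning$", "SKILL_TAIL"),       # second word of "machine learning"
    (r".*", "O"),                           # everything else
])

# Chunk grammar over the custom tags: a head followed by a tail,
# or a single-word skill on its own.
grammar = r"""
  SKILL_CHUNK:
    {<SKILL_HEAD><SKILL_TAIL>}
    {<SKILL>}
"""
parser = RegexpParser(grammar)

# Whitespace tokenization keeps the sketch free of tokenizer downloads.
tokens = "We need Python and machine learning experience".split()
tagged = tagger.tag(tokens)
tree = parser.parse(tagged)

found = [
    " ".join(token for token, _tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "SKILL_CHUNK")
]
print(found)
```

With this setup, a stray "machine" that is not followed by "learning" is tagged but never chunked, which is the main advantage over tagging each word in isolation.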

Some sample code for the tagger is in this pdf.
