Can extract generic entities using Lingpipe other than People, Org and Loc?

2023-04-07 10:09 问答作者：

I have read through Lingpipe for N开发者_开发问答LP and found that we have a capability there to identify mentions of names of people, locations and organizations. My questions is that if I have a training set of documents that have mentions of let's say software projects inside the text, can I use this training set to train a named entity recognizer? Once the training is complete, I should be able to feed a test set of textual documents to the trained model and I should be able to identify mentions of software projects there.

Is this generic NER possible using NER? If so, what features should I be using that I should feed?

Thanks Abhishek S

Provided that you have enough training data with tagged software projects that would be possible.

If using Lingpipe, I would use character n-grams model as the first option for your task. They are simple and usually do the work. If results are not good enough some of the standard NER features are:

tokens
part of speech (POS)
capitalization
punctuaction
character signatures: these are some ideas: ( LUCENE -> AAAAAA -> A) , (Lucene -> Aaaaaa -> Aa ), (Lucene-core --> Aaaaa-aaaa --> Aa-a)
it may also be useful to compose a gazzeteer (list of software projects) if you can obtain that from Wikipedia, sourceforge or any other internal resource.

Finally, for each token you could add contextual features, tokens before the current one (t-1, t-2...), tokens after the current one (t+1,t+2...) as well as their bigram combinations (t-2^t-1), (t+1^t+2).

Of course you can. Just get train data with all categories you need and follow tutorial http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html. No feature tuning is required since lingpipe uses only hardcoded one (shapes, sequnce word and ngramms)

继续阅读：machine-learning named-entity-extraction text-analysis

Can extract generic entities using Lingpipe other than People, Org and Loc?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？