Should I use LingPipe or NLTK for extracting names and places?

I'm looking to extract names and places from very short bursts of text, for example:

"cardinals vs jays in toronto"
"Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, paris time to be announced"
"jenson button - pole position, brawn-mercedes - monaco"

This data is currently in a MySQL database, and I (pretty much) have a separate record for each athlete, though names are sometimes spelled wrong, etc.

I would like to extract the athletes and locations. I usually work in PHP, but haven't been able to find a library for entity extraction (and I may want to get deeper into some NLP and ML in the future).

From what I've found, LingPipe and NLTK seem to be the most recommended, but I can't figure out if either will really suit my purpose, or if something else would be better.

I haven't programmed in either Java or Python, so before I start learning new languages, I'm hoping to get some advice on what route I should follow, or other recommendations.


What you're describing is named entity recognition. So I'd recommend checking out the other questions regarding this topic if you haven't already seen them. This looks like the most useful answer to me.
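Since the question mentions an existing table with one record per athlete, one simple baseline worth trying before (or alongside) a statistical tagger is a dictionary ("gazetteer") lookup with fuzzy matching. This is a swapped-in technique, not how NLTK or LingPipe do named entity recognition; it uses only Python's standard `difflib` and `re` modules. The `ATHLETES` and `PLACES` lists are hypothetical stand-ins for data you would load from MySQL, and the 0.85 similarity cutoff is an assumed value you would need to tune.

```python
import difflib
import re

# Hypothetical gazetteers: in practice these would be loaded from the
# athlete records in MySQL and from some list of place names.
ATHLETES = ["Daniel Nestor", "Nenad Zimonjic", "Jonas Bjorkman",
            "Kevin Ullyett", "Jenson Button"]
PLACES = ["Toronto", "Paris", "Monaco"]


def tokenize(text):
    """Lowercase text and split it into words, keeping hyphenated words whole."""
    return re.findall(r"\w+(?:-\w+)*", text.lower())


def find_entities(text, names, cutoff=0.85):
    """Return the set of gazetteer names that appear in text.

    Each name is compared against every run of the same number of words in
    the text using difflib's similarity ratio, so small misspellings such as
    "buton" for "button" can still match. The cutoff is an assumed value.
    """
    # Group normalized names by word count, e.g. {2: {"daniel nestor": "Daniel Nestor"}}
    by_length = {}
    for name in names:
        words = tokenize(name)
        by_length.setdefault(len(words), {})[" ".join(words)] = name

    tokens = tokenize(text)
    found = set()
    for n, lookup in by_length.items():
        candidates = list(lookup)
        # Slide a window of n words across the text and fuzzy-match it.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            match = difflib.get_close_matches(window, candidates, n=1, cutoff=cutoff)
            if match:
                found.add(lookup[match[0]])
    return found


if __name__ == "__main__":
    samples = [
        "cardinals vs jays in toronto",
        "Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, "
        "paris time to be announced",
        "jenson buton - pole position, brawn-mercedes - monaco",  # misspelled on purpose
    ]
    for text in samples:
        print(sorted(find_entities(text, ATHLETES)), sorted(find_entities(text, PLACES)))
```

With these lists, the misspelled "jenson buton" in the last sample should still match "Jenson Button". A lookup like this only finds names that are already in your lists; recognizing names it has never seen is where an NER model from NLTK or LingPipe would come in.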

I can't really say whether NLTK or LingPipe is better suited to this task, although judging from those answers, there seem to be quite a few other resources written in Java.

One advantage of going with NLTK is that Python is very accessible as a language. The other advantage is that the NLTK book (which is available for free) offers an introduction to both Python and NLTK at the same time, which would be useful for you.
