
Parsing data and POS with Treetop vs. Stanford NLP

I'm trying to parse event data (concerts, movies, etc.) in Ruby and can't decide which tool to use.

I initially thought the Stanford Parser was the way to go, but then heard of Treetop.

I'm struggling with both. Getting the Stanford Parser to work with Ruby on Windows has taken more than two days of searching, with no end of errors just getting it installed.

Treetop installed with no problem, but the documentation is very limited. From what I can gather, Treetop is better at dealing with grammar structure than with the actual content, but maybe I'm just not fully understanding Treetop's capabilities.

One of the nice things (I think) is that I have a large database/corpus(?) of band and movie names, and only a fairly limited set of data that I'm looking to retrieve.

For instance one listing is

The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park

Another listing is

07/08/11 - Tacoma Dome,  New Kids on the Block & Backstreet Boys w/ Matthew Morrison, 7:30pm, Tacoma, WA

With each listing I'm trying to grab a rather specific group of details: who/what, date, time, city, and venue.
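To make the target concrete, here is a minimal sketch of the extraction described above, using only plain Ruby regexes and a lookup list rather than Treetop or the Stanford Parser. Everything in it is an assumption for illustration: `KNOWN_ACTS` stands in for the real band/movie database, `extract_fields` is a hypothetical helper name, and the two patterns are written to cover only the two sample listings, not event listings in general.

```ruby
# Hypothetical lookup list standing in for the real band/movie database.
KNOWN_ACTS = [
  "The Tragically Hip",
  "Hey Rosetta!",
  "New Kids on the Block",
  "Backstreet Boys",
  "Matthew Morrison"
].freeze

# Illustrative patterns only: they cover the two sample listings and nothing more.
# Time such as "7:30pm".
TIME_RE = /\b\d{1,2}:\d{2}\s?(?:am|pm)\b/i
# Date such as "07/08/11" or "Friday Jul 15th".
DATE_RE = %r{\b\d{2}/\d{2}/\d{2}\b|\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*\s+[A-Z][a-z]{2}\s+\d{1,2}(?:st|nd|rd|th)?}

# Returns a hash with the acts found in the lookup list, plus the first
# date-like and time-like substrings (nil when a pattern does not match).
def extract_fields(listing)
  {
    acts: KNOWN_ACTS.select { |name| listing.include?(name) },
    date: listing[DATE_RE],
    time: listing[TIME_RE]
  }
end

listing = "The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park"
p extract_fields(listing)
```

City could presumably be handled with a second lookup list in the same way. Venue is the harder field, since it is whatever remains once the other pieces are accounted for, and this sketch does not attempt it.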

Seeing as I already have a dataset of band names, and a listing of city names should be fairly easy to get, it should be 'fairly' easy to pick out the other details. I'm just not sure which tool I should dedicate my time to, or whether there is a better way to do this.

Any suggestions?


No, Treetop is meant for parsing more structured languages (like computer languages). For natural language processing (NLP), you'd be better off using the Stanford Parser or something like it. Have a look at this blog entry about NLP in combination with Ruby:

http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/
