Parsing data and POS with Treetop vs. Stanford NLP
I'm trying to parse event data (concerts, movies, etc.) in Ruby and can't decide which tool to use.
I initially thought the Stanford parser was the way to go, but then heard of Treetop.
I'm struggling with both: getting the Stanford parser to work with Ruby on Windows has taken two-plus days of searching, with no end of errors just getting it installed.
Treetop installed without a problem, but the documentation is very limited. From what I can gather, Treetop is better at dealing with grammar structure than with the actual content, but maybe I'm just not fully understanding its capabilities.
One of the nice things (I think) is that I have a large database/corpus(?) of band and movie names, and only a fairly limited set of data that I'm looking to retrieve.
For instance one listing is
The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park
Another listing is
07/08/11 - Tacoma Dome, New Kids on the Block & Backstreet Boys w/ Matthew Morrison, 7:30pm, Tacoma, WA
With each listing I'm trying to grab a rather specific group of details: who/what, date, time, city, and venue.
Seeing as I already have a dataset of band names, and a list of city names should be fairly easy to get, it should be 'fairly' easy to pick out the other details. I'm just not sure which tool I should dedicate my time to, or whether there is a better way to do this.
Any suggestions?
No, Treetop is meant for parsing more structured languages (like computer languages). For natural language processing (NLP), you'd be better off using the Stanford Parser or something like it. Have a look at this blog entry about NLP in combination with Ruby:
http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/
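To illustrate the approach the question itself hints at (looking up names you already have and picking the rest out by pattern), here is a minimal plain-Ruby sketch. It is not a Treetop or Stanford Parser example. The constants `KNOWN_ACTS` and `KNOWN_VENUES` are hypothetical stand-ins for your own datasets, and the two regexes are assumptions that only cover the date and time formats in the two sample listings.

```ruby
# Hypothetical stand-ins for the asker's own band and venue datasets.
KNOWN_ACTS = [
  "The Tragically Hip", "Hey Rosetta!", "New Kids on the Block",
  "Backstreet Boys", "Matthew Morrison"
].freeze
KNOWN_VENUES = ["Deer Lake Park", "Tacoma Dome"].freeze

# Assumed formats, taken only from the two sample listings:
# times like "7:30pm", dates like "07/08/11" or "Friday Jul 15th".
TIME_RE = /\b\d{1,2}:\d{2}\s*(?:am|pm)\b/i
DATE_RE = Regexp.union(
  /\b\d{2}\/\d{2}\/\d{2}\b/,
  /\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*\s+[A-Z][a-z]{2}\s+\d{1,2}(?:st|nd|rd|th)?/
)

# Returns a hash of the fields that could be found; missing ones are nil
# (or an empty array for :who). City lookup is left out of this sketch.
def parse_listing(text)
  {
    who:   KNOWN_ACTS.select { |name| text.include?(name) },
    venue: KNOWN_VENUES.find { |name| text.include?(name) },
    date:  text[DATE_RE],
    time:  text[TIME_RE]
  }
end

first  = parse_listing("The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park")
second = parse_listing("07/08/11 - Tacoma Dome, New Kids on the Block & Backstreet Boys w/ Matthew Morrison, 7:30pm, Tacoma, WA")
```

A lookup like this only finds names that are already in your lists, and a numeric date such as 07/08/11 stays ambiguous (month/day vs. day/month) until you know the source's convention, so treat it as a starting point rather than a replacement for a real parser.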