开发者

How do you find the subject of a sentence? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking us to recommend or find a tool, library or favorite off-site resource are off-开发者_StackOverflowtopic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

Closed 8 years ago.

Improve this question

I am new to NLP and was doing research about what language toolkit I should be using to do the following. I would like to do one of the two things which accomplishes the same thing:

  1. I basically would like to classify a text, usually one sentence that contains 15 words. Would like to classify if the sentence is talking about a specific subject.

  2. Is there a tool that given a sentence, it finds out the subject of a sentence.

I am using PHP and Java but the tool can be anything that runs on Linux command line

Thank you very much.


The most basic way of doing this is create a set of labeled training data and using it to train a classifier. How the classifier works is a more complicated issue- for spam filtering and many other things, just looking at the word frequency works pretty well.

Here is a basic example: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex6/ex6.html

It is trivial to write a Naive Bayes classifier; a package like MALLET will also have this plus better machine learning methods. Lingpipe will also have this sort of stuff.

What you really should care about is the quality of data and what your features are. By quality of data I mean lots of data without that many borderline cases, and by features I mean are you choosing just words, or combinations of words (word ngrams), or dependency features, or something more complex. You need a way to create the feature data as well as actually do the learning! In this sense Lingpipe is good as you can do tokenization and all that first as opposed to writing your own functions to do this or having to cobble other tools together into your own feature generation code.

A guide to MALLET can be found here: http://courses.washington.edu/ling570/fei_fall10/11_15_Mallet.pdf


NLTK may solve problem.

i found below web service api handy and off the shelf to use...

http://text-processing.com/demo/

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜