Doing a hierarchical sentiment analysis with LingPipe

I am doing sentiment analysis with the LingPipe machine learning toolkit. I need to classify whether each sentence in a long paragraph has positive or negative sentiment. I know of the following approach in LingPipe:

  1. Classify the complete paragraph by its polarity: negative or positive.

    At this point I do not yet know the polarity at the sentence level; we are still at the paragraph level. How do I determine whether a given sentence in the paragraph is positive or negative? I know that LingPipe can classify a sentence as subjective or objective. So, using that capability, should I:

  2. First train LingPipe on a large set of sentences labeled subjective/objective.
  3. Use the trained model to extract all subjective sentences from a test paragraph.
  4. Train a LingPipe polarity classifier on the extracted subjective sentences, after manually labeling them as positive/negative.
  5. Use the trained polarity model on a test subjective sentence (obtained by passing the sentence through the trained subjective/objective model) to determine whether it is positive or negative?

    Would the above approach work? We know that LingPipe can accept a large block of text (a paragraph) for polarity classification. Will it do a good job if we pass just a single subjective sentence for polarity classification? I am confused!
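For illustration, steps 2 to 5 can be sketched as a two-stage pipeline. This is only a minimal sketch and does not use the LingPipe API: the two classifiers are passed in as plain `Function<String, String>` objects, and `TOY_SUBJECTIVITY`, `TOY_POLARITY`, and the regex-based `splitSentences` are hypothetical stand-ins for trained models and a real sentence detector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TwoStageSentiment {

    // Hypothetical keyword-based stand-ins; a real system would plug in trained classifiers here.
    static final Function<String, String> TOY_SUBJECTIVITY =
        s -> s.matches(".*\\b(great|hated|love|awful)\\b.*") ? "subjective" : "objective";
    static final Function<String, String> TOY_POLARITY =
        s -> s.matches(".*\\b(great|love)\\b.*") ? "positive" : "negative";

    // Naive splitter (assumption): breaks after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String paragraph) {
        List<String> sentences = new ArrayList<>();
        for (String s : paragraph.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Stage 1 keeps only subjective sentences; stage 2 assigns a polarity to each of them.
    static Map<String, String> classifyParagraph(String paragraph,
                                                 Function<String, String> subjectivity,
                                                 Function<String, String> polarity) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String sentence : splitSentences(paragraph)) {
            if ("subjective".equals(subjectivity.apply(sentence))) {
                result.put(sentence, polarity.apply(sentence));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String paragraph = "The room was great. We arrived at noon. I hated the food.";
        classifyParagraph(paragraph, TOY_SUBJECTIVITY, TOY_POLARITY)
            .forEach((sentence, label) -> System.out.println(label + "\t" + sentence));
    }
}
```

Objective sentences are simply dropped by the first stage, so the polarity model only ever sees sentence-length subjective input, both at training time and at test time.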


You might want to take a look at the multi-level analysis approaches in the literature, e.g.

Li, S., et al. (2010). "Exploiting Combined Multi-level Model for Document Sentiment Analysis," 2010 International Conference on Pattern Recognition.

Yessenalina, A., et al. (2010). "Multi-level Structured Models for Document-level Sentiment Classification," Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1046–1056, MIT, Massachusetts, USA, 9-11 October 2010.

Multi-level analysis approaches are quite common in information retrieval, as in content indexing for vector space similarity search.

Environments such as LingPipe are a good way to get started, but eventually you need to employ lower-level, finer-grained tools such as those yura suggested.


Most machine learning libraries, including LingPipe, are row-based (each object has a flat set of features). So if you want to do hierarchical classification with one of them, you should denormalize your data. For example, you can put the features of the paragraph and of the sentence in the same feature set. If you classify by words only, you can create features such as PARAGRAPH_WORDX=true and SENTENCE_WORDX=true. Some other toolkits let you express your model without denormalization; these are the so-called graphical models. Examples are CRF, ACRF, Markov models, etc.; you can find implementations of those in Mallet and Factorie.
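A minimal sketch of that denormalization, assuming plain bag-of-words features: each sentence becomes one row whose feature map holds both its own words and the words of its enclosing paragraph, distinguished by prefix. The class name `HierarchicalFeatures`, the method `extract`, and the simple tokenization are hypothetical and only for illustration; they are not part of LingPipe.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class HierarchicalFeatures {

    // Builds one flat feature row for a sentence, including its paragraph context.
    static Map<String, Boolean> extract(String paragraph, String sentence) {
        Map<String, Boolean> features = new TreeMap<>();
        addWords("PARAGRAPH_", paragraph, features);
        addWords("SENTENCE_", sentence, features);
        return features;
    }

    // Lowercases the text, splits on anything that is not a letter, digit or apostrophe,
    // and records each word as a prefixed boolean feature.
    private static void addWords(String prefix, String text, Map<String, Boolean> features) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty()) {
                features.put(prefix + token, Boolean.TRUE);
            }
        }
    }

    public static void main(String[] args) {
        String paragraph = "Great food. Slow service.";
        // One row per sentence; the paragraph-level features repeat in every row.
        System.out.println(extract(paragraph, "Great food."));
        System.out.println(extract(paragraph, "Slow service."));
    }
}
```

The resulting map can then be handed to any row-based classifier that accepts named features; the prefixes let the learner weigh a word differently depending on whether it occurs in the sentence itself or only elsewhere in the paragraph.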
