Twitter Subjectivity Training Sets

2023-03-25 15:15

I need a reliable and accurate method to filter tweets as subjective or objective. In other words, I need to build a filter in something like Weka using a training set.

Are there any training sets available which could be used to train a subjective/objective classifier for Twitter messages, or for other domains which may be transferable?

For research and non-profit purposes, SentiWordNet gives you exactly what you want. A commercial license is available too.

SentiWordNet: http://sentiwordnet.isti.cnr.it/

Sample Java Code: http://sentiwordnet.isti.cnr.it/code/SWN3.java

Related Paper: http://nmis.isti.cnr.it/sebastiani/Publications/LREC10.pdf

The other approach I would try:

Example

Tweet 1: @xyz u should see the dark knight. Its awesme.

1) First, do a dictionary lookup for the meaning of each word.

"u" and "awesme" will not return anything.

2) Then check the unmatched words against known abbreviations and shorthand, and substitute matches with their expansions (some resources: netlingo http://www.netlingo.com/acronyms.php or smsdictionary http://www.smsdictionary.co.uk/abbreviations)

Now the original tweet will look like:

Tweet 1: @xyz you should see the dark knight. Its awesme.

3) Then feed the remaining unmatched words into a spell checker and substitute the best match (not always ideal, and error-prone for short words)

Related Link: Looking for Java spell checker library

Now the original tweet will look like:

Tweet 1: @xyz you should see the dark knight. Its awesome.

4) Split the tweet into words, feed them into SWN3, and aggregate the results
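Steps 2 and 4 can be sketched roughly as follows. This is a minimal illustration, not the SWN3 code itself: the shorthand table and the lexicon values are made up, the class and method names are hypothetical, and the spell-checking step is left out, so misspelled words simply miss the lexicon. It relies on the fact that SentiWordNet gives each entry a positive and a negative score, with objectivity being 1 minus their sum, so the sum of the two can serve as a crude subjectivity score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TweetSubjectivitySketch {
    // Hypothetical shorthand table; a real one would be built from a resource such as netlingo.
    static final Map<String, String> SHORTHAND =
        Map.of("u", "you", "gr8", "great", "2moro", "tomorrow");

    // Toy lexicon standing in for SentiWordNet: word -> {positive, negative}.
    // The values are invented for illustration only.
    static final Map<String, double[]> LEXICON = Map.of(
        "awesome", new double[] {0.875, 0.0},
        "dark", new double[] {0.0, 0.375},
        "see", new double[] {0.0, 0.0});

    // Lowercases, drops @mentions, strips punctuation, and expands known shorthand.
    static List<String> normalize(String tweet) {
        List<String> tokens = new ArrayList<>();
        for (String raw : tweet.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.startsWith("@")) continue;
            String word = raw.replaceAll("[^a-z0-9']", "");
            if (word.isEmpty()) continue;
            tokens.add(SHORTHAND.getOrDefault(word, word));
        }
        return tokens;
    }

    // Averages (positive + negative) over the tokens found in the lexicon.
    // Returns 0.0 when no token is found.
    static double subjectivity(List<String> tokens) {
        double sum = 0.0;
        int hits = 0;
        for (String token : tokens) {
            double[] scores = LEXICON.get(token);
            if (scores != null) {
                sum += scores[0] + scores[1];
                hits++;
            }
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    public static void main(String[] args) {
        List<String> tokens = normalize("@xyz u should see the dark knight. Its awesome.");
        System.out.println(tokens + " -> " + subjectivity(tokens));
    }
}
```

Averaging is only one way to aggregate; a maximum or a count of words above some cutoff might suit short tweets better.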

The problems with this approach are:

a) Negations should be handled outside SWN3.

b) Information in emoticons and exaggerated punctuation will be lost unless it is handled separately.
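Both points could be handled in a pre-pass before the lexicon lookup. The sketch below uses one common heuristic for negation (prefix every word after a negation word with NOT_ until the next punctuation mark) and pulls emoticons and repeated punctuation out with regular expressions. The negation list, the emoticon pattern, and all names are assumptions for illustration and are far from complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetPreprocessSketch {
    // Small illustrative lists; real coverage would need to be much wider.
    static final Set<String> NEGATIONS = Set.of("not", "no", "never", "don't", "isn't", "can't");
    static final Pattern EMOTICON = Pattern.compile("[:;=]-?[)(DPp]");
    static final Pattern REPEATED_PUNCT = Pattern.compile("[!?]{2,}");

    // Collects emoticons such as :) ;-) :( :D so they can be scored separately.
    static List<String> emoticons(String tweet) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMOTICON.matcher(tweet);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // True when the tweet contains a run like "!!!" or "?!".
    static boolean hasExaggeratedPunctuation(String tweet) {
        return REPEATED_PUNCT.matcher(tweet).find();
    }

    // Prefixes words that follow a negation with NOT_ until a token ends in punctuation.
    static List<String> markNegation(String tweet) {
        List<String> out = new ArrayList<>();
        boolean negated = false;
        for (String raw : tweet.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = raw.replaceAll("[^a-z']", "");
            if (!word.isEmpty()) {
                if (NEGATIONS.contains(word)) {
                    negated = true;
                    out.add(word);
                } else {
                    out.add(negated ? "NOT_" + word : word);
                }
            }
            if (raw.matches(".*[.,;!?]$")) {
                negated = false; // the negation scope ends at punctuation
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String tweet = "I do not like this movie. It is good :)";
        System.out.println(markNegation(tweet));
        System.out.println(emoticons(tweet) + " " + hasExaggeratedPunctuation(tweet));
    }
}
```

A word marked NOT_ could then have its positive and negative scores swapped before aggregation, which is one possible way to keep the negation logic outside SWN3.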

There is sentiment training data at CMU somewhere, but I can't remember the link. CMU has done a lot of work on Twitter and sentiment analysis:

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series
Carnegie Mellon Study of Twitter Sentiments Yields Results Similar to Public Opinion Polls

I wrote an English vs. non-English Naive Bayes classifier for Twitter, made an example dev/test set, and it was 98% accurate. I think that sort of thing is always pretty good if you are just trying to understand the problem, but a package like SentiWordNet might give you a head start.
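For reference, a word-level Naive Bayes classifier of that kind is short enough to write by hand. The sketch below is not the classifier described above; it is a minimal multinomial Naive Bayes with add-one smoothing, and the class name, the labels, and the four training sentences in main are all invented for illustration. A real training set would need hand-labelled tweets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> token count
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> document count
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
    }

    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Returns the label with the highest log-probability, or null if nothing was trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs); // class prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                // Words never seen in training carry no evidence, so skip them.
                if (word.isEmpty() || !vocabulary.contains(word)) continue;
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size())); // add-one smoothing
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("subjective", "i love this movie it is awesome");
        nb.train("subjective", "what a terrible boring film i hate it");
        nb.train("objective", "the movie opens on friday");
        nb.train("objective", "the film runs for two hours");
        System.out.println(nb.classify("i hate this awesome film"));
        System.out.println(nb.classify("the film opens on friday"));
    }
}
```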

The problem is defining what makes a tweet subjective or objective! It's important to understand that machine learning is less about the algorithm and more about the quality of the data.

You mention that 75% accuracy is all you need... but what about recall? If you provide the right training data you might be able to get that, at the cost of lower recall.
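To make the trade-off concrete, here is a small sketch that computes accuracy, precision, and recall at different decision thresholds. The scores and labels in main are invented, and the names are hypothetical; the point is only that a stricter threshold can raise precision while lowering recall.

```java
class ThresholdMetricsSketch {
    // Returns {tp, fp, fn, tn}, predicting "subjective" for every score >= threshold.
    static int[] confusion(double[] scores, boolean[] isSubjective, double threshold) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predicted = scores[i] >= threshold;
            if (predicted && isSubjective[i]) tp++;
            else if (predicted) fp++;
            else if (isSubjective[i]) fn++;
            else tn++;
        }
        return new int[] {tp, fp, fn, tn};
    }

    // Fraction of predicted-subjective tweets that really are subjective.
    static double precision(int[] c) {
        return c[0] + c[1] == 0 ? 0.0 : c[0] / (double) (c[0] + c[1]);
    }

    // Fraction of truly subjective tweets that were found.
    static double recall(int[] c) {
        return c[0] + c[2] == 0 ? 0.0 : c[0] / (double) (c[0] + c[2]);
    }

    // Fraction of all tweets that were labelled correctly.
    static double accuracy(int[] c) {
        int all = c[0] + c[1] + c[2] + c[3];
        return all == 0 ? 0.0 : (c[0] + c[3]) / (double) all;
    }

    public static void main(String[] args) {
        // Invented classifier scores and gold labels, for illustration only.
        double[] scores = {0.9, 0.8, 0.6, 0.4, 0.3, 0.1};
        boolean[] gold = {true, true, false, true, false, false};
        for (double threshold : new double[] {0.5, 0.85}) {
            int[] c = confusion(scores, gold, threshold);
            System.out.printf("threshold=%.2f accuracy=%.2f precision=%.2f recall=%.2f%n",
                threshold, accuracy(c), precision(c), recall(c));
        }
    }
}
```

On this toy data both thresholds give the same accuracy (4 of 6 correct), but moving from 0.5 to 0.85 raises precision from 2/3 to 1 and drops recall from 2/3 to 1/3, which is why accuracy alone is not enough to specify the target.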

The DynamicLMClassifier in LingPipe works pretty well.

http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
