SVM Classification - minimum number of input sets for each class

2022-12-20 06:42 Q&A author:

I'm trying to build an app that detects which images on a web page are advertisements. Once I detect them, I won't allow them to be displayed on the client side.

Based on the help I got on this Stack Overflow question, I concluded that an SVM is the best approach for my goal.

So, I have coded an SVM and SMO myself. The dataset, which I got from the UCI data repository, has 3280 instances ( Link to Dataset ), of which around 400 belong to the class representing advertisement images and the rest represent non-advertisement images.

Right now I'm taking the first 2800 input sets and training the SVM on them. But after looking at the accuracy rate, I realised that most of those 2800 input sets are from the non-advertisement class, so I'm getting very good accuracy for that class.

So what can I do here? About how many input sets should I give the SVM for training, and how many of them from each class?

Thanks. Cheers. (I made a new question because the context is different from my previous question, Optimization of Neural Network input data.)

Thanks for the reply. I want to check whether I'm deriving the C values for the ad and non-ad classes correctly. Please give me feedback on this.

[Image: working for the C values]

Or you can see the doc version here.

You can see the graph for y1 equal to y2 here:

[Image: graph for y1 equal to y2]

and for y1 not equal to y2 here:

[Image: graph for y1 not equal to y2]

There are two ways of going about this. One would be to balance the training data so it includes an equal number of advertisement and non-advertisement images. This could be done by either oversampling the 400 advertisement images or undersampling the thousands of non-advertisement images. Since training time can increase dramatically with the number of data points used, you should probably first try undersampling the non-advertisement images and create a training set with the 400 ad images and 400 randomly selected non-advertisements.
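As a rough illustration of the undersampling option, here is a minimal sketch in plain Python. The function name `undersample`, the labels `1` (ad) and `-1` (non-ad), and the toy feature vectors are assumptions made for the example; the class counts are only the approximate ones from the question.

```python
import random

def undersample(samples, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class samples."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is cut down to the size of the smallest class.
    n_min = min(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]

# Toy data with roughly the class counts from the question (assumed labels: 1 = ad, -1 = non-ad).
labels = [1] * 400 + [-1] * 2880
samples = [[float(i)] for i in range(len(labels))]

X_bal, y_bal = undersample(samples, labels)
print(len(X_bal), y_bal.count(1), y_bal.count(-1))
```

Sampling at random, rather than taking the first N rows, also avoids the problem in the question where the first 2800 rows happened to be mostly one class.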

The other solution would be to use a weighted SVM, so that margin errors for the ad images are penalized more heavily than those for non-ads. In libSVM this is done with the -wi flag, which multiplies C for class i by the given weight. From your description of the data (about 400 ads against roughly 2880 non-ads), you could try weighting the ad images about 7 times more heavily than the non-ads.
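The weighting idea can be sketched as follows. This is only an illustration under assumptions: the helper names are made up, the labels are taken to be `1` (ad) and `-1` (non-ad), the base C of 1.0 is arbitrary, and the class counts are the approximate ones from the question. Each class gets its own penalty C_i = weight_i * C, with weights inversely proportional to class frequency.

```python
def class_weights(labels):
    """Weight each class inversely to its frequency; the largest class gets weight 1.0."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_max = max(counts.values())
    return {y: n_max / n for y, n in counts.items()}

def per_class_C(base_C, weights):
    """Effective penalty for each class: C_i = weight_i * C."""
    return {y: base_C * w for y, w in weights.items()}

def libsvm_weight_args(weights):
    """Build -wi style arguments (one per class label) for a libSVM command line."""
    args = []
    for y, w in sorted(weights.items()):
        args += [f"-w{y}", f"{w:g}"]
    return args

# Assumed labels: 1 = ad, -1 = non-ad, with roughly the counts from the question.
labels = [1] * 400 + [-1] * 2880

weights = class_weights(labels)
C_values = per_class_C(1.0, weights)   # base C of 1.0 is an arbitrary choice
args = libsvm_weight_args(weights)
print(weights, C_values, args)
```

With these counts the ad class ends up with a weight of about 7, matching the estimate above. The ratio is only a starting point; the weight and the base C would normally be tuned by cross-validation.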

The required size of your training set depends on the sparseness of the feature space. As far as I can see, you have not said which image features you have chosen to use. Before you can train, you need to convert each image into a vector of numbers (features) that describes the image, hopefully capturing the aspects that you care about.

Oh, and unless you are reimplementing SVM for sport, I'd recommend just using libsvm.
