svmlib scaling vs. pyml normalization, scaling, and translation

2023-03-04 12:27 Q&A author:

What is the proper way to normalize feature vectors for use in a linear-kernel SVM?

Looking at LIBSVM, it looks like it's done by just rescaling each feature to a single standard upper/lower range. However, it doesn't seem like PyML provides a way to scale the data this way. Instead, there are options to normalize the vectors by their length, shift each feature value by its mean while rescaling by the standard deviation, etc.

I am dealing with a case where most features are binary, except for a few that are numeric.

I am not an expert in this, but I believe centering and scaling each feature, by subtracting its mean and then dividing by its standard deviation, is a typical way to normalize features for use with SVMs. In R, this can be done with the scale function.
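For anyone working in Python rather than R, here is a rough sketch of the same per-feature standardization using only the standard library. The function name `standardize_columns` and the toy data are made up for illustration; it is not PyML's or LIBSVM's API. I use the sample standard deviation (n - 1 in the denominator), which is, as far as I know, what R's scale does by default.

```python
from statistics import mean, stdev

def standardize_columns(rows):
    """Center each feature (column) at 0 and divide it by its standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Sample standard deviation; a constant column gets divisor 1.0
    # so that we never divide by zero.
    stds = [stdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical data: first column is a binary feature, second is numeric.
data = [[1.0, 10.0], [0.0, 20.0], [1.0, 30.0]]
scaled = standardize_columns(data)
```

Note that the binary column gets shifted and stretched too, so whether to include binary features in this step is a judgment call.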

Another way is to rescale each feature to the [0,1] range:

(x - min(x)) / (max(x) - min(x))
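A minimal Python sketch of that formula, applied per feature, might look like the following. The helper names `fit_min_max` and `apply_min_max` are hypothetical. The `lower`/`upper` arguments are my own generalization so that a range such as [-1, 1] can be used as well, and the ranges are computed once on the training data and reused for new data, which I believe is the usual practice.

```python
def fit_min_max(rows):
    """Record (min, max) for each feature (column) of the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, ranges, lower=0.0, upper=1.0):
    """Rescale each feature to [lower, upper] using previously recorded ranges."""
    scaled = []
    for row in rows:
        new_row = []
        for x, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # Constant feature: no spread to rescale, map it to the lower bound.
                new_row.append(lower)
            else:
                new_row.append(lower + (upper - lower) * (x - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

# Hypothetical data: first column is binary, second is numeric.
train = [[0.0, 5.0], [1.0, 15.0], [1.0, 25.0]]
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
# New data reuses the training ranges, so values may fall outside [0, 1].
test_scaled = apply_min_max([[0.0, 35.0]], ranges)
```

A binary 0/1 feature passes through this unchanged when the target range is [0,1], as long as both values occur in the training data.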

Some features might benefit from a log transformation if their distribution is very skewed, but this changes the shape of the distribution as well, rather than only shifting and rescaling it.
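As a small illustration of the log transformation, here is a sketch that assumes the feature is non-negative (counts, for example) and uses log(1 + x) so that zeros stay at zero. The function name and the sample values are invented for the example.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to a non-negative feature to compress a long right tail."""
    if any(v < 0 for v in values):
        raise ValueError("log(1 + x) is only used here for non-negative values")
    return [math.log1p(v) for v in values]

# Hypothetical skewed feature: most values are small, one is very large.
counts = [0, 1, 10, 100, 10000]
logged = log_transform(counts)
```

The order of the values is preserved, but the gap between the largest value and the rest shrinks a lot, which is the change in shape mentioned above.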

I am not sure what you gain in an SVM-setting by normalizing the vectors by their L1 or L2 norm like PyML does with its normalize method. I guess binary features (0 or 1) don't need to be normalized.
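One thing I can say for certain about L2 normalization: once every vector has unit length, the linear kernel (a plain dot product) is the same as the cosine similarity, so only the direction of a vector matters and not its magnitude. Whether that helps depends on the data. Here is a rough sketch with hypothetical helper names; it is not PyML's implementation.

```python
import math

def l2_normalize(vector):
    """Scale a single feature vector (one example) to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # An all-zero vector has no direction; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]

def dot(a, b):
    """Linear kernel: plain dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical examples: b points in the same direction as a but is twice as long.
a = [1.0, 0.0, 1.0, 3.0]
b = [2.0, 0.0, 2.0, 6.0]
ua, ub = l2_normalize(a), l2_normalize(b)
kernel_before = dot(a, b)   # depends on the vector lengths
kernel_after = dot(ua, ub)  # cosine similarity, close to 1.0 here
```

Unlike the per-feature methods above, this works on one example at a time, so it does not put features with different ranges on a common scale.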
