Feature Selection in MATLAB

Posted 2023-01-29 21:14

I have a dataset for text classification ready to be used in MATLAB. Each document is a vector in this dataset, and the dimensionality of this vector is extremely high. In these cases people usually do some feature selection on the vectors, like the methods you find in the WEKA toolkit. Is there anything like that in MATLAB? If not, can you suggest an algorithm for me to do it? Thanks.

MATLAB (and its toolboxes) include a number of functions that deal with feature selection:

RANDFEATURES (Bioinformatics Toolbox): Generate randomized subset of features directed by a classifier
RANKFEATURES (Bioinformatics Toolbox): Rank features by class separability criteria
SEQUENTIALFS (Statistics Toolbox): Sequential feature selection
RELIEFF (Statistics Toolbox): Relief-F algorithm
TREEBAGGER.OOBPermutedVarDeltaError, predictorImportance (Statistics Toolbox): Using ensemble methods (bagged decision trees)
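
As an illustration of one function from the list above, here is a minimal sketch of SEQUENTIALFS wrapped around a linear discriminant classifier. The data are synthetic and the variable names (critfun, cvp, selected) are made up for this example; it assumes the Statistics Toolbox is installed. Keep in mind that wrapper methods retrain the classifier for every candidate feature, so on very high-dimensional text vectors you would probably filter the features first.

```matlab
% Sketch: forward sequential feature selection on synthetic data.
rng(0);                                   % reproducible random numbers
n = 200;                                  % number of documents (rows)
y = [ones(n/2,1); 2*ones(n/2,1)];         % two class labels
X = randn(n, 20);                         % 20 features, mostly noise
X(:,3) = X(:,3) + 3*(y == 2);             % feature 3 carries class information
X(:,7) = X(:,7) - 3*(y == 2);             % feature 7 carries class information

% Criterion: number of misclassified test rows for a linear discriminant.
critfun = @(Xtr, ytr, Xte, yte) sum(yte ~= classify(Xte, Xtr, ytr));

cvp = cvpartition(y, 'KFold', 5);         % stratified 5-fold cross-validation
[selected, history] = sequentialfs(critfun, X, y, 'cv', cvp);

disp(find(selected));                     % indices of the chosen features
```

The logical vector selected can then be used to index the columns, as in X(:, selected).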

You can also find examples that demonstrate their usage on real datasets:

Identifying Significant Features and Classifying Protein Profiles
Genetic Algorithm Search for Features in Mass Spectrometry Data

In addition, there exist third-party toolboxes:

Matlab Toolbox for Dimensionality Reduction
LIBGS: A MATLAB Package for Gene Selection

Otherwise, you can always call your favorite functions from WEKA directly from MATLAB, since MATLAB includes a JVM.

Feature selection depends on the specific task you want to do on the text data.

One of the simplest and crudest methods is to use principal component analysis (PCA) to reduce the dimensionality of the data. The reduced-dimensional data can be used directly as features for classification.

See the tutorial on using PCA here:

http://matlabdatamining.blogspot.com/2010/02/principal-components-analysis.html

Here is the link to the MATLAB PCA command help:

http://www.mathworks.com/help/toolbox/stats/princomp.html
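
A minimal sketch of the PCA reduction, assuming a release of the Statistics Toolbox that has the pca function (it supersedes princomp in newer releases). The data are synthetic, and the 95% variance cutoff is an arbitrary choice for illustration, not a recommendation from the answer.

```matlab
% Sketch: reduce 50-dimensional data to the leading principal components.
rng(0);
X = randn(100, 3) * randn(3, 50) + 0.01*randn(100, 50);  % data close to a 3-dim subspace

[coeff, score, latent, ~, ~, mu] = pca(X);   % pca centers the columns by default
explained = cumsum(latent) / sum(latent);    % cumulative fraction of variance
k = find(explained >= 0.95, 1);              % smallest k reaching the 95% cutoff
Xreduced = score(:, 1:k);                    % n-by-k matrix of new features

% New rows must be centered with the training mean before projecting.
Xnew = X(1:5, :);                            % stand-in for unseen documents
XnewReduced = bsxfun(@minus, Xnew, mu) * coeff(:, 1:k);
```

Note that the principal components are linear combinations of all original features, so this reduces dimensionality without picking out a subset of the original terms.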

Using the obtained features, the well-known support vector machine (SVM) can be used for classification.

http://www.mathworks.com/help/toolbox/bioinfo/ref/svmclassify.html

http://www.autonlab.org/tutorials/svm.html
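
A rough sketch of the classification step, assuming a release with fitcsvm and predict (Statistics and Machine Learning Toolbox, R2014a or later) rather than the older svmtrain/svmclassify pair linked above. The data, the 150/50 split, and the linear kernel are assumptions made for the example.

```matlab
% Sketch: train a linear SVM on reduced features and check held-out accuracy.
rng(0);
n = 200;
y = [ones(n/2,1); -ones(n/2,1)];          % class labels +1 / -1
X = randn(n, 5);                          % stand-in for the reduced feature matrix
X(:,1) = X(:,1) + 2*y;                    % feature 1 separates the classes

idx = randperm(n);                        % random train/test split
trainIdx = idx(1:150);
testIdx  = idx(151:end);

model = fitcsvm(X(trainIdx,:), y(trainIdx), ...
    'KernelFunction', 'linear', 'Standardize', true);
pred = predict(model, X(testIdx,:));
acc  = mean(pred == y(testIdx));          % fraction of correct test predictions
fprintf('Held-out accuracy: %.2f\n', acc);
```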

You might consider using the independent features technique of Weiss and Kulikowski to quickly eliminate variables which are obviously uninformative:

http://matlabdatamining.blogspot.com/2006/12/feature-selection-phase-1-eliminate.html
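
As I understand the technique, each feature is tested on its own for a two-class problem: the absolute difference of the class means is divided by its standard error, and the feature is kept only if the ratio exceeds a threshold (2 is the value usually quoted). The sketch below uses base MATLAB only; the data and the names A, B, sig and keep are made up for the example.

```matlab
% Sketch: independent significance test, one feature at a time.
rng(0);
n = 100;
A = randn(n, 10);                  % rows of class A
B = randn(n, 10);                  % rows of class B
B(:,4) = B(:,4) + 1.5;             % only feature 4 really differs between classes

% Standard error of the difference between the two class means.
se  = sqrt(var(A, 0, 1) / size(A, 1) + var(B, 0, 1) / size(B, 1));
sig = abs(mean(A, 1) - mean(B, 1)) ./ se;

threshold = 2;                     % assumed cutoff; tune for your data
keep = find(sig > threshold);      % indices of features that pass the test
disp(keep);

% A feature that is constant with the same value in both classes gives
% 0/0 = NaN, and NaN > threshold is false, so it is dropped automatically.
```

Because every feature is scored independently, this runs in a single pass over the data and scales to very high-dimensional vectors, but it cannot detect features that are only useful in combination.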
