How to rank main features after Feature Selection on OneHotEncoded data?
Consider a dataset with categorical (nominal) features and one numerical output variable. Feature selection methods such as InfoGain, Pearson correlation, or wrapper methods only accept numerical features as input, so I have to one-hot encode the non-ordinal data, which produces many dummy features.
If I apply feature selection in Python and get a ranking of the dummy features, how can I recover a ranking of the original features (before one-hot encoding)?
For example, suppose there are 3 features with categories A(1, 2, 3), B(1, 2, 3), and C(1, 2, 3), and the result of Pearson (with SelectKBest) on the dummy features is:
- B2
- A1
- B1
- C3
- A3
Is it then correct to say that the ranking of the original features is:
- B
- A
- C
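To make the question concrete, here is a minimal sketch of one possible way to roll dummy-level scores back up to the original features. It assumes each dummy name has the form `<feature>_<category>` (which I believe is what `OneHotEncoder.get_feature_names_out` produces) and that a per-dummy score is available, for example from the `scores_` attribute of a fitted `SelectKBest`. The function name `rank_original_features`, the aggregation rule, and the numbers are all hypothetical, chosen only so that the dummy ordering matches the example above. Whether max, sum, or something else is the right way to combine scores is exactly what I am unsure about.

```python
from collections import defaultdict


def rank_original_features(dummy_scores, sep="_", agg=max):
    """Collapse per-dummy scores into one score per original feature.

    dummy_scores: mapping such as {"B_2": 0.40, ...}. The text before the
        last `sep` is assumed to be the original feature name.
    agg: function that combines the scores of one feature (max, sum, ...).
    """
    grouped = defaultdict(list)
    for dummy_name, score in dummy_scores.items():
        original, _, _category = dummy_name.rpartition(sep)
        grouped[original].append(score)
    combined = {name: agg(scores) for name, scores in grouped.items()}
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, made up for illustration only.
scores = {
    "A_1": 0.35, "A_2": 0.05, "A_3": 0.10,
    "B_1": 0.30, "B_2": 0.40, "B_3": 0.02,
    "C_1": 0.01, "C_2": 0.03, "C_3": 0.20,
}

ranking_by_max = rank_original_features(scores, agg=max)
ranking_by_sum = rank_original_features(scores, agg=sum)
print([name for name, _ in ranking_by_max])
print([name for name, _ in ranking_by_sum])
```

With these made-up numbers both aggregations give B, A, C, but in general different aggregation rules can produce different orderings, so the choice would have to be justified for each selection method.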
Since I have to rank the features with at least 6 feature selection methods, I would really appreciate any guidance on nominal feature selection and ranking.
I did some research, but most resources either use ordinal encoding or do not show an implementation on purely nominal input, and other encoding methods such as Base N encoding, hash encoding, or dummy encoding still seem to create several features out of one nominal variable.