Multi-label classification done right?

Let's say I have a dataset which can be neatly classified using Weka's J48 or randomForest in R. Now let's say I have another training file, which contains two classifications per data point.

How could I combine these two so that new data points can be classified with both labels?

(So I'd need a "two-pass" training.)

Should I use an MLP (like a restricted Boltzmann machine) instead?


I'm assuming your two data sets look like this...

Data set 1:

(x_11, x_12, ... , x_1N) = 1
(x_21, x_22, ... , x_2N) = 0
....

Data set 2:

(x_11, x_12, ... , x_1N) = (1, 1)
(x_21, x_22, ... , x_2N) = (0, 1)
....

Assuming that is what your problem looks like, I would split it into two problems: predicting each of the two labels in turn. I think this can be justified by the probability formula:

p(L1,L2|X) = p(L2|L1,X)p(L1|X)

where the L1 and L2 are the two class labels and X is the data.

My suggestion is to train a model for p(L1|X) using datasets 1 and 2 with L1 as the target variable, and then a model for p(L2|L1,X) using dataset 2, with L1 included as a feature and L2 as the target variable. To predict a new pair of labels, apply the first model to get an estimate of L1, then feed that estimate into the second model to get an estimate of L2.
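Since you mentioned randomForest in R, here is a minimal sketch of that two-stage procedure. The data frame names d1 and d2, the label columns L1 and L2, and the helper predict_both are hypothetical; it assumes both labels are stored as factors and that d2 contains the same feature columns (and column names) as d1.

library(randomForest)

# Stage 1: model p(L1 | X). Pool the rows of dataset 1 with the
# feature + L1 columns of dataset 2, since both carry an L1 label.
train1 <- rbind(d1, d2[, names(d1)])
model1 <- randomForest(L1 ~ ., data = train1)

# Stage 2: model p(L2 | L1, X) on dataset 2 alone; the formula's "."
# pulls in L1 as an ordinary predictor alongside the features.
model2 <- randomForest(L2 ~ ., data = d2)

# To label a new point: estimate L1 first, then feed that estimate
# into the second model to estimate L2.
predict_both <- function(newX) {
  L1_hat <- predict(model1, newdata = newX)
  L2_hat <- predict(model2, newdata = cbind(newX, L1 = L1_hat))
  data.frame(L1 = L1_hat, L2 = L2_hat)
}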

I suppose an argument against this approach is that, although the formula is true, it may be the case that p(L1,L2|X) is easier to learn directly than p(L2|L1,X) and p(L1|X) separately. However, in the absence of more details I really don't know.
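For completeness, learning p(L1,L2|X) directly amounts to a single multi-class problem over the label pairs. A sketch under the same hypothetical names:

library(randomForest)

# Fuse the two labels of dataset 2 into one composite class,
# e.g. the pair (0, 1) becomes the single level "0.1".
d2$L12 <- interaction(d2$L1, d2$L2, drop = TRUE)

# One multi-class forest over the joint label; drop the original
# label columns from the predictors.
model_joint <- randomForest(L12 ~ . - L1 - L2, data = d2)

The trade-off is that the joint model can only be trained on dataset 2 and needs enough examples of every label pair, whereas the chained models can also exploit dataset 1 when learning L1.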
