
The Role of the Training and Test Sets in Building a Decision Tree and Using It to Classify

I've been working with Weka for a couple of months now. Currently, I'm working on my machine learning course here at Ostfold University College. I need a better way to construct a decision tree based on separate training and test sets. If anybody can come up with a good idea, it would be a great relief. Thanks in advance.

-Neo


You might be asking for something more specific, but in general:

You build the decision tree with the training set, and you evaluate the performance of that tree using the test set. In other words, on the test data, you call a function usually named something like *classify*, passing in the newly built tree and a data point (from your test set) that you wish to classify.

This function returns the leaf (terminal) node of your tree to which that data point belongs. Assuming the contents of that leaf are homogeneous (populated with data from a single class, not a mixture), you have in essence assigned a class label to that data point. When you compare the class label assigned by the tree to the data point's actual class label, and repeat for all instances in your test set, you have a metric (the fraction of correct predictions, i.e. accuracy) to evaluate the performance of your tree.
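To make that evaluation loop concrete, here is a minimal Python sketch. It is not Weka's API: the nested-dictionary tree, the `classify` and `accuracy` functions, and the toy weather data are all hypothetical, made up purely for illustration. In practice the tree would be learned from your training set rather than written by hand.

```python
# Hypothetical hand-built tree: internal nodes test one feature,
# leaves hold a single class label (i.e. they are homogeneous).
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: {"label": "stay_in"}, False: {"label": "play"}},
        },
        "rainy": {"label": "stay_in"},
    },
}


def classify(node, point):
    """Walk from the root down to a leaf and return that leaf's class label."""
    while "label" not in node:
        node = node["branches"][point[node["feature"]]]
    return node["label"]


def accuracy(tree, test_set):
    """Fraction of test instances whose predicted label equals the actual label."""
    correct = sum(1 for point, actual in test_set if classify(tree, point) == actual)
    return correct / len(test_set)


# Each test instance is (features, actual class label).
test_set = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "stay_in"),
    ({"outlook": "rainy", "windy": False}, "play"),  # the tree misclassifies this one
    ({"outlook": "rainy", "windy": True}, "stay_in"),
]

print(accuracy(tree, test_set))  # 3 of 4 correct -> 0.75
```

Accuracy is only one possible metric; with unbalanced classes you would probably also want to look at a confusion matrix.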

A rule of thumb: shuffle your data, then assign 90% to the training set and the other 10% to a test set.
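A shuffle-then-split like that could look roughly as follows in Python. The function name `train_test_split` and its parameters are assumptions chosen for this sketch (it uses only the standard library, not Weka), and the list of integers stands in for real data rows.

```python
import random


def train_test_split(data, train_fraction=0.9, seed=None):
    """Shuffle a copy of the data, then split it into (training, test) lists."""
    rows = list(data)                  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(rows)  # a fixed seed makes the split reproducible
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


data = list(range(100))                # placeholder for 100 data instances
train, test = train_test_split(data, seed=42)
print(len(train), len(test))           # 90 10
```

Shuffling first matters because many data files are sorted by class or by collection date, and an unshuffled split would then give the tree a test set that looks nothing like its training set.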


Actually, I was looking for something like this: http://weka.wikispaces.com/Saving+and+loading+models. It shows how to save a model, load it and use it in the training set. This is exactly what I was searching for. I hope it is useful for anyone who has a similar problem. Cheers, -Neo182
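The linked page covers Weka's own mechanism. For readers who just want the shape of the workflow (train once, save, load later, classify), here is a rough sketch using Python's standard `pickle` module as a stand-in. It is not Weka's API, and the dictionary "model", the `predict` function, and the file name are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: a one-split decision stump.
model = {"feature": "windy", "branches": {True: "stay_in", False: "play"}}


def predict(model, point):
    """Return the label on the branch that matches the point's feature value."""
    return model["branches"][point[model["feature"]]]


path = os.path.join(tempfile.gettempdir(), "stump.model")

# Save the model once training is done...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...and load it later (e.g. in a separate run) to classify unseen data.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, {"windy": False}))  # play
```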
