Find best attribute in decision tree
I've come across the following question:
Color   Flavor   Edibility
Red     Grape    Yes
Red     Cherry   Yes
Green   Grape    Yes
Green   Cherry   No
Blue    Grape    No
Blue    Cherry   No
The question says that, just by analyzing the data without any calculation, you should guess the best attribute (either Color or Flavor).
Can someone explain how to guess this without calculating the entropy and so on?
I know this question is kind of old, but in case you're still interested: in general, a shorter, wider tree is "better." Consider the fact that it takes additional decisions to reach a node in a tall tree.
What you really have to look at is the entropy and gain at each inner decision node.
Entropy is the amount of uncertainty or randomness associated with a particular variable. Thought about another way, it is a measure of how homogeneous the training samples at a particular node are. For example, consider a classifier with two classes, YES and NO (true or false in your case). If a particular variable or attribute, say x, has three training examples of class YES and three training examples of class NO (for a total of six), the entropy would be 1. This is because there is an equal number of both classes for this variable, which is the most "mixed up" you can get. Likewise, if x had all six training examples of a particular class, say YES, then the entropy would be 0, because this particular variable would be pure, thus making it a leaf node in our decision tree.
Entropy may be calculated in the following way:

Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

where p(+) and p(-) are the proportions of positive and negative examples in S (source: dms.irb.hr).
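For concreteness, here is a minimal Python sketch of that calculation (the helper name entropy and the list-of-labels input are my own choices for illustration, not part of the question); it reproduces the two cases described above:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels, in bits."""
        total = len(labels)
        counts = Counter(labels)
        return sum(-(c / total) * log2(c / total) for c in counts.values())

    print(entropy(["yes", "yes", "yes", "no", "no", "no"]))  # 1.0 (maximally mixed)
    print(entropy(["yes"] * 6))                              # 0.0 (pure node)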
Now consider gain. Note that at each level of the decision tree, we choose the attribute that presents the best gain for that node. The gain is simply the expected reduction in entropy achieved by learning the value of the attribute x. Gain is also known as Kullback-Leibler divergence. Gain can be calculated in the following way:
Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) * Entropy(Sv)

where Sv is the subset of S for which attribute A has value v (source: dms.irb.hr).
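Continuing the sketch above (again, the function name information_gain and the list-of-pairs representation are my own, purely for illustration):

    def information_gain(examples, attribute_index):
        """Expected reduction in entropy from splitting on the given attribute.

        `examples` is a list of (attribute_values, label) pairs.
        """
        total = len(examples)
        labels = [label for _, label in examples]
        # Group labels by the value the attribute takes in each example.
        partitions = {}
        for values, label in examples:
            partitions.setdefault(values[attribute_index], []).append(label)
        weighted = sum(len(part) / total * entropy(part) for part in partitions.values())
        return entropy(labels) - weighted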
While the question asks you not to calculate either gain or entropy, the explanation was necessary to show why I chose a particular attribute. In your case, I will assume that edibility is the target attribute.
If you choose either flavor or color, notice that you have an entropy of 1 at the root in both cases, because you have an equal number of training instances with an edibility of "yes" and "no" regardless of the attribute. At this point, you should look at gain. If you anchor your tree with the attribute "color", you will have less remaining entropy, because the subsets it produces are closer to pure. Notice that the branches for "Red" and "Blue" are already pure leaf nodes, with all "yes" and all "no" respectively; only the "Green" branch is still mixed. From that point on, you have one attribute left to use, flavor. Obviously, if you had more than one left, you would have to calculate the gain for each attribute to figure out which does best and use that as the next "layer".
Also, try drawing it out, anchoring the tree with the Color attribute and calculating the gain - you'll find you converge to your answer (a pure node) faster.
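To verify the intuition (not required by the question, but a useful check), the two helpers sketched above give roughly Gain(Color) ≈ 0.667 and Gain(Flavor) ≈ 0.082 on this data:

    data = [
        (("Red",   "Grape"),  "Yes"),
        (("Red",   "Cherry"), "Yes"),
        (("Green", "Grape"),  "Yes"),
        (("Green", "Cherry"), "No"),
        (("Blue",  "Grape"),  "No"),
        (("Blue",  "Cherry"), "No"),
    ]

    print(information_gain(data, 0))  # Color  -> about 0.667
    print(information_gain(data, 1))  # Flavor -> about 0.082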
Entropy and gain are definitely the better heuristics from information theory for selecting the best feature. However, one way to choose the best attribute that is less accurate than entropy but works well is a one-step lookahead, which works as follows:
CHOOSEBESTATTRIBUTE(S)
  choose j to minimize Jj, computed as follows:
    S0 = all <x, y> in S with xj = 0
    S1 = all <x, y> in S with xj = 1
    y0 = the most common value of y in S0
    y1 = the most common value of y in S1
    J0 = number of examples <x, y> in S0 with y not equal to y0
    J1 = number of examples <x, y> in S1 with y not equal to y1
    Jj = J0 + J1 (total errors if we split on this feature)
  return j
Source: Machine Learning: Tom M. Mitchell, Chapter 3
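A minimal Python sketch of that one-step lookahead (the function name choose_best_attribute and the (x, y) tuple representation are my own assumptions, not from Mitchell's text) might look like this:

    from collections import Counter

    def choose_best_attribute(examples):
        """One-step lookahead: pick the binary feature whose split makes the
        fewest errors when each branch predicts its majority label.

        `examples` is a list of (x, y) pairs, where x is a sequence of 0/1
        feature values and y is the class label.
        """
        num_features = len(examples[0][0])
        best_j, best_errors = None, None
        for j in range(num_features):
            errors = 0
            for split_value in (0, 1):
                branch = [y for x, y in examples if x[j] == split_value]
                if branch:
                    majority_count = Counter(branch).most_common(1)[0][1]
                    errors += len(branch) - majority_count  # misclassified by majority vote
            if best_errors is None or errors < best_errors:
                best_j, best_errors = j, errors
        return best_j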