
Find best attribute in decision tree

I've come across one question:

Color   Flavor  Edibility
Red     Grape      Yes
Red     Cherry     Yes
Green   Grape      Yes
Green   Cherry     No
Blue    Grape      No
Blue    Cherry     No

In this question, it says to guess the best attribute (either Color or Flavor) just by analysis, without doing any calculation.

Can someone explain how to guess this without calculating the entropy and so on?


I know this question is kind of old, but in case you're still interested: in general, a shorter, wider tree would be "better." Consider the fact that it takes additional decisions to reach a node in a tall tree.

What you really have to look at is the entropy and gain at each inner decision node.

Entropy is the amount of uncertainty or randomness in a particular variable. Thought about another way, it is a measure of how homogeneous the training samples at a particular node are. For example, consider a classifier with two classes, YES and NO (true or false in your case). If a particular variable or attribute, say x, has three training examples of class YES and three training examples of class NO (for a total of six), the entropy would be 1. This is because there is an equal number of both classes for this variable, which is the most "mixed up" it can get. Likewise, if x had all six training examples in a particular class, say YES, then the entropy would be 0, because this variable would be pure, making it a leaf node in our decision tree.

Entropy may be calculated in the following way:

    Entropy(S) = -Σ_i p_i * log2(p_i)

where p_i is the proportion of examples in S that belong to class i.

(source: dms.irb.hr)
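
For illustration only (this snippet is mine, not part of the original answer), a minimal Python sketch of that calculation:

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy (in bits) of a list of class labels."""
        total = len(labels)
        return sum(-(count / total) * log2(count / total)
                   for count in Counter(labels).values())

    # Three YES and three NO examples: maximally mixed, so entropy is 1.0
    print(entropy(["Yes", "Yes", "Yes", "No", "No", "No"]))  # 1.0

    # All six examples in one class: a pure node, so entropy is 0.0
    print(entropy(["Yes"] * 6))  # 0.0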

Now consider gain. Note that at each level of the decision tree, we choose the attribute that gives the best gain for that node. The gain is simply the expected reduction in entropy achieved by learning the state of the random variable x. Gain is also known as Kullback-Leibler divergence. Gain can be calculated in the following way:

    Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|S_v| / |S|) * Entropy(S_v)

where S_v is the subset of S for which attribute A takes the value v.

(source: dms.irb.hr)
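
To make that concrete, here is a rough Python sketch (my own code, not from the original answer; the `gain` helper and the list-of-dicts layout are just one way to encode the question's table) that computes the gain for Color and Flavor:

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

    def gain(rows, attribute, target="Edibility"):
        """Expected reduction in entropy of `target` from splitting on `attribute`."""
        labels = [row[target] for row in rows]
        remainder = 0.0
        for value in set(row[attribute] for row in rows):
            subset = [row[target] for row in rows if row[attribute] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    data = [
        {"Color": "Red",   "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Red",   "Flavor": "Cherry", "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Cherry", "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Grape",  "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Cherry", "Edibility": "No"},
    ]

    print(gain(data, "Color"))   # ~0.667: the Red and Blue branches are already pure
    print(gain(data, "Flavor"))  # ~0.082: both branches stay mixed

Color comes out well ahead, which is the same conclusion the rest of the answer reaches by inspection.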

While the question asks you not to calculate either gain or entropy, the explanation was necessary to show why I chose a particular attribute. In your case, I will assume that edibility is the target attribute.

If you split on either flavor or color, notice that the set you start from has an entropy of 1 (the maximum on the [0, 1] scale) in both cases, because you have an equal number of training instances with an edibility of "yes" and "no" regardless of the attribute. At this point, you should look at gain. If you anchor your tree with the attribute "color", the child nodes carry less entropy: notice that the branches for "Red" and "Blue" are already pure, all "yes" and all "no" respectively, and only "Green" is still mixed. From that point on, you have one attribute left to use, flavor, which cleanly separates the remaining "Green" branch. Obviously, if you had more than one attribute left, you would have to calculate the gain for each one to figure out which does best and use that as the next "layer".

Also, try drawing it out and anchoring the tree with the Color attribute and calculating the gain - you'll find you converge to your answer (a pure node) faster.
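
If you would rather let code "draw it out", here is a rough ID3-style sketch (my own, with a hypothetical `build_tree` helper, not anything from the original answer) that greedily picks the highest-gain attribute at each node:

    from collections import Counter
    from math import log2

    def entropy(labels):
        total = len(labels)
        return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())

    def gain(rows, attribute, target):
        labels = [r[target] for r in rows]
        remainder = 0.0
        for value in set(r[attribute] for r in rows):
            subset = [r[target] for r in rows if r[attribute] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    def build_tree(rows, attributes, target):
        labels = [r[target] for r in rows]
        if len(set(labels)) == 1 or not attributes:   # pure node, or nothing left to split on
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(rows, a, target))
        remaining = [a for a in attributes if a != best]
        return {best: {v: build_tree([r for r in rows if r[best] == v], remaining, target)
                       for v in set(r[best] for r in rows)}}

    data = [
        {"Color": "Red",   "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Red",   "Flavor": "Cherry", "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Cherry", "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Grape",  "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Cherry", "Edibility": "No"},
    ]

    print(build_tree(data, ["Color", "Flavor"], "Edibility"))
    # {'Color': {'Red': 'Yes', 'Blue': 'No',
    #            'Green': {'Flavor': {'Grape': 'Yes', 'Cherry': 'No'}}}}  (key order may vary)

Anchoring on Color puts two pure leaves directly under the root, and only the Green branch needs a second split on Flavor.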


Entropy and gain are definitely better heuristics from information theory for selecting the best feature. However, one way to choose the best attribute that is less accurate than entropy but works well in practice is a 1-step lookahead, which works as follows:

CHOOSEBESTATTRIBUTE(S)
choose j to minimize Jj, computed as follows:
    S0 = all <x, y> in S with xj = 0
    S1 = all <x, y> in S with xj = 1
    y0 = the most common value of y in S0
    y1 = the most common value of y in S1
    J0 = number of examples <x, y> in S0 with y not equal to y0
    J1 = number of examples <x, y> in S1 with y not equal to y1
    Jj = J0 + J1 (total errors if we split on this feature)
return j

Source: Machine Learning: Tom M. Mitchell, Chapter 3
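
Here is a rough Python translation of that pseudocode (my own sketch; the name `choose_best_attribute` is hypothetical, and it generalises the binary 0/1 split to arbitrary categorical values so it can run on the table from the question):

    from collections import Counter

    def choose_best_attribute(rows, attributes, target="Edibility"):
        """1-step lookahead: pick the attribute whose single split misclassifies
        the fewest examples when each branch predicts its majority label."""
        def errors(attribute):
            total = 0
            for value in set(r[attribute] for r in rows):
                labels = [r[target] for r in rows if r[attribute] == value]
                _, majority_count = Counter(labels).most_common(1)[0]
                total += len(labels) - majority_count  # examples disagreeing with the branch majority
            return total
        return min(attributes, key=errors)

    data = [
        {"Color": "Red",   "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Red",   "Flavor": "Cherry", "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Grape",  "Edibility": "Yes"},
        {"Color": "Green", "Flavor": "Cherry", "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Grape",  "Edibility": "No"},
        {"Color": "Blue",  "Flavor": "Cherry", "Edibility": "No"},
    ]

    print(choose_best_attribute(data, ["Color", "Flavor"]))  # Color (1 error vs. 2 for Flavor)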

