Query on decision trees
I'm trying to understand a para in my AI textbook, and need help with this.
Essentially, my question is why are there 2^(2^n) functions on n attributes if it takes 2^n bits to define a function?
Here is the para from the text (source: AI: A Modern Approach, Stuart Russell and Peter Norvig):
Decision Trees are good for some kinds of functions and bad for others. Is there any kind of representation that is efficient for all kinds of functions? Unfortunately, no. We can show this in a very general way. Consider the set of all Boolean functions on n attributes. How many different functions are in this set? This is just the number of different truth tables that we can write down, because the function is defined by its truth table. The truth table has 2^n rows, because each input case is described by n attributes. We can consider the 'answer' column of the table as a 2^n-bit number that defines th开发者_运维百科e function. No matter what representation we use for functions, some of the functions (almost all of them, in fact) are going to require at least that many bits to represent.
If it takes 2^n bits to define the function, then there are 2^(2^n) different functions on n attributes.
A second question is: Why do we need 2^n bit number (see bold above), I thought we'd need n bit number only, for example if we have 3 attributes, we can define 2^3=8 functions, thus needing only 3 bits to define all 8 functions (000, 001, 010, 011, etc).
i've been thinking about this for awhile, not sure what eludes me, thank you for your time in looking into this!
What's he's saying is this: you can write out all possible values for n attributes as:
0 1 2 .. n
0 0 0 0 0 0 0 1
clearly the number of rows is 2^n
Now we define a function by adding an extra column. If the bit is 1, then that value is "true" in that function, otherwise it is false. Since the number of rows is 2^n, and we are defining the function by all combinations of 0's & 1's in a binary string, clearly there are 2^(2^n) such strings, so there are 2^(2^n) functions on n attributes.
Take a simple example: n = 3. The values of the attributes are:
000 001 010 011 100 101 110 111
Now, we can define one function f that is "true" for rows 1 and 2, and "false" for every other row. We could represent this as row1="true" row2="true" row3="false" ...etc. Now, how many different strings like this could we get? we could write out
000000000 000000001 000000010 .. 111111111
Each one of these strings maps to a different function, and there are 2^8 = 2^(2^n) of these, hence 2^(2^n) functions.
I think I get it, and I think there might be a mistake in your answer...
Let me explain according to my understanding of your example for 3 attributes..
n = 3
Row 1 000
Row 2 001
Row 3 010
...
Row 8 111
Function 0 : False for every row therefore 0 0 0 0 0 0 0 0 (8 '0's as there are 8 rows)
Function 1: True for row 1, false for the rest: 00000001
Function 2: True for row 2, false for the rest: 00000010
...
Thus there are 2^8 functions, which is 2^(2^3) i.e. 2^(2^n).
Correct?
There are 2^n rows in the truth table. In the "answer" column we thus have 2^n slots for our function outputs. For each of the 2^n slots we have 2 choices. This is a binary string of length 2^n. The number of ways this string can be formed is 2^(k) where k is the number of slots available, in the example k=2^n so 2^(2^n) is the number of Boolean functions we can form over n attributes.
As an aside, the reason why he says that you will need at least that many bits to represent the function is that each function must include enough information to specify whether a particular row of inputsis "true" or "false" for that function.
From an information-theoretic perspective, it is hard to see how this can be done in fewer than 2^(2^n) bits -- or more simply, than m bits, where m is the number of functions. Because, suppose we had actually written out all of those functions. Which one are we using? We have to specify it's id which takes m bits.
精彩评论