Number of sub-sequences in a given sequence
If I am given a sequence X = {x1,x2,....xm}
, then I will have (2^m)
subsequences.
Can anyone please explain how can I arrive at this formula intuitively?
I can start with 3 elements, then 4 and then 5 and arrive to this formula, 开发者_StackOverflowbut I don't think I understand. Where did the '2' come from? I am not dividing in half or anything here.
Thank-you for the help.
First of all, what you are talking about is called a set. Second, it is correct that the number of distinct sub-sets that can be generated out of a set is equal to 2^m where m is the number of elements in that set. We can arrive at this result if we take an example of 3 elements:
S = {a, b, c}
Now to generate every sub-set we can model the presence of an element using a binary digit:
xxx where x is either 0 or 1
Now lets enumerate all possibilities:
000 // empty sub-set
001
010
011
100
101
110
111 // the original set it self!
Lets take 011
as an example. The first digit is 0 then, a
is not in this subset, but b
and c
do exist because their respective binary digits are 1's. Now, given m(e.g 3 in the above example) binary digits, how many binary numbers(sub-sets) can be generated? You should answer this question by now ;)
To anyone who is actually looking for a substring (as the title or URL might lead you to believe):
Subset: 2^n (Order doesn't matter in sets)
Subsequence: 2^n (Since we keep the original ordering, this is the same.)
Substring: n(n+1) * 1/2 (Elements must be consecutive)
For each element in a sequence of length m
, you can either select it or leave it. Thus, there are 2 ways to deal with each element. Therefore, the total no. of ways to deal with all the m
elements is 2*2*2...... m
times = 2^m
times.
Where did the 2 come from? Every time you add one more element you double the number of possibilities.
The value x_i
can either be in the subsequence, or not. This is just like a bit. There are 2^m
combinations for turning on / turning off the m
numbers in the sequence.
For any sequence X = {x1,x2,....xm}, there will be (2^m) sub-sequences, because you can "choose" sub-sequences of length 0,1,2,...,m ,i.e., mathematically it is
"C(m,0) + C(m,1) + ... C(m,m)" which leads to 2^m.
For e.g., say the string is "abc", then
C(3,0) = 1, ""
C(3,1) = 3, "a", "b", "c"
C(3,2) = 3, "ab", "bc", "ac"
C(3,3) = 1, "abc"
number of subsequences are 8 i.e., 2^3.
For more details visit http://en.wikipedia.org/wiki/Binomial_coefficient#Series_involving_binomial_coefficients
If you have a sequence S, what happens when you add a new element x to the end of S?
All the subsequences of S are still subsequences of your new sequence. So are all those subsequences with x added at the end.
Voilà! Every time you add an element, you double the number of subsequences.
Each subsequence is defined by choosing between selecting or not selecting each of the m elements. As there are m elements, each with two possible states, you get 2^m possibilities.
To extend the accepted answer, the number of subsequences can be thought in mathematical terms. Let's take an example of a string: 'ABC'.
Number of subsequences of string 'ABC':
=> C(3, 0) + C(3, 1) + C(3, 2) + C(3, 3) = 1 + 3 + 3 + 1 = 8 (2^3).
(Note: C(m, n) stands for number of subsequences of size 'n' from a string of size 'm')
It can be easily verified by listing all of them:
C(3, 0) = '', // Taking 0 letters at a time out of the given string.
C(3, 1) = 'A', 'B', 'C', // Taking 1 letter at a time.
C(3, 2) = 'AB', 'AC', 'BC', // Taking 2 letters at a time.
C(3, 3) = 'ABC'. // Taking 3 letters at a time.
(Total count = 8)
We keep the sequence of letters same for a subsequence, hence, Combination instead of a Permutation.
Note: Sum of binomial coefficients, using binomial theorem = 2^n. Proof below:
From Binomial theorem,
(x + y)^n = C(n, 0) x^n y^0 + .... + C(n, n) x^0 y^n
Using x = 1, y = 1,
(1+1)^n = C(n, 0) 1^n 1^0 + .... + C(n, n) 1^0 1^n
=> 2^n = C(n,0) + C(n,1) + .... + C(n,n)
That's where the '2' comes from, from binomial theorem.
Every time you are at a character you either include it in your subsequence or you don't include it in your result, what that means is if you consider string as root of the tree then it will have two children one a subsequence that includes current character one that doesn't, that make it a binary tree. and number of nodes in a balanced binary tree is n^m where n is the branching factor and m is the height of the tree, we know that the branching factor is 2 and height would be number of characters in string, hence it's 2^M.
Each element is either in a subsequence or not. Therefore, starting with the first x1 there are two sets of subsets: those with x1 included and those without. Same can be done with the smaller sub-problem {x2,...,xm}. Therefore you finally yield 2^m.
Basically you will have twice as many subsequences for each new number since you'll have (2^(m-1))
"equivalent" subsequences that are shifted one space to the right (assuming horizontal ordering and adding in the right) plus the subsequence of all elements.
I started algorithm course very recently.
I guess the more intuitive way of thinking about the answer is to think of an example.
For example, we have A=(1,2,3) Then we may have 0 that's 1 way 1,2 or 3 that's 3 ways (1,2),(1,3),(2,3) that's 3 ways as well And (1,2,3) == that's 1 way again
Total subsequence is 2^3 or 8. So the Generic formula is mC0+mC1+mC2,mC3+......mCm
Please correct me if I'm wrong. I faced this problem minutes ago and this is how I thought about the intuitive formulation.
精彩评论