Minimum number of training examples for Find-S/Candidate Elimination algorithms?
Consider the instance space consisting of integer points in the x, y plane, wher开发者_JAVA技巧e 0 ≤ x, y ≤ 10, and the set of hypotheses consisting of rectangles (i.e. being of the form (a ≤ x ≤ b, c ≤ y ≤ d), where 0 ≤ a, b, c, d ≤ 10).
What is the smallest number of training examples one needs to provide so that the Find-S algorithm perfectly learns a particular target concept (e.g. (2 ≤ x ≤ 4, 6 ≤ y ≤ 9))? When can we say that the target concept is exactly learned in the case of the Find-S algorithm, and what is the optimal query strategy?
I'd also like to know the answer w.r.t Candidate Elimination.
Thanks in advance.
You need two positive examples: (2,6) (2 <= x <= 2, 6 <= y <= 6) and then (4,9) (2 <= x <= 4, 6 <= y <= 9) That is the S set done and this is the end of the answer to teaching/learning with FIND-S
With Candidate elimination, we need to give negative examples to build the G set. We need four negative examples to define the four boundaries of the rectangle:
- G starts as (-Inf <= x <= Inf, -Inf <= y <= Inf)
Add (3,5)- and we get hypothesis:
- (-Inf <= x <= Inf, 6 <= y <= Inf)
Add (3,10)-
- (-Inf <= x <= Inf, 6 <= y <= 9)
Add (1,7)-
- (2 <= x <= Inf, 6 <= y <= 9)
Add (5,7)-
- (2 <= x <= 4, 6 <= y <= 9)
So now S=G={(2 <= x <= 4, 6 <= y <= 9)}. As S=G, it has perfectly learned the concept. I have seen this question in different formats. Replace -Inf with 0 and Inf with 10 if it specifies the problem domain as such.
This is the optimal order to feed in the training examples. The worst order is to do the G set first, as you will create four different candidate hypotheses, which will merge to three with the second example and then merge to one with the 3rd example. It is useful to illustrate C-E with a tree as in the Mitchell book, and perhaps sketch the hypothesis graph next to each.
This answer is confirmed here: http://ssdi.di.fct.unl.pt/scl/docs/exercises/Clemens%20Dubslaff%20hm4.pdf
Assuming all ranges are a ≤ x ≤ b
and a
and b
are integer then...
In a 1 dimensional case (only x) there would be 4 samples, (a-1,a,b,b+1) that would prove it.
If you extend that to 2 dimensions (x and y) it should be 16 samples, which are those above as x, and (c-1,c,d,d+1) for y, with all possible combinations.
Please correct me if I don't understand the problem.
精彩评论