Algorithm to find discriminating data points?
Given n samples and p >> n (discrete) data points for each of the n samples, what is a good algorithm for finding a smallest possible set of k data points such that those k data points discriminate between all n samples?
For my purposes, a good开发者_StackOverflow algorithm that finds an approximately smallest set would also suffice.
It sounds as though your problem is closely related to the test cover problem. The test cover problem is, given a ground set X = {1, …, n} and a collection T = {T1, …, Tm} of subsets of X, to find the smallest subcollection U of T such that for all y ≠ z in X, there exists a set S in T such that either (x in S and y not in S) or (x not in S and y in S).
The test cover problem is NP-hard, so in practice, optimal solutions are found using branch and bound techniques. See De Bontridder et al.
Here is a simple greedy algorithm, shouldn't generate too bad results:
Check if data points are same for two different elements, if so, there is no solution.
- In each step we add one new data point to the set
k
. - We test all the different points in all of the
p
inn
.- Try to add that point to
k
. - The new
k
dividesn
into a couple of distinct sets (some of these contain just one element, some more.. finally all will contain just one). - Pick the point which generates the most sets.
- Try to add that point to
- Do this till all sets are distinct.
精彩评论