Algorithm to find common subsets
I have N number of sets Si of Numbers, each of a different size. Let m1, m2, ... mn be the sizes of respective sets (mi = | Si |), and M be the size of the largest set. I have to find common subsets that have at least two numbers in them. Example:
Set Items
1 10,80,22
2 72, 10, 80, 26,50
3 80,
4 10, 22
5 22, 72, 10, 80, 26,50
开发者_如何转开发
So the result will be like that
Items Found in sets
10, 22 1, 4
10, 80 1, 2, 5
10, 80, 22 1, 5
10, 72, 80, 26, 50 2, 5
So how to automate this problem and what is the expected complexity for respective solution? I need it to be as fast as possible.
Since the original question asks to make a discussion as fast as possible, here's how it can be made short.
First, discuss whether the output is exponential to your input. Assume you have 2 elements, and N sets. Assume that each element belongs to each set; it will require N lines as your input. Then, how many lines should you print?
I think that you're going to print 2N-N-1 lines, like these:
1,2 1,2
1,2 1,3
.....
1,2 1,N
1,2 2,1
.....
1,2 1,2,3
.....
1,2 1,2,3,...N
Since you're going to spend O(2N) time printing, you could pick any of the suggestions on this page to calculate this information—you've been caught by an exponent anyway.
This argument would make your discussion very fast.
You could do this -
Create a hash table & insert as keys each of your item & values as the sets they belong to.
Eg: 10:[0,1,3,4] 80:[0,1,2,4] 22:[0,3,4] 72:[1,4] 26:[1,4] 50:[1,4]
After this for each 2 elements in the hash table, find the intersection of the values of the keys. Here intersection of (10, 80) gives [0,3,4]. Do this for (10,22)(10,72)... You have your result.
Complexity - To insert M items into the Hash Table - O(M) Search each key in the hash table - O(1) Intersection operation of key's value's - O(m^2) [This could be optimized]
On the whole I would say this is O(m^2) algo. But if the size of "Items" is not large in each "Set", then you wouldn't notice much performance hit.
You're being asked to do the pairwise intersection of all your set and then collect all results that are size >= 2.
There are O(N^2) pairs. Doing the intersection is O(M) for each. Assemble all the results, sort them by set contents to work out the duplicates is N^2 Log N^2 (worst case is that the intersection is different for every pair so there can be O(N^2) result sets)
So I think the complexity is O((N^2 + N log N) * M)
But there's quite possible an error somewhere.
Paul
One solution I can think of:
Use a hash table, generate all the combination of "a pair of numbers" of the numbers in that "row", which is O(M * M) time.
Then use those as the hash keys, mapping to the main array index.
For each row of those N elements,
do the step above... and if the hash already maps to a number, then it is a match,
then return the pair, or else just put those in the hash
it is O(N * M * M)
update: if, as you say in the comment, that M can be maximum 15, then O(N * M * M) is really just the same as O(N).
If your initial question is to find all pairs, then
For each row of those N elements,
do the step first mentioned... and if the hash already maps to a number,
then it is a match and just print out the pair if this pair is not printed yet
and in any event, put the mapping into the hash
to tell whether a pair has been printed, created another hash, such as
printed_or_not[i,j] = true or false,
and make sure to check printed_or_not[i,j] or printed_or_not[j, i]
because not need to print again if just different order
I have some tips. You should be sort all items in sets, when you loop in loop for counting a number in that set. You should be add condition for check a number is greater than your search value and use break; for end of loop.
精彩评论