Algorithm to find common subsets

2023-01-20 06:31 问答作者：

I have N number of sets S_i of Numbers, each of a different size. Let m₁, m₂, ... m_n be the sizes of respective sets (m_i = | S_i |), and M be the size of the largest set. I have to find common subsets that have at least two numbers in them. Example:

Set  Items
1    10,80,22
2    72, 10, 80, 26,50
3    80,
4    10, 22
5    22, 72, 10, 80, 26,50

开发者_如何转开发

So the result will be like that

Items                Found in sets
10, 22               1, 4
10, 80               1, 2, 5
10, 80, 22           1, 5
10, 72, 80, 26, 50   2, 5

So how to automate this problem and what is the expected complexity for respective solution? I need it to be as fast as possible.

Since the original question asks to make a discussion as fast as possible, here's how it can be made short.

First, discuss whether the output is exponential to your input. Assume you have 2 elements, and N sets. Assume that each element belongs to each set; it will require N lines as your input. Then, how many lines should you print?

I think that you're going to print 2^N-N-1 lines, like these:

1,2     1,2
1,2     1,3
.....
1,2     1,N
1,2     2,1
.....
1,2     1,2,3
.....
1,2     1,2,3,...N

Since you're going to spend O(2^N) time printing, you could pick any of the suggestions on this page to calculate this information—you've been caught by an exponent anyway.

This argument would make your discussion very fast.

You could do this -

Create a hash table & insert as keys each of your item & values as the sets they belong to. Eg: 10:[0,1,3,4] 80:[0,1,2,4] 22:[0,3,4] 72:[1,4] 26:[1,4] 50:[1,4]
After this for each 2 elements in the hash table, find the intersection of the values of the keys. Here intersection of (10, 80) gives [0,3,4]. Do this for (10,22)(10,72)... You have your result.
Complexity - To insert M items into the Hash Table - O(M) Search each key in the hash table - O(1) Intersection operation of key's value's - O(m^2) [This could be optimized]

On the whole I would say this is O(m^2) algo. But if the size of "Items" is not large in each "Set", then you wouldn't notice much performance hit.

You're being asked to do the pairwise intersection of all your set and then collect all results that are size >= 2.

There are O(N^2) pairs. Doing the intersection is O(M) for each. Assemble all the results, sort them by set contents to work out the duplicates is N^2 Log N^2 (worst case is that the intersection is different for every pair so there can be O(N^2) result sets)

So I think the complexity is O((N^2 + N log N) * M)

But there's quite possible an error somewhere.

Paul

One solution I can think of:

Use a hash table, generate all the combination of "a pair of numbers" of the numbers in that "row", which is O(M * M) time.

Then use those as the hash keys, mapping to the main array index.

For each row of those N elements,
  do the step above... and if the hash already maps to a number, then it is a match,
    then return the pair, or else just put those in the hash

it is O(N * M * M)

update: if, as you say in the comment, that M can be maximum 15, then O(N * M * M) is really just the same as O(N).

If your initial question is to find all pairs, then

For each row of those N elements,
  do the step first mentioned... and if the hash already maps to a number, 
    then it is a match and just print out the pair if this pair is not printed yet
  and in any event, put the mapping into the hash

to tell whether a pair has been printed, created another hash, such as
  printed_or_not[i,j] = true or false, 
    and make sure to check printed_or_not[i,j] or printed_or_not[j, i]
    because not need to print again if just different order

I have some tips. You should be sort all items in sets, when you loop in loop for counting a number in that set. You should be add condition for check a number is greater than your search value and use break; for end of loop.

继续阅读：algorithm

Algorithm to find common subsets

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？