What is a data structure for quickly finding non-empty intersections of a list of sets?

2022-12-26 02:22 问答作者：

I have a set of N items, which are sets of integers, let's assume it's ordered and call it I[1..N]. Given a candidate set, I need to find the subset of I which have non-empty intersections with the candidate.

So, for example, if:

I = [{1,2}, {2,3}, {4,5}]

I'm looking to define valid_items(items, candidate), such that:

valid_items(I, {1}) == {1}
valid_items(I, {2}) == {1, 2}
valid_items(I, {3,4}) == {2, 3}

I'm trying to optimize for one given set I and a variable candidate sets. Currently I am doing th开发者_Go百科is by caching items_containing[n] = {the sets which contain n}. In the above example, that would be:

items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}]

That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in itmes 1 and 2, 2 is contained in item 2, 3 is contained in item 2, and 4 and 5 are contained in item 3.

That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate).

Is there any more efficient data structure (of a reasonable size) for caching the result of this union? The obvious example of space 2^N is not acceptable, but N or N*log(N) would be.

I think your current solution is optimal big-O wise, though there are micro-optimization techniques that could improve its actual performance. Such as using bitwise operations when merging the chosen set in item_containing set with the valid items set.

i.e. you store items_containing as this:

items_containing = [0x0000, 0x0001, 0x0011, 0x0010, 0x0100, 0x0100]

and your valid_items can use bit-wise OR to merge like this:

int valid_items(Set I, Set candidate) {
    // if you need more than 32-items, use int[] for valid 
    // and int[][] for items_containing
    int valid = 0x0000;
    for (int item : candidate) {
        // bit-wise OR
        valid |= items_containing[item];
    }
    return valid;
}

but they don't really change the Big-O performance.

One representation that might help is storing the sets I as vectors V of size n whose entries V(i) are 0 when i is not in V and positive otherwise. Then to take the intersection of two vectors you multiply the terms, and to take the union you add the terms.

继续阅读：data-structures mathematical-optimization optimization set-theory

What is a data structure for quickly finding non-empty intersections of a list of sets?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？