In-place C++ set intersection

The standard way of intersecting two sets in C++ is to do the following:

std::set<int> set_1;  // With some elements
std::set<int> set_2;  // With some other elements
std::set<int> the_intersection;  // Destination of intersect
std::set_intersection(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), std::inserter(the_intersection, the_intersection.end()));

How would I go about doing an in-place set intersection? That is, I want set_1 to have the results of the call to set_intersection. Obviously, I can just do a set_1.swap(the_intersection), but this is a lot less efficient than intersecting in-place.


I think I've got it:

std::set<int>::iterator it1 = set_1.begin();
std::set<int>::iterator it2 = set_2.begin();
while ( (it1 != set_1.end()) && (it2 != set_2.end()) ) {
    if (*it1 < *it2) {
        set_1.erase(it1++);
    } else if (*it2 < *it1) {
        ++it2;
    } else { // *it1 == *it2
        ++it1;
        ++it2;
    }
}
// Anything left in set_1 from here on did not appear in set_2,
// so we remove it.
set_1.erase(it1, set_1.end());

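Anyone see any problems? It seems to be O(n) in the combined size of the two sets. According to cplusplus.com, std::set erase(position) is amortized constant, while erase(first,last) costs O(log n) plus time linear in the number of elements removed, so the overall bound stays linear.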
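For reference, the loop above can be wrapped in a reusable function. This is only a sketch: the name intersect_in_place is made up for illustration, it assumes C++11 (where std::set::erase returns the iterator following the erased element), and it assumes both sets use the default std::less ordering.

```cpp
#include <set>

// Hypothetical helper: keeps in `a` only the elements that also appear in `b`.
// Assumes both sets are ordered by the default std::less<T>.
template <typename T>
void intersect_in_place(std::set<T>& a, const std::set<T>& b) {
    auto it_a = a.begin();
    auto it_b = b.begin();
    while (it_a != a.end() && it_b != b.end()) {
        if (*it_a < *it_b) {
            // Element of `a` is smaller than anything left in `b`: drop it.
            it_a = a.erase(it_a);  // C++11: erase returns the next iterator
        } else if (*it_b < *it_a) {
            ++it_b;
        } else {
            // Equivalent elements: keep it and advance both iterators.
            ++it_a;
            ++it_b;
        }
    }
    // Whatever is left in `a` has no counterpart in `b`.
    a.erase(it_a, a.end());
}
```

Calling intersect_in_place(set_1, set_2) leaves set_2 untouched and shrinks set_1 without allocating any new nodes.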


You can easily go through set_1, check each element to see if it exists in set_2, and erase it if it doesn't. Since sets are sorted, you can compare them in linear time, and erasing an element using an iterator is amortized constant time. I wouldn't count on it being more efficient than what you started with, though; benchmarking would be wise if it matters to you.
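A minimal sketch of the "check each element" idea, read literally, might look like the following. The name retain_common is hypothetical and C++11 is assumed. Note that this version does a lookup in set_2 per element, so it costs O(n log m) rather than the linear time of a merge-style walk; it may still be attractive when set_1 is much smaller than set_2.

```cpp
#include <set>

// Hypothetical helper: erases from `a` every element not found in `b`.
// Each find() is logarithmic in b.size(), so this is not the linear merge.
template <typename T>
void retain_common(std::set<T>& a, const std::set<T>& b) {
    for (auto it = a.begin(); it != a.end(); ) {
        if (b.find(*it) == b.end()) {
            it = a.erase(it);  // C++11: erase returns the next iterator
        } else {
            ++it;
        }
    }
}
```

As the answer says, benchmarking against the plain set_intersection plus swap is the only reliable way to know which is faster for your data.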


This doesn't directly answer the question, but maybe someone will find it helpful.

For std::vector, it is not safe to pass set_1.begin() as the output iterator of the standard algorithm (see below), even though the clang, gcc, and Microsoft implementations happen to work. Note that set_2 could be any sorted range, not just a std::vector.

std::vector<int> set_1;  // With some elements
std::vector<int> set_2;  // With some other elements
auto end = std::set_intersection(
                     set_1.begin(), set_1.end(), 
                     set_2.begin(), set_2.end(), 
                     set_1.begin() // intersection is written in set_1
                    );
set_1.erase(end, set_1.end()); // erase redundant elements

Update:

Thanks to @Keith, who found that the C++ Standard (25.4.5.3) requires the following:

The resulting range shall not overlap with either of the original ranges

So what I initially proposed is wrong, although it works in the major STL implementations. If you want to be on the safe side and don't want extra allocations, copy the implementation of your choice into your code base and use it instead of std::set_intersection. I don't really understand the reason for this restriction; please comment if you know the answer.
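If you would rather not copy a library implementation, a hand-written loop is short. The following is only a sketch under some assumptions: both vectors are sorted with operator<, the name intersect_sorted_in_place is made up for this example, and it is not taken from any particular STL. The write position never gets ahead of the read position in set_1, so no element is overwritten before it has been read.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: in-place intersection of two vectors sorted by operator<.
// The result is written to the front of `a`, then the tail is erased.
template <typename T>
void intersect_sorted_in_place(std::vector<T>& a, const std::vector<T>& b) {
    auto out = a.begin();   // next write position, always <= it_a
    auto it_a = a.begin();  // read position in `a`
    auto it_b = b.begin();  // read position in `b`
    while (it_a != a.end() && it_b != b.end()) {
        if (*it_a < *it_b) {
            ++it_a;
        } else if (*it_b < *it_a) {
            ++it_b;
        } else {
            // Common element: move it down unless it is already in place.
            if (out != it_a) {
                *out = std::move(*it_a);
            }
            ++out;
            ++it_a;
            ++it_b;
        }
    }
    a.erase(out, a.end());  // drop everything past the last kept element
}
```

Because the overlap is handled explicitly here, this does not rely on how any given std::set_intersection happens to be implemented.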
