开发者

How can I return a copy of a vector containing elements not in a set?

Suppose I have the following two data structures:

std::vector<int> all_items;
std::set<int> bad_items;

The all_items vector contains all known items and the bad_items vector contains a list of bad items. These two data structures are populated entirely independent of one another.

What's the proper way to write a method that will return a std::vector<int> contain all elements of all_items not in bad_items?

Currently, I have a clunky solution that I think can be done more concisely. My understanding of STL function adapters is lacking. Hence the question. My current solution is:

struct is_item_bad {
    std::set<int> const* bad_items;
    bool operator() (int const i) const {
        return bad_items.count(i) > 0;
    }
};

std::vector<int> items() const {
    is_item_bad iib = { &bad_items; };
    std::vector<int> good_items(all_items.size());
    std::remove_copy_if(all_items.begin(),  all_items.end(), 
                        good_items.begin(), is_item_bad);
    return good_items; 
}

Assume all_items, bad_items, is_item_bad and items() are all a part of some containing class. Is there a way to write them items() getter such that:

  • It doesn't need temporary variables in the method?
  • It doesn't need the custom functor, struct is_item_bad?

I had hoped to just use the count method on std::set as a functor, but I haven't been able to divine the right way to express that w/ the remove_copy_if algorithm.

EDIT: Fixed the logic error in items(). The actual code didn't have the problem, it was a transcription error.

EDIT: I have accepted a solution that doesn't use std::set_difference since 开发者_运维问答it is more general and will work even if the std::vector isn't sorted. I chose to use the C++0x lambda expression syntax in my code. My final items() method looks like this:

std::vector<int> items() const {
    std::vector<int> good_items;
    good_items.reserve(all_items.size());
    std::remove_copy_if(all_items.begin(), all_items.end(),
                        std::back_inserter(good_items),
                        [&bad_items] (int const i) {
                            return bad_items.count(i) == 1;
                        });
}

On a vector of about 8 million items the above method runs in 3.1s. I bench marked the std::set_difference approach and it ran in approximately 2.1s. Thanks to everyone who supplied great answers.


As jeffamaphone suggested, if you can sort any input vectors, you can use std::set_difference which is efficient and less code:

#include <algorithm>
#include <set>
#include <vector>

std::vector<int> 
get_good_items( std::vector<int> const & all_items,
                std::set<int> const & bad_items )
{
    std::vector<int> good_items;

    // Assumes all_items is sorted.
    std::set_difference( all_items.begin(),
                         all_items.end(),
                         bad_items.begin(),
                         bad_items.end(),
                         std::back_inserter( good_items ) );

    return good_items;
}


Since your function is going to return a vector, you will have to make a new vector (i.e. copy elements) in any case. In which case, std::remove_copy_if is fine, but you should use it correctly:

#include <iostream>
#include <vector>
#include <set>
#include <iterator>
#include <algorithm>
#include <functional>
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
        std::vector<int> result;
        remove_copy_if(all.begin(), all.end(), back_inserter(result),
                  [&bad](int i){return bad.count(i)==1;});
        return result;
}

int main()
{
        std::vector<int> all_items = {4,5,2,3,4,8,7,56,4,2,2,2,3};
        std::set<int> bad_items = {2,8,4};
        std::vector<int> filtered_items = filter(all_items, bad_items);
        copy(filtered_items.begin(), filtered_items.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
}

To do this in C++98, I guess you could use mem_fun_ref and bind1st to turn set::count into a functor in-line, but there are issues with that (which resulted in deprecation of bind1st in C++0x) which means depending on your compiler, you might end up using std::tr1::bind anyway:

remove_copy_if(all.begin(), all.end(), back_inserter(result),
     bind(&std::set<int>::count, bad, std::tr1::placeholders::_1)); // or std::placeholders in C++0x

and in any case, an explicit function object would be more readable, I think:

struct IsMemberOf {
        const std::set<int>& bad;
        IsMemberOf(const std::set<int>& b) : bad(b) {}
        bool operator()(int i) const { return bad.count(i)==1;}
};
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
        std::vector<int> result;
        remove_copy_if(all.begin(), all.end(), back_inserter(result), IsMemberOf(bad));
        return result;
}


At the risk of appearing archaic:


std::set<int> badItems;
std::vector<int> items;
std::vector<int> goodItems;
for ( std::vector<int>::iterator iter = items.begin();
      iter != items.end();
      ++iter)
{
   int& item = *iter;
   if ( badItems.find(item) == badItems.end() )
   {
      goodItems.push_back(item);
   }
}


std::remove_copy_if returns an iterator to the target collection. In this case, it would return good_items.end() (or something similar). good_items goes out of scope at the end of the method, so this would cause some memory errors. You should return good_items or pass in a new vector<int> by reference and then clear, resize, and populate it. This would get rid of the temporary variable.

I believe you have to define the custom functor because the method depends on the object bad_items which you couldn't specify without it getting hackey AFAIK.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜