How can I return a copy of a vector containing elements not in a set?
Suppose I have the following two data structures:
std::vector<int> all_items;
std::set<int> bad_items;
The all_items
vector contains all known items and the bad_items
vector contains a list of bad items. These two data structures are populated entirely independent of one another.
What's the proper way to write a method that will return a std::vector<int>
contain all elements of all_items
not in bad_items
?
Currently, I have a clunky solution that I think can be done more concisely. My understanding of STL function adapters is lacking. Hence the question. My current solution is:
struct is_item_bad {
std::set<int> const* bad_items;
bool operator() (int const i) const {
return bad_items.count(i) > 0;
}
};
std::vector<int> items() const {
is_item_bad iib = { &bad_items; };
std::vector<int> good_items(all_items.size());
std::remove_copy_if(all_items.begin(), all_items.end(),
good_items.begin(), is_item_bad);
return good_items;
}
Assume all_items
, bad_items
, is_item_bad
and items()
are all a part of some containing class. Is there a way to write them items()
getter such that:
- It doesn't need temporary variables in the method?
- It doesn't need the custom functor,
struct is_item_bad
?
I had hoped to just use the count
method on std::set
as a functor, but I haven't been able to divine the right way to express that w/ the remove_copy_if
algorithm.
EDIT: Fixed the logic error in items()
. The actual code didn't have the problem, it was a transcription error.
EDIT: I have accepted a solution that doesn't use std::set_difference
since 开发者_运维问答it is more general and will work even if the std::vector
isn't sorted. I chose to use the C++0x lambda expression syntax in my code. My final items()
method looks like this:
std::vector<int> items() const {
std::vector<int> good_items;
good_items.reserve(all_items.size());
std::remove_copy_if(all_items.begin(), all_items.end(),
std::back_inserter(good_items),
[&bad_items] (int const i) {
return bad_items.count(i) == 1;
});
}
On a vector of about 8 million items the above method runs in 3.1s. I bench marked the std::set_difference
approach and it ran in approximately 2.1s. Thanks to everyone who supplied great answers.
As jeffamaphone suggested, if you can sort any input vectors, you can use std::set_difference
which is efficient and less code:
#include <algorithm>
#include <set>
#include <vector>
std::vector<int>
get_good_items( std::vector<int> const & all_items,
std::set<int> const & bad_items )
{
std::vector<int> good_items;
// Assumes all_items is sorted.
std::set_difference( all_items.begin(),
all_items.end(),
bad_items.begin(),
bad_items.end(),
std::back_inserter( good_items ) );
return good_items;
}
Since your function is going to return a vector, you will have to make a new vector (i.e. copy elements) in any case. In which case, std::remove_copy_if
is fine, but you should use it correctly:
#include <iostream>
#include <vector>
#include <set>
#include <iterator>
#include <algorithm>
#include <functional>
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
std::vector<int> result;
remove_copy_if(all.begin(), all.end(), back_inserter(result),
[&bad](int i){return bad.count(i)==1;});
return result;
}
int main()
{
std::vector<int> all_items = {4,5,2,3,4,8,7,56,4,2,2,2,3};
std::set<int> bad_items = {2,8,4};
std::vector<int> filtered_items = filter(all_items, bad_items);
copy(filtered_items.begin(), filtered_items.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
To do this in C++98, I guess you could use mem_fun_ref
and bind1st
to turn set::count into a functor in-line, but there are issues with that (which resulted in deprecation of bind1st in C++0x) which means depending on your compiler, you might end up using std::tr1::bind anyway:
remove_copy_if(all.begin(), all.end(), back_inserter(result),
bind(&std::set<int>::count, bad, std::tr1::placeholders::_1)); // or std::placeholders in C++0x
and in any case, an explicit function object would be more readable, I think:
struct IsMemberOf {
const std::set<int>& bad;
IsMemberOf(const std::set<int>& b) : bad(b) {}
bool operator()(int i) const { return bad.count(i)==1;}
};
std::vector<int> filter(const std::vector<int>& all, const std::set<int>& bad)
{
std::vector<int> result;
remove_copy_if(all.begin(), all.end(), back_inserter(result), IsMemberOf(bad));
return result;
}
At the risk of appearing archaic:
std::set<int> badItems;
std::vector<int> items;
std::vector<int> goodItems;
for ( std::vector<int>::iterator iter = items.begin();
iter != items.end();
++iter)
{
int& item = *iter;
if ( badItems.find(item) == badItems.end() )
{
goodItems.push_back(item);
}
}
std::remove_copy_if
returns an iterator to the target collection. In this case, it would return good_items.end()
(or something similar). good_items
goes out of scope at the end of the method, so this would cause some memory errors. You should return good_items
or pass in a new vector<int>
by reference and then clear
, resize
, and populate it. This would get rid of the temporary variable.
I believe you have to define the custom functor because the method depends on the object bad_items
which you couldn't specify without it getting hackey AFAIK.
精彩评论