开发者

Most efficient way of erasing/deleting multiple std::vector elements while retaining original order?

i have a std::vector<int> and a second container holding iterators or indexes (no keys, i want constant access to the element) to this vector for deletion purposes. Let's assume i have a vector of 1000 elements and want to erase 200 of them. The order of the non-removed elements should be the same after the deletion operations like before.

One more thing i missed in the first version of my question: the values are unique. They are identities.

How would you do that in a safe (regarding the stl rules) and efficient manner (the decision for a vector shall be final)?

Possibilities or Methods i thought about:

  • the erase-remove idiom (http://en.wikipedia.org/wiki/Erase-remove_idiom): originally for the deletion of elements which fulfill a condition (including linear search) but i think with ranges of size 1 this method could be used to with already given iterators and a dummy condition. Question: is the original order of elements kept and is it more performant than the last method?
  • loop over the indexes and erase the elements with the use of vector.erase(vector.begin()+index+offset) while keeping the indexes removed in a container for calculating the offset. This offset could be determined for every remove iteration with the use of a std::lower_bound n the container of already removed elements. The problem: A lot of binary_searches for getting the offset and a lot of move operations because of random-location-deletion.
  • At the moment I'm doing the following: get all the iterators for the elements to remove. Sort them in descending order according to the location in the vector and loop over them for the final deletion with vector.erase. Now I'm not invalidating any iterator and there are no vector rearrange-operations except for the deletion itself. The problem: a lot of sorting

So, how would you tackle this? Any new ideas? Any recommendations?

Thanks for your input.

Sascha

Edit / Update / Own results: I implemented the erase-remove idiom, which was also mentioned by KennyTM, 开发者_运维技巧with a predicate based on the lookup in a boost::dynamic_bitset and it's insanely fast. Furthermore i tried PigBen's move-and-truncate method (also mentioned by Steve Jessop) which is also accessing the bitset in it's while-loop. Both seem to be equally fast with my kind of data. I tried to delete 100 of 1000 Elements (unsigned ints), did this 100 deletes 1M times and there was no significant difference. Because i think the stl-based erase-remove idiom is kinda more "natural, i'm choosing this method (argument was also mentioned by KennyTM).


In <algorithm> there is a remove_if function which squeezes all values not removed to the front maintaining the order. This works if those 200 elements can be purely determined by the values, not index.

This is essentially the Erase-remove idiom you have linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copyings), which would be more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from values (just scan in the reversed direction while copying).

Nevertheless, using remove_if (if you can) is better than the other 2 options because the implementation has already been written for you, so there's less chance of logical error and conveys better what (not how) to do.


How about looping through the vector, and for each element that needs to be removed, copy the next element that doesn't need to be removed in to that position. Then when you get to the end, truncate it.

int last = 0;
for(int i=0; i<vec.size(); ++i, ++last)
{
   while(needs_to_be_removed(i))
      ++i;
   if(i >= vec.size()) break;

   vec[last] = vec[i];   
}

vec.resize(last);


First thing is, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst case run time (n the size of the vector, m the size of the list of indexes to remove).

I think the first thing I'd try would be similar to your current code:

  • sort the indexes
  • create a new vector of size n - m
  • iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skip an element, and so on.
  • swap the original vector with the new one.

You might be able to do the third step with remove_copy_if and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic, it seems to be the consensus that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing basically is a modified version of remove_copy_if.

You could avoid the second step using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

[Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase/resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed, and store them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]


You can copy all elements of the vector to a list unless the index in your second container, and then back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

Make your second container a map so it keeps the indeces sorted for you automatically.

edit:

To respond to the comment

The cost of maintaining a map is worst case the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map vs. the overhead of sorting a list.

As for the performance of my suggested algorithm, if m is the number of elements to be deleted, and n is the total number of elements then it results in O(n - m).

Of course, this is mostly just humoring your attempt to optimize with a vector.

1 - You shouldn't be using a vector if you want to do random access deletes. That's not what they're good at, use a list if at all possible. And since you seem to be much more interested in relative order rather than absolute index, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution to let you use the most efficient data structure to solve it.

2 - Instead of maintaining a second data structure, mark elements that need to be deleted directly in their container. A trivial way is instead using a container< T > use a container< std::pair< T, char > > and use the char to keep track of the element status.

If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.


Elements of what? Maybe I'm taking your post to seriously but if you have a vector of 1000 elements why not mark the ones that are not valid anymore and do away with erasing in the first place. Obviously I'm making an assumption here that your elements are not demanding a lot of memory.

I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick maybe this idea is worth a thought! In essence speed things up by not doing the operation in the first place.


If you have a (e.g. unordered) set of indices that you want to erase, you can use this:

template <typename Type>
void erase_indices(
        const std::unordered_set<size_t>& indices_to_erase,
        std::vector<Type>& vec) {
    std::vector<bool> erase_index(vec.size(), false);
    for (const size_t i: indices_to_erase) {
        erase_index[i] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

It is the fastest solution that came to my mind. You need C++11, though. Usage example to erase elements at index 2 and 5:

constexpr size_t num = 10u;
std::vector<int> vec(num);
std::iota(vec.begin(), vec.end(), 0);

std::unordered_set<size_t> indices_to_erase;
indices_to_erase.insert(2u);
indices_to_erase.insert(5u);

erase_indices(indices_to_erase, vec);

Before:

0 1 2 3 4 5 6 7 8 9

After:

0 1 3 4 6 7 8 9

Edit: If want to be more flexible regarding type of container that hold the indices to erase:

template <typename Type, typename Container>
void erase_indices(
        const Container& indices_to_erase,
        std::vector<Type>& vec) {
    typedef typename Container::value_type IndexType;
    static_assert(std::is_same<IndexType, std::size_t>::value,
        "Indices to be erased have to be of type std::size_t");
    std::vector<bool> erase_index(vec.size(), false);
    for (const IndexType idx_erase: indices_to_erase) {
        erase_index[idx_erase] = true;
    }
    std::vector<bool>::const_iterator it_to_erase = erase_index.cbegin();
    typename std::vector<Type>::iterator it_erase_from = std::remove_if(
        vec.begin(), vec.end(),
        [&it_to_erase](const Type&) -> bool {
          return *it_to_erase++ == true;
        }
    );
    vec.erase(it_erase_from, vec.end());
}

Now you can use any kind of container from the Containers Library to provide the indices to be erased as long as the value_type of that container is std::size_t. Usage remains the same.


I've written a function, based on Benjamin Lindley answer https://stackoverflow.com/a/4115582/2835054.

#include <iostream>
#include <algorithm>
#include <vector>

template <typename elementType, typename indexType>
void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
std::vector<indexType> &indexes)
{
    // 1. indexType is any integer.
    // 2. elementType is any type.
    // 3. Indexes should be unique.
    // 4. The largest index inside indexes shouldn't be larger than
    //    the largetst index in the vector.
    // 5. Indexes should be sorted in ascending order
    //    (it is done inside function).
    std::sort(indexes.begin(), indexes.end());
    indexType currentIndexInIndexesVector = 0;
    indexType last = 0;
    for(indexType i=0; i<vector.size(); ++i, ++last)
    {
       while(indexes[currentIndexInIndexesVector] == i)
       {
          ++i;
          ++currentIndexInIndexesVector;
       }
       if(i >= vector.size()) break;

       vector[last] = vector[i];   
    }

    vector.resize(last);
}


int main()
{
    std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<int> indexes = {0, 10, 5};

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }    
    std::cout << "\n";

    remove_multiple_elements_from_vector<int, int>(vector, indexes);

    for (auto &vectorElement : vector)
    {
        std::cout << vectorElement << " ";
    }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜