开发者

How to delete items from a std::vector given a list of indices

I have a vector of items items, and a vector of indices that should be deleted from items:

std::vector<T> items;
std::vector<size_t> indicesToDelete;

items.push_back(a);
items.push_back(b);
items.push_back(c);
items.push_back(d);
items.push_back(e);

indicesToDelete.push_back(3);
indicesToDelete.push_back(0);
indicesToDelete.push_back(1);

// given these 2 data structures, I want to remove items so it contains
// only c and e (deleting indices 3, 0, and 1)
// ???

What's the best way to perform the deletion, knowing that with each deletion, it affects all other indices in indicesToDelete?

A couple ideas would be to:

  1. Copy items to a new vector one item at a time, skipping if the index is in indicesToDelete
  2. Iterate items and for each deletion, decrement all items in indicesToDelete which have a greater index.
  3. Sort indicesToDelete first, then iterate indicesToDelete, and for each deletion increment an indexCorrection which gets subtracted from subsequent indices.

All seem like I'm over-thinking such a seemingly trivial task. Any better ideas?


Edit Here is the solution, basically a variation of #1 but using iterators to define blocks to copy to the result.

template<typename T>
inline std::vector<T> erase_indices(const std::vector<T>& data, std::vector<size_t>& indicesToDelete/* can't assume copy elision, don't pass-by-value */)
{开发者_Python百科
    if(indicesToDelete.empty())
        return data;

    std::vector<T> ret;
    ret.reserve(data.size() - indicesToDelete.size());

    std::sort(indicesToDelete.begin(), indicesToDelete.end());

    // new we can assume there is at least 1 element to delete. copy blocks at a time.
    std::vector<T>::const_iterator itBlockBegin = data.begin();
    for(std::vector<size_t>::const_iterator it = indicesToDelete.begin(); it != indicesToDelete.end(); ++ it)
    {
        std::vector<T>::const_iterator itBlockEnd = data.begin() + *it;
        if(itBlockBegin != itBlockEnd)
        {
            std::copy(itBlockBegin, itBlockEnd, std::back_inserter(ret));
        }
        itBlockBegin = itBlockEnd + 1;
    }

    // copy last block.
    if(itBlockBegin != data.end())
    {
        std::copy(itBlockBegin, data.end(), std::back_inserter(ret));
    }

    return ret;
}


I would go for 1/3, that is: order the indices vector, create two iterators into the data vector, one for reading and one for writting. Initialize the writing iterator to the first element to be removed, and the reading iterator to one beyond that one. Then in each step of the loop increment the iterators to the next value (writing) and next value not to be skipped (reading) and copy/move the elements. At the end of the loop call erase to discard the elements beyond the last written to position.

BTW, this is the approach implemented in the remove/remove_if algorithms of the STL with the difference that you maintain the condition in a separate ordered vector.


std::sort() the indicesToDelete in descending order and then delete from the items in a normal for loop. No need to adjust indices then.


It might even be option 4:

If you are deleting a few items from a large number, and know that there will never be a high density of deleted items:

Replace each of the items at indices which should be deleted with 'tombstone' values, indicating that there is nothing valid at those indices, and make sure that whenever you access an item, you check for a tombstone.


It depends on the numbers you are deleting.

If you are deleting many items, it may make sense to copy the items that are not deleted to a new vector and then replace the old vector with the new vector (after sorting the indicesToDelete). That way, you will avoid compressing the vector after each delete, which is an O(n) operation, possibly making the entire process O(n^2).

If you are deleting a few items, perhaps do the deletion in reverse index order (assuming the indices are sorted), then you do not need to adjust them as items get deleted.


Since the discussion has somewhat transformed into a performance related question, I've written up the following code. It uses remove_if and vector::erase, which should move the elements a minimal number of times. There's a bit of overhead, but for large cases, this should be good.

However, if you don't care about the relative order of elements, then this will not be all that fast.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <set>

using std::vector;
using std::string;
using std::remove_if;
using std::cout;
using std::endl;
using std::set;

struct predicate {
    public:
        predicate(const vector<string>::iterator & begin, const vector<size_t> & indices) {
            m_begin = begin;
            m_indices.insert(indices.begin(), indices.end());
        }

        bool operator()(string & value) {
            const int index = distance(&m_begin[0], &value);
            set<size_t>::iterator target = m_indices.find(index);
            return target != m_indices.end();
        }

    private:
        vector<string>::iterator m_begin;
        set<size_t> m_indices;
};

int main() {
    vector<string> items;
    items.push_back("zeroth");
    items.push_back("first");
    items.push_back("second");
    items.push_back("third");
    items.push_back("fourth");
    items.push_back("fifth");

    vector<size_t> indicesToDelete;
    indicesToDelete.push_back(3);
    indicesToDelete.push_back(0);
    indicesToDelete.push_back(1);

    vector<string>::iterator pos = remove_if(items.begin(), items.end(), predicate(items.begin(), indicesToDelete));
    items.erase(pos, items.end());

    for (int i=0; i< items.size(); ++i)
        cout << items[i] << endl;
}

The output for this would be:

second
fourth
fifth

There is a bit of a performance overhead that can still be reduced. In remove_if (atleast on gcc), the predicate is copied by value for each element in the vector. This means that we're possibly doing the copy constructor on the set m_indices each time. If the compiler is not able to get rid of this, then I would recommend passing the indices in as a set, and storing it as a const reference.

We could do that as follows:

struct predicate {
    public:
        predicate(const vector<string>::iterator & begin, const set<size_t> & indices) : m_begin(begin), m_indices(indices) {
        }

        bool operator()(string & value) {
            const int index = distance(&m_begin[0], &value);
            set<size_t>::iterator target = m_indices.find(index);
            return target != m_indices.end();
        }

    private:
        const vector<string>::iterator & m_begin;
        const set<size_t> & m_indices;
};

int main() {
    vector<string> items;
    items.push_back("zeroth");
    items.push_back("first");
    items.push_back("second");
    items.push_back("third");
    items.push_back("fourth");
    items.push_back("fifth");

    set<size_t> indicesToDelete;
    indicesToDelete.insert(3);
    indicesToDelete.insert(0);
    indicesToDelete.insert(1);

    vector<string>::iterator pos = remove_if(items.begin(), items.end(), predicate(items.begin(), indicesToDelete));
    items.erase(pos, items.end());

    for (int i=0; i< items.size(); ++i)
        cout << items[i] << endl;
}


Basically the key to the problem is remembering that if you delete the object at index i, and don't use a tombstone placeholder, then the vector must make a copy of all of the objects after i. This applies to every possibility you suggested except for #1. Copying to a new list makes one copy no matter how many you delete, making it by far the fastest answer.
And as David Rodríguez said, sorting the list of indexes to be deleted allows for some minor optimizations, but it may only worth it if you're deleting more than 10-20 (please profile first).


Here is my solution for this problem which keeps the order of the original "items":

  1. create a "vector mask" and initialize (fill) it with "false" values.
  2. change the values of mask to "true" for all the indices you want to remove.
  3. loop over all members of "mask" and erase from both vectors "items" and "mask" the elements with "true" values.

Here is the code sample:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
    vector<unsigned int> items(12);
    vector<unsigned int> indicesToDelete(3);
    indicesToDelete[0] = 3;
    indicesToDelete[1] = 0;
    indicesToDelete[2] = 1;
    for(int i=0; i<12; i++) items[i] = i;

    for(int i=0; i<items.size(); i++)
      cout << "items[" << i << "] = " << items[i] << endl;

    // removing indeces
    vector<bool> mask(items.size());
    vector<bool>::iterator mask_it;
    vector<unsigned int>::iterator items_it;
    for(size_t i = 0; i < mask.size(); i++)
      mask[i] = false;
    for(size_t i = 0; i < indicesToDelete.size(); i++)
      mask[indicesToDelete[i]] = true;        

    mask_it = mask.begin();
    items_it = items.begin();
    while(mask_it != mask.end()){
      if(*mask_it){
        items_it = items.erase(items_it);
        mask_it = mask.erase(mask_it);
      }
      else{
        mask_it++;
        items_it++;
      }
    }

    for(int i=0; i<items.size(); i++)
      cout << "items[" << i << "] = " << items[i] << endl;

    return 0;
}

This is not a fast implementation for using with large data sets. The method "erase()" takes time to rearrange the vector after eliminating the element.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜