How to achieve better efficiency re-inserting into sets in C++

2023-01-04 10:20 Q&A author:

I need to modify an object that has already been inserted into a set. This isn't trivial because the iterator in the pair returned from an insertion of a single object is a const iterator and does not allow modifications. So, my plan was that if an insert failed I could copy that object into a temporary variable, erase it from the set, modify it locally and then insert my modified version.

insertResult = mySet.insert(newPep);
if( insertResult.second == false )
    modifySet(insertResult.first, newPep);

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    Peptide tempPep = (*someIter);
    someSet.erase(someIter);
    // Modify tempPep - this does not modify the key
    someSet.insert(tempPep);

}

This works, but I want to make my insert more efficient. I tried making another iterator and setting it equal to someIter in modifySet. Then after deleting someIter I would still have an iterator to that location in the set and I could use that as the insertion location.

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    Peptide tempPep = (*someIter);
    anotherIter = someIter;
    someSet.erase(someIter);
    // Modify tempPep - this does not modify the key
    someSet.insert(anotherIter, tempPep);

}

However, this results in a seg fault. I am hoping that someone can tell me why this insertion fails or suggest another way to modify an object that has already been inserted into a set.

The full source code can be viewed at github.

I agree with Peter that a map is probably a better model of what you are doing. Specifically, something like map<pep_key, Peptide::Peptide> would let you do something like:

insertResult = myMap.insert(std::make_pair(newPep.keyField(), newPep));
if( insertResult.second == false )
    insertResult.first->second = newPep;
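
For anyone who wants to try the map version in isolation, here is a minimal sketch of it. The Peptide struct, its sequence and weight fields, and the insertOrUpdate helper are all made up for illustration; the real class in the question will look different.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the asker's Peptide class.
struct Peptide {
    std::string sequence;  // assumed key field
    double weight;         // assumed non-key payload
};

typedef std::map<std::string, Peptide> PeptideMap;

// Inserts pep, or overwrites the stored value if its key is already present.
// Returns true when a new element was inserted.
bool insertOrUpdate(PeptideMap& peptides, const Peptide& pep) {
    std::pair<PeptideMap::iterator, bool> result =
        peptides.insert(std::make_pair(pep.sequence, pep));
    if (!result.second) {
        // The mapped value is not const, so it can be assigned in place.
        result.first->second = pep;
    }
    // The map key and the key stored inside the value must stay in sync.
    assert(result.first->first == result.first->second.sequence);
    return result.second;
}
```

No erase and no second lookup are needed: the iterator returned by the failed insert already points at the element to overwrite.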

To answer your question, the insert segfaults because erase invalidates every iterator to the erased element, so inserting with it (or a copy of it) is analogous to dereferencing an invalid pointer. The only way I see to do what you want is with a const_cast:

insertResult = mySet.insert(newPep);
if( insertResult.second == false )
    const_cast<Peptide::Peptide&>(*(insertResult.first)) = newPep;

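The const_cast approach looks like it will work for what you are doing, but it is generally a bad idea: if the assignment ever changes the part of the object that the set orders by, the set's ordering is silently broken.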

I hope it isn't bad form to answer my own question, but I would like it to be here in case someone else ever has this problem. The answer to why my attempt seg faulted was given by academicRobot, but here is the solution to make this work with a set. While I do appreciate the other answers and plan to learn about maps, this question was about efficiently re-inserting into a set.

void modifySet(set<Peptide>::iterator someIter, Peptide::Peptide newPep) {
    if( someIter == someSet.begin() ) {
        Peptide tempPep = (*someIter);
        someSet.erase(someIter);
        // Modify tempPep - this does not modify the key
        someSet.insert(tempPep);   
    }
    else {
        Peptide tempPep = (*someIter);
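        // Keep an iterator to the previous element; it stays valid after the erase.
        set<Peptide>::iterator anotherIter = someIter;
        --anotherIter;
        someSet.erase(someIter);
        // Modify tempPep - this does not modify the key
        someSet.insert(anotherIter, tempPep);
    }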
}

In my program this change dropped my run time by about 15%, from 32 seconds down to 27 seconds. My larger data set is currently running and I have my fingers crossed that the 15% improvement scales.
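
If a C++11 compiler is available, the same idea can probably be written without the begin() special case, because erase then returns the iterator following the removed element and that iterator is a valid hint. The sketch below assumes a made-up Peptide with a sequence key and a count payload, and a hypothetical bumpCount helper; it is not the code from the repository.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the asker's Peptide class.
struct Peptide {
    std::string sequence;  // assumed key: the only field the ordering looks at
    int count;             // assumed payload, free to change

    bool operator<(const Peptide& other) const {
        return sequence < other.sequence;
    }
};

// Increments the payload of *pos without touching its key.
// Returns an iterator to the re-inserted element.
std::set<Peptide>::iterator bumpCount(std::set<Peptide>& peptides,
                                      std::set<Peptide>::iterator pos) {
    assert(pos != peptides.end());
    Peptide updated = *pos;  // copy out, since set elements are const
    ++updated.count;
    // Since C++11, erase returns the iterator after the removed element.
    // It is still valid, and the new element belongs right before it.
    std::set<Peptide>::iterator next = peptides.erase(pos);
    return peptides.insert(next, updated);
}
```

Because the key is unchanged, the element goes back exactly in front of the hint, which is the case the standard makes amortized constant time. With C++17, std::set::extract can also hand back the node itself, so the value can be modified and re-inserted without copying it.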

std::set::insert returns a pair<iterator, bool> as far as I know. In any case, directly modifying an element in any sort of set is risky. What if your modification causes the item to compare equal to another existing item? What if it changes the item's position in the total order of items in the set? Depending on the implementation, this will cause undefined behaviour.

If the item's key remains the same and only its properties change, then I think what you really want is a map or an unordered_map instead of a set.

As you realized, sets are a bit messy to deal with because you have no way to indicate which part of the object should be considered for the key and which part you can modify safely.

The usual answer is to use a map or an unordered_map (if you have access to C++0x) and cut your object in two halves: the key and the satellite data.

Beware of the typical answer, std::map<key_type, Peptide>: while it seems easy, it means you need to guarantee that the key part of the Peptide object always matches the key it's associated with, and the compiler won't help.

So you have 2 alternatives:

Cut Peptide in two: Peptide::Key and Peptide::Data, then you can use the map safely.
Don't provide any method to alter the part of Peptide which defines the key, then you can use the typical answer.

Finally, note that there are two ways to insert in a map-like object.

insert: inserts, but fails if the key already exists
operator[]: insert or update (which requires creating an empty object)
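
The difference between the two is easiest to see side by side. This is only a sketch: the NameMap typedef and the two helper functions are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::map<int, std::string> NameMap;  // hypothetical id -> name map

// insert leaves an existing entry untouched and reports that via .second.
bool addIfAbsent(NameMap& names, int id, const std::string& name) {
    return names.insert(std::make_pair(id, name)).second;
}

// operator[] default-constructs the value if the key is missing,
// then the assignment overwrites whatever is stored.
void addOrReplace(NameMap& names, int id, const std::string& name) {
    names[id] = name;
    assert(names.count(id) == 1);  // the key is present either way
}
```

Note that operator[] only compiles when the mapped type has a default constructor and is assignable.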

So, a solution would be:

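#include <unordered_map>

class Peptide
{
public:
  explicit Peptide(int const id): mId(id), mWeight(0), mLength(0) {}

  int GetId() const { return mId; }

  void setWeight(float w) { mWeight = w; }
  void setLength(float l) { mLength = l; }

private:
  int const mId;   // key part: fixed at construction, no setter
  float mWeight;
  float mLength;
};

typedef std::unordered_map<int, Peptide> peptide_map;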

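Note that updating through operator[] means creating a new object (default constructor) and then assigning to it. That is not possible here: Peptide has no default constructor, and assignment would potentially change the key part of the object (the const mId member also disables the compiler-generated copy assignment).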

std::map will make your life a lot easier and I wouldn't be surprised if it outperforms std::set for this particular case. The storage of the key might seem redundant but can be trivially cheap (e.g., a pointer to immutable data in Peptide, with your own comparison predicate to compare the pointees correctly). With that you don't have to fuss about with the constness of the value associated with a key.

If you can change Peptide's implementation, you can avoid redundancy completely by making Peptide into two separate classes: one for the key part and one for the value associated with the key.
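
A rough sketch of what that split could look like follows. PeptideKey, PeptideData, their fields, and the setWeight helper are all assumptions made for the example, not the real Peptide layout.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical key half: everything the ordering depends on.
struct PeptideKey {
    std::string sequence;
    bool operator<(const PeptideKey& other) const {
        return sequence < other.sequence;
    }
};

// Hypothetical data half: everything that may change after insertion.
struct PeptideData {
    float weight;
    float length;
};

typedef std::map<PeptideKey, PeptideData> PeptideTable;

// Updates the weight in place; returns false if the key is not present.
bool setWeight(PeptideTable& table, const PeptideKey& key, float weight) {
    assert(weight >= 0.0f);  // assumed invariant for this sketch only
    PeptideTable::iterator it = table.find(key);
    if (it == table.end())
        return false;
    it->second.weight = weight;  // it->first is const, it->second is not
    return true;
}
```

With this layout the compiler enforces the rule from the question: the key cannot be modified through the iterator, while the data can be changed freely with no erase and re-insert.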
