开发者

Fastest container or algorithm for unique reusable ids in C++

I have a need for unique reusable ids. The user can choose his own ids or he can ask for a free one. The API is basically

class IdManager {
public:
  int AllocateId();          // Allocates an id
  void FreeId(int id);       // Frees an id so it can be used again
  bool MarkAsUsed(int id);   // Let's the user register an id. 
                             // returns false if the id w开发者_开发知识库as already used.
  bool IsUsed(int id);       // Returns true if id is used.
};

Assume ids happen to start at 1 and progress, 2, 3, etc. This is not a requirement, just to help illustrate.

IdManager mgr;
mgr.MarkAsUsed(3);
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());
printf ("%d\n", mgr.AllocateId());

Would print

1
2
4

Because id 3 has already been declared used.

What's the best container / algorithm to both remember which ids are used AND find a free id?

If you want to know the a specific use case, OpenGL's glGenTextures, glBindTexture and glDeleteTextures are equivalent to AllocateId, MarkAsUsed and FreeId


My idea is to use std::set and Boost.interval so IdManager will hold a set of non-overlapping intervals of free IDs. AllocateId() is very simple and very quick and just returns the left boundary of the first free interval. Other two methods are slightly more difficult because it might be necessary to split an existing interval or to merge two adjacent intervals. However they are also quite fast.

So this is an illustration of the idea of using intervals:

IdManager mgr;    // Now there is one interval of free IDs:  [1..MAX_INT]
mgr.MarkAsUsed(3);// Now there are two interval of free IDs: [1..2], [4..MAX_INT]
mgr.AllocateId(); // two intervals:                          [2..2], [4..MAX_INT]
mgr.AllocateId(); // Now there is one interval:              [4..MAX_INT]
mgr.AllocateId(); // Now there is one interval:              [5..MAX_INT]

This is code itself:

#include <boost/numeric/interval.hpp>
#include <limits>
#include <set>
#include <iostream>


class id_interval 
{
public:
    id_interval(int ll, int uu) : value_(ll,uu)  {}
    bool operator < (const id_interval& ) const;
    int left() const { return value_.lower(); }
    int right() const {  return value_.upper(); }
private:
    boost::numeric::interval<int> value_;
};

class IdManager {
public:
    IdManager();
    int AllocateId();          // Allocates an id
    void FreeId(int id);       // Frees an id so it can be used again
    bool MarkAsUsed(int id);   // Let's the user register an id. 
private: 
    typedef std::set<id_interval> id_intervals_t;
    id_intervals_t free_;
};

IdManager::IdManager()
{
    free_.insert(id_interval(1, std::numeric_limits<int>::max()));
}

int IdManager::AllocateId()
{
    id_interval first = *(free_.begin());
    int free_id = first.left();
    free_.erase(free_.begin());
    if (first.left() + 1 <= first.right()) {
        free_.insert(id_interval(first.left() + 1 , first.right()));
    }
    return free_id;
}

bool IdManager::MarkAsUsed(int id)
{
    id_intervals_t::iterator it = free_.find(id_interval(id,id));
    if (it == free_.end()) {
        return false;
    } else {
        id_interval free_interval = *(it);
        free_.erase (it);
        if (free_interval.left() < id) {
            free_.insert(id_interval(free_interval.left(), id-1));
        }
        if (id +1 <= free_interval.right() ) {
            free_.insert(id_interval(id+1, free_interval.right()));
        }
        return true;
    }
}

void IdManager::FreeId(int id)
{
    id_intervals_t::iterator it = free_.find(id_interval(id,id));
    if (it != free_.end()  && it->left() <= id && it->right() > id) {
        return ;
    }
    it = free_.upper_bound(id_interval(id,id));
    if (it == free_.end()) {
        return ;
    } else {
        id_interval free_interval = *(it);

        if (id + 1 != free_interval.left()) {
            free_.insert(id_interval(id, id));
        } else {
            if (it != free_.begin()) {
                id_intervals_t::iterator it_2 = it;
                --it_2;
                if (it_2->right() + 1 == id ) {
                    id_interval free_interval_2 = *(it_2);
                    free_.erase(it);
                    free_.erase(it_2);
                    free_.insert(
                        id_interval(free_interval_2.left(), 
                                    free_interval.right()));
                } else {
                    free_.erase(it);
                    free_.insert(id_interval(id, free_interval.right()));
                }
            } else {
                    free_.erase(it);
                    free_.insert(id_interval(id, free_interval.right()));
            }
        }
    }
}

bool id_interval::operator < (const id_interval& s) const
{
    return 
      (value_.lower() < s.value_.lower()) && 
      (value_.upper() < s.value_.lower());
}


int main()
{
    IdManager mgr;

    mgr.MarkAsUsed(3);
    printf ("%d\n", mgr.AllocateId());
    printf ("%d\n", mgr.AllocateId());
    printf ("%d\n", mgr.AllocateId());

    return 0;
}


It would be good to know how many ids you're supposed to keep track of. If there's only a hundred or so, a simple set would do, with linear traversal to get a new id. If it's more like a few thousands, then of course the linear traversal will become a performance killer, especially considering the cache unfriendliness of the set.

Personally, I would go for the following:

  • set, which helps keeping track of the ids easily O(log N)
  • proposing the new id as the current maximum + 1... O(1)

If you don't allocate (in the lifetime of the application) more than max<int>() ids, it should be fine, otherwise... use a larger type (make it unsigned, use a long or long long) that's the easiest to begin with.

And if it does not suffice, leave me a comment and I'll edit and search for more complicated solutions. But the more complicated the book-keeping, the longer it'll take to execute in practice and the higher the chances of making a mistake.


But I don't think you have to guarantee the id must starts from 1. You can just make sure the available id must be larger than all allocated ids.

Like if the 3 is registered first, then the next available id can just be 4. I don't think it is necessary to use 1.


I'm assuming that you want to be able to use all available values for the Id type and that you want to reuse freed Ids? I'm also assuming that you'll lock the collection if you're using it from more than one thread...

I'd create a class with a set to store the allocated ids, a list to store the free ids and a max allocated value to prevent me having to preload the free id list with every available id.

So you start off with an empty set of allocated ids and empty list of free ids and the max allocated as 0. You allocate, take the head of the free list if there is one, else take max, check it's not in your set of allocated ids as it might be if someone reserved it, if it is, increment max and try again, if not add it to the set and return it.

When you free an id you simply check it's in your set and if so push it on your free list.

To reserve an id you simply check the set and if not present add it.

This recycles ids quickly, which may or may not be good for you, that is, allocate(), free(), allocate() will give you the same id back if no other threads are accessing the collection.


Compressed vector. But I don't think any container would make noticeable difference.


Normally, i'd say stick to an simple implementation until you have an idea of which methods are used most. Premature tuning might prove wrong. Use the simple implementation, and log its use, then you can optimize from the functions that are used the most. No use in optimizing for quick removal or quick allocation if you only need a couple of hundred ids and a simple vector would be enough.


Similar to skwllsp, I'd keep track of the ranges that have not been allocated, but my methods are slightly different. The base container would be a map, with the key being the upper bound of the range and the value being the lower bound.

IdManager::IdManager()
{
    m_map.insert(std::make_pair(std::numeric_limits<int>::max(), 1);
}

int IdManager::AllocateId()
{
    assert(!m_map.empty());
    MyMap::iterator p = m_map.begin();
    int id = p->second;
    ++p->second;
    if (p->second > p->first)
        m_map.erase(p);
    return id;
}

void IdManager::FreeId(int id)
{
    // I'll fill this in later
}

bool IdManager::MarkAsUsed(int id)
{
    MyMap::iterator p = m_map.lower_bound(id);

    // return false if the ID is already allocated
    if (p == m_map.end() || id < p->second || id > p->first)))
        return false;

    // first thunderstorm of the season, I'll leave this for now before the power glitches
}

bool IdManager::IsUsed(int id)
{
    MyMap::iterator p = m_map.lower_bound(id);
    return (p != m_map.end() && id >= p->second && id <= p->first);
}


So a friend pointed out that in this case a hash might be better. Most OpenGL programs don't use more than a few thousand ids so a hash with say 4096 slots is almost guaranteed to have only 1 or 2 entries per slot. There is some degenerate case where lots of ids might go in 1 slot but that's seriously unlikely. Using a hash would make AllocateID much slower but a set could be used for that. Allocating being slower is less important than InUse being fast for my use case.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜