开发者

c++ container allowing you to sort items by when they where last accessed?

Does such a thing exist? or could anyone please recommend how I could implement such a container?

basically I have a std::map which uses a 64bit integer as its key and a custom datatype as the containing item.

I need to be able to periodically remove items that havent been accessed in a while in the most optimal way. does anyone have any suggestion开发者_如何转开发s for this?

cheers


Use a priority queue that places the least-recently-used (LRU) item at the head of the queue. When an item is accessed, remove it and re-insert it against the current timestamp. When you want to expire items, simply remove them from the head of the queue.

I should point out that you can't use the standard priority_queue, since that doesn't support random removal. You'll have to use the heap functions in conjunction with a vector.

I should also point out that updating an item on access will be expensive (O(N) to find the element to remove).

EDIT: Please disregard this answer. On rethinking, it isn't the best way to do it. (Also, see comments.)


Here is a sketch of how it might be done, using a list to store the most recently accessed items in order. The list is updated in constant time, so there is no significant overhead above the map access (unlike some other answers which require a linear search on each access). I've kept the interface very basic, and haven't tested it very thoroughly.

template <typename KEY, typename VALUE>
class Container
{
public:
    void Set(const KEY& key, const VALUE& value)
    {
        typename Map::iterator it = map.find(key);
        if (it == map.end())
        {
            list.push_front(it);
            it = map.insert(std::make_pair(key, std::make_pair(value, list.begin()))).first;
            list.front() = it;
        }
        else
        {
            it->second.first = value;
            Accessed(it);
        }
    }

    const VALUE* Get(const KEY& key)
    {
        typename Map::iterator it = map.find(key);
        if (it == map.end())
            return 0;

        Accessed(it);
        return &it->second.first;
    }

    void Expire(std::size_t new_size)
    {
        while (list.size() > new_size)
        {
            map.erase(list.back());
            list.pop_back();
        }
    }

private:
    // Needed to resolve the semicircular dependency on nested iterator types.
    struct MapIterator;

    typedef std::list<MapIterator> List;
    typedef std::map<KEY, std::pair<VALUE, typename List::iterator> > Map;

    struct MapIterator : Map::iterator
    {
        MapIterator(const typename Map::iterator& it) : Map::iterator(it) {}
    };

    void Accessed(typename Map::iterator it)
    {
        list.erase(it->second.second);
        list.push_front(it);
        it->second.second = list.begin();
    }

    Map map;
    List list;
};


One idea: maintain a std::deque which gets an iterator into your map element pushed to the front whenever accessing the map. You can then easily look at the deque to tell which elements have been used most recently.

Some C++ sketch (void of error checking, the point is to demonstrate that the deque is updated when accessing the map, and you can lateron trim the map).

class MyMap {
  typedef std::map<int64_t, void *> Map;
  Map m_map;
  std::deque<Map::iterator> m_recentlyUsedItems;

public:
  void *getItem( int64_t key ) {
    Map::iterator it = m_map.find( key );
    if ( it == m_map.end() ) {
      return 0;
    }
    m_recentlyUsedItems.push_front( it );
    return it->second;
  }

  void removeAllButMostRecentlyUsedItems( int n ) {
    std::deque<Map::iterator> it = m_recentlyUsedItems.begin();
    advance( it, n );
    std::deque<Map::iterator> it2 = it;
    for ( ; it2 != m_recentlyUsedItems.end(); ++it2 ) {
      m_map.erase( *it2 );
    }
    m_recentlyUsedItems.erase( it, m_recentlyUsedItems.end() );
  }
};


I'm going to propose a singularly different idea.

The problem of optimal is that it's difficult to get its sense. Especially: do you wish to cripple the retrieve operation in order to get a faster cleanup ? Typical cleanup are usually done during "down time" when speed isn't that important, on the other hand you might want snappy retrieves (for access in loops etc...)

There I would propose that before you try fancy constructs, you simply store the last accessed time along your item in the map. Then, cleanup simply consists in checking each item in the map and removing those you don't want any longer.


If you just need to know which elements were accessed so that you can delete them, then you probably could take a multi index map and store the last access value as alternative key.

If you want to use that idea to increase performance, you could implement your own container. The easiest approach would mean making a data structure that is known as auto-sorting list. Actually, this means that every access operation makes the accessed element of your list it's new head. In this case, elements that are accessed frequently would reside close to the beginning, resulting in a better search time.

Of course, there are variations. Automatically sorted lists aren't very efficient and there are many other similiar data structures that are actually better.


I have implemented a similar sounding type I called DynamicCache. Basically it stores the data in a list sorted by the creation date. This could easily be changed to the last accessed date. The purpose of my cache is to cache database items that don't change very often. It caches items for 5 minutes then removes them to be read again when they are next accessed.

The cache lookup uses a map that stores the key and an iterator to the list. The lookup uses the map to find the data in the list, then before the item is returned removes all the old items from the end of the list. If an item is not in the cache a factory is called to provided the data.

This approach must use a list to store the data as the iterators in the map must always be valid, if it used a deque the iterators could be invalidated after an insert. The list uses a struct to store the data, the key, the time it was created (not last accessed) and finally if the data exists.

struct Record 
{
    KeyT key;
    DataT data;
    time_t createTime;
    bool exists;
};

If your data is static and you want to preserve the most recently accessed then you could add an access time member to the struct, and move the item to the top of the list each time it is accessed.

Here is my code, it looks a little complicated but this is mainly caused by template parameters and a reader writer lock.

#include "BWThread/BWReadersWriterLock.h"

#include <list>
#include <map>
#include <ctime>
#include <memory>
#include <boost/scoped_ptr.hpp>

/**
 * This is a Generic Cache implementation.
 * 
 * To implement a cache using this class create a new class
 * derived from CacheFactory and implement the lookup method.
 * If the datasource supports updating implement update and
 * remove method.
 *
 * EG
 * typedef NameCache DynamicCache<int, BWString>;
 * NameCacheFactory : NameCache::Factory
 * {
 * public:
 *    virtual bool lookup(int, BWString *);
 * };
 *
 * NameCache cache(new NameCacheFactory, "<cache name>" );
 * 
 * --------------------------------------------------------
 * Implementation note:
 * This class uses a list as an efficient way to remove stale items from
 * the cache.  The map stores a key and an iterators to the data in the list.
 * The list and the map are updated together.
 */

template <class KeyT, class DataT>
class CacheFactory 
{
public:
    virtual ~CacheFactory() {}

    // Lookup the data for from the data source.
    // Return true if the data is found.
    virtual bool lookup(const KeyT & key, DataT * data) = 0;

    // Update or insert the data in the data source.
    // Return true if the data can be updated.
    // Returning false means the cache is not updated either.
    virtual bool update(const KeyT & key, const DataT & data) { return false; }

    // Remove the data in the data source.
    // Return true if the data can be deleted weather it exists or not.
    // Returning false means the cache is not updated either.
    virtual bool remove(const KeyT & key) { return false; }
};


template <class KeyT, class DataT>
class DynamicCache
{
public:
    typedef CacheFactory<KeyT, DataT> Factory;

    DynamicCache(Factory * f, const std::string & name, time_t t = (5 * 60)) : 
        factory(f), timeout(t), lastClean(std::time(0)), lock(name + " DynamicCache") {}

    /*
     * Lookup a key in the cache, the cached version is returned if it is 
     * present and the value is not old.  If the value is old or is not
     * present then use the factory to create it and insert the value in the
     * cache for future lookups.  If the factory cannot create it cache this
     * fact too so we will ignore future lookups.  Afterwards any entries in 
     * the cache longer than timeout are removed.
     *
     * This is the main method and entry point for the cache.  All locking is
     * performed inside the child methods.
     */
    bool lookup(const KeyT & key, DataT * data, time_t now = std::time(0)) 
    {
        bool found = false;
        FindStatus status = find(key, data, now);

        switch(status & EntryStatus) {
        case Found: 
            found = true;
            break;
        case Create:
            found = build(key, data, now);
            break;
        }
        if (status & CleanRequired) {
            cleanOldEntries(now);
        }
        return found;
    }

    bool update(const KeyT & key, const DataT & data, time_t now = std::time(0)) 
    { 
        if (factory->update(key, data))
        {
            Record record;
            record.key = key;
            record.createTime = now;
            record.data = data;
            record.exists = true;

            BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);

            updateEntry(key, record);
            return true;
        }
        return false;
    }

    bool remove(const KeyT & key, time_t now = std::time(0)) 
    { 
        if (factory->remove(key))
        {
            Record record;
            record.key = key;
            record.createTime = now;
            record.exists = false;

            BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);

            updateEntry(key, record);
            return true;
        }
        return false;
    }

    /**
     * Return the size of the cache (only really useful for unit testing).
     */
    size_t size() const
    {
        BWReadersWriterLock::ReadLockGuard guard(lock, __FILE__, __LINE__);

        return map.size();
    }

    Factory * getFactory() 
    {
        return factory.get();
    }
private:
    // Cache record
    struct Record 
    {
        KeyT key;
        DataT data;
        time_t createTime;
        bool exists;
    };

    // Find and Clean status
    // CleanRequired is part of this so that searching the cache and finding
    // stale items in the cache can be automic (use a single readlock).
    enum FindStatus { 
        None,
        Found, 
        Create, //Add
        NotExist, 
        EntryStatus=Found|Create|NotExist, 
        CleanRequired = 8 
    };

    typedef std::list<Record> List;
    typedef typename List::iterator Iterator;
    typedef std::map<KeyT, typename Iterator> Map;


    //
    // The following methods all use and require explicit locking.
    //

    FindStatus find(const KeyT & key, DataT * data, time_t now) 
    {
        BWReadersWriterLock::ReadLockGuard guard(lock, __FILE__, __LINE__);

        Iterator itr = getEntry(key);
        if (isValid(itr) && !isOld(itr, now)) {
            if (itr->exists) {
                *data = itr->data;
                return FindStatus(Found | cleanRequired(now));
            }
            else {
                return FindStatus(NotExist | cleanRequired(now));
            }
        }
        return FindStatus(Create | cleanRequired(now));
    }

    bool build(const KeyT & key, DataT * data, time_t now)
    {
        Record record;
        record.key = key;
        record.createTime = now;
        record.exists = factory->lookup(key, &record.data);

        BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);

        if (record.exists) {
            *data = record.data;
        }

        updateEntry(key, record);
        return record.exists;
    }

    void cleanOldEntries(time_t now) 
    {
        BWReadersWriterLock::WriteLockGuard guard(lock, __FILE__, __LINE__);

        lastClean = now;
        time_t old = now - timeout;

        typename List::reverse_iterator itr = list.rbegin();
        while(!list.empty() && list.back().createTime < old) {
            removeEntry(getEntry(list.back().key));
        }
    }

    //
    // The following methods don't use locking but require the calling
    // method to already have aquired a lock.
    //

    Iterator getEntry(const KeyT & key)
    {
        typename Map::const_iterator itr = map.find(key);
        if (itr != map.end()) {
            return map.find(key)->second;
        }
        return list.end();
    }

    bool updateEntry(const KeyT key, const Record & record)
    {
        Iterator itr = getEntry(key);
        if (isValid(itr)) {
            removeEntry(itr);
        }

        insertEntry(record);
        return record.exists;
    }

    bool isValid(Iterator itr) const
    {
       typename List::const_iterator constItr(itr);
        return constItr != list.end();
    }

    bool isOld(Iterator itr, time_t now) const
    {
        // isOld or time_t has wrapped
        return ((itr->createTime + timeout) < now) || (now < itr->createTime);
    }

    Iterator insertEntry(const Record & record) 
    {
        list.push_front(record);
        Iterator itr = list.begin();
        map.insert(typename Map::value_type(record.key, itr));
        return itr;
    }

    void removeEntry(Iterator itr) 
    {
        map.erase(itr->key);
        list.erase(itr);
    }

    FindStatus cleanRequired(time_t now) const
    {
        return (lastClean + timeout) < now ? CleanRequired : None;
    }

    List list;
    Map map;
    time_t timeout;
    time_t lastClean;
    boost::scoped_ptr<CacheFactory<KeyT, DataT> > factory;
    mutable BWReadersWriterLock lock;
};


You can also use linked_hash_map from MCT library. In fact, its documentation contains a recipe for this usecase.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜