Best container to use for searching and sorting
I have a pure virtual class for storing log data. This class has two pieces of information: std::string id
(unique) and int64_t time
(duplicates allowed) with getId()
and getTime()
functions. As the log entries are created, they will go into a container and on application termination, the log messages are written to file.
As the program continues, I may want to update a log entry, so I will need to search for id
to find the correct entry to update. On shutdown, I want the results to be logged in time
order.
I thought of storing the objects in a std::map
where the id
as the key, the object as the value for easy searching and updating. On shutdown, create a std::multimap
or std::vector
to do the sort before writing. Is this the best approach? Or is there a better object that can support both needs?
(source: adrinael.net)
(original source: Liam Devine)
Since you're only going to need the entries sorted on time once, when the program exits, I think your proposed solution is good.
If you were going to need the entries sorted on both time and id several times boost::multi_index
would be a good candidate.
The solution would be to have a pair of maps. Keep track of both map<string, entry*>
and map<integer, entry*>
.
On update, then, you would only need to search by id, and edit the pointed object. When pulling out all entries from the time map, you'd have the modified entries. This assumes that id and time values for entries are immutable.
If the entries are generated in order of their time
values and the time
is not changed when you do the update, than you can simply:
- Add entries to
std::vector<entry*>
(orstd::list<entry*>
) as they are generated. - And also add them to
std::map<std::string, entry*>
(orstd::unordered_map<std::string, entry*>
).
The map
allows for fast searching during updates. When the time comes to dump the entries to file, simply traverse the std::vector
in-order.
(There are variations that could save you some heap allocations, such as: std::vector<entry>
and std::map<std::string, std::vector<entry>::size_type>
, where the size_type
is the index of the vector element.)
If, however, you need to update the time
as well, I don't think you'll be able to avoid using std::multimap
instead of the std::vector
.
-- EDIT ---
OK, so you don't have in-order time
, how about this:
Maintain just one map<string, entry>
and when the time comes to dump it into file, generate a vector< map<string, entry>::const_iterator >
from it and std::sort
it on time
(use the comparer parameter of std::sort
).
Considerations:
- You can also just use
vector<entry*>
for sorting - the idea is that you are not allocating any new entries, you are just "pointing" to entries in themap
. - Also note that you can can pre-reserve
vector
to match themap.size()
, so vector re-allocations are avoided. - If you don't need to actually sort by entry IDs, consider using
unordered_map
(instead of just plainmap
) for performance.
This is cheaper overall than maintaining the second map
, but adds a delay before dumping to file. Depending on your performance requirements, this may or may not be the optimal solution.
stl::map
will do fine. And if I remember right, std::map
is implemented with binary trees under the hood, so it should already be sorted, as long as you overload the operator<
operator. So to have the map sorted by the timestamp, override operator<
to compare the timestamps.
You can use vector for store entries. On creation you can assign to UID every entry, but you can use entry HANDLE to modify item. HANDLE can be entry pointer or array index. In this case you will get O(1) complexity on entry access. Also you don't need sorting before log output. array already sorted in order of entry creation. Better solution to create static array of entry structures with reasonable size. This will eliminate dynamic memory use.
Edit: Before output you can sort this array to get entries in order of time
精彩评论