std::set::insert, how bad can I hint?
I'm doing lots and lots of inserts of std::pair<int, int> into a std::set, and it's taking longer than I'd like. When I wrote the code I figured I'd look at using the hint iterator form of insert later if it turned out to be a bottleneck; well, now it's profiled and it is a bottleneck. So I want to use the iterator hint.
However, I'm not always going to know a good position to insert my pairs. I typically insert them in batches of increasing set order (a batch here is on the order of 0.01% of the total input size, duplicates included), but once a batch is inserted, I do not know where the next one should start. How is the hint used? Does insert do something like a binary search from the suggested position? How bad would it be to use a bad hint, typically?
I suggest just reading what the compiler reads: the header file for #include <set>. On my system (GNU libstdc++ 4.5.1) I can read the following self-explanatory text:
/**
* @brief Attempts to insert an element into the %set.
* @param position An iterator that serves as a hint as to where the
* element should be inserted.
* @param x Element to be inserted.
* @return An iterator that points to the element with key of @a x (may
* or may not be the element passed in).
*
* This function is not concerned about whether the insertion took place,
* and thus does not return a boolean like the single-argument insert()
* does. Note that the first parameter is only a hint and can
* potentially improve the performance of the insertion process. A bad
* hint would cause no gains in efficiency.
*
* For more on @a hinting, see:
* http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt07ch17.html
*
* Insertion requires logarithmic time (if the hint is not taken).
*/
iterator
insert(iterator __position, const value_type& __x)
{ return _M_t._M_insert_unique_(__position, __x); }
Takeaway:
- A bad hint would cause no gains in efficiency
- Insertion is O(log n) when the hint is not taken
- You can read even more about insertion hints in the GNU libstdc++ manual.
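To see the first two takeaways in practice, here is a small experiment. It is a sketch that assumes a counting comparator; CountingLess, g_compares and count_compares are illustrative names, not library facilities. It inserts increasing keys once with end() as the hint, which is the right position for an ever-larger key, and once with begin(), which is wrong for every key after the first. Both runs build the same set; only the number of comparisons differs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts every call to CountingLess (a plain global keeps the comparator
// trivially copyable, which std::set requires).
static std::size_t g_compares = 0;

struct CountingLess {
    bool operator()(int a, int b) const {
        ++g_compares;
        return a < b;
    }
};

// Inserts 0..n-1 in increasing order, passing either a good hint (end(),
// where each new, larger key belongs) or a bad one (begin()).
// Returns the number of comparisons the set performed.
std::size_t count_compares(int n, bool good_hint) {
    std::set<int, CountingLess> s;
    g_compares = 0;
    for (int i = 0; i < n; ++i) {
        s.insert(good_hint ? s.end() : s.begin(), i);
    }
    assert(static_cast<int>(s.size()) == n);  // a bad hint is still correct
    return g_compares;
}
```

Comparing count_compares(1000, true) with count_compares(1000, false) should show the gap. With libstdc++ we'd expect about one comparison per insert for the good hint, while the bad hint pays for the wasted hint check plus a full search; the exact numbers are implementation details.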
If you check the file bits/stl_tree.h (in GNU libstdc++), you'll find that the _M_insert_unique_ member function with a hint argument compares the new key with the hint, then with the node just before the hint (if the key is smaller) or just after it (if the key is larger), and if the key does not fit there, it falls back to calling the ordinary insert routine.
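In that check it calls key_compare at least once (if the set is not empty) and at most three times. Going from one node to the next or previous is cheap: the nodes of std::set and friends store parent pointers, so stepping an iterator takes amortized constant time (in libstdc++ they are red-black trees with parent links rather than threaded trees).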
So, how bad a bad hint is depends on the comparison routine (a bad hint costs at most three wasted comparisons on top of the ordinary O(log n) insert), and on whether your std::set's allocator packs nodes close together in memory.
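The hint check described above can be sketched with only the public std::set interface. hint_would_be_taken is a hypothetical helper written for illustration: it mirrors the decision attributed to libstdc++ (compare with the hint, then with its predecessor or successor) but does not call or reproduce the library's internal code. It returns true when the key could be linked in next to the hint without a full O(log n) search.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical helper (not part of any library): would a hinted insert of 'v'
// at 'hint' avoid the full search, following the checks described above?
template <class T, class Cmp, class A>
bool hint_would_be_taken(const std::set<T, Cmp, A>& s,
                         typename std::set<T, Cmp, A>::const_iterator hint,
                         const T& v) {
    Cmp less = s.key_comp();
    if (hint == s.end()) {
        // Hint at end(): one comparison against the largest key.
        // (An empty set falls back to the ordinary insert, which is trivial.)
        return !s.empty() && less(*std::prev(hint), v);
    }
    if (less(v, *hint)) {                    // v belongs before the hint...
        if (hint == s.begin()) return true;  // ...and the hint is the smallest key
        return less(*std::prev(hint), v);    // ...check the node to the left
    }
    if (less(*hint, v)) {                    // v belongs after the hint...
        auto after = std::next(hint);
        if (after == s.end()) return true;   // ...and the hint is the largest key
        return less(v, *after);              // ...check the node to the right
    }
    return true;                             // equivalent key already at the hint
}
```

For example, with a set holding 10, 20 and 30, a hint at 20 is taken for the key 25, but a hint at begin() is not, so that insert would fall back to the normal search.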
A hint is good only if it is the right hint, i.e. the position to use for the insert. That works if you insert objects in sorted order, for example.
If the hint is not correct, it has no effect and you get a non-hinted insert.
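For the batching pattern in the question, one option is to pay for one ordinary insert to locate the start of a batch and then hint every following key with the position just after the previously inserted one. This is a sketch assuming each batch is already sorted and stored in a std::vector; insert_sorted_batch is a hypothetical name. The hint is right whenever no existing key falls strictly between two consecutive batch keys; when one does, the insert simply falls back to the normal search.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using Key = std::pair<int, int>;

// Inserts an already sorted batch into 's'. The first key gets an ordinary
// (unhinted) insert because we don't know where the batch starts; every later
// key is hinted with the position just after the previously inserted key.
void insert_sorted_batch(std::set<Key>& s, const std::vector<Key>& batch) {
    assert(std::is_sorted(batch.begin(), batch.end()));
    if (batch.empty()) return;
    auto pos = s.insert(batch.front()).first;  // O(log n) search for the start
    for (auto it = std::next(batch.begin()); it != batch.end(); ++it) {
        // Returns the new element, or the existing one for a duplicate key.
        pos = s.insert(std::next(pos), *it);
    }
}
```

Under the C++11 wording, a hinted insert is amortized constant time when the element goes right before the hint, which is why the code passes std::next(pos) rather than pos.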
If you're building the set all at once before you use it, you can use a vector instead and sort it once before use (then use unique plus erase to drop duplicates). You can use the binary_search, lower_bound, upper_bound, and equal_range algorithms on a sorted vector for fast lookups. You can also use merge or inplace_merge to combine sorted vectors, and set_difference, set_intersection, and set_union to do other common set operations.
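A minimal sketch of the sorted-vector approach for std::pair<int, int> keys follows; make_sorted_unique, add_batch and contains are hypothetical helper names. The vector is kept sorted and free of duplicates, so it holds the same elements a std::set would, and each new batch is folded in with std::inplace_merge.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Key = std::pair<int, int>;

// Sorts and removes duplicates, giving a vector with std::set-like contents.
void make_sorted_unique(std::vector<Key>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Merges a batch into 'v', which must already be sorted and duplicate-free.
void add_batch(std::vector<Key>& v, std::vector<Key> batch) {
    assert(std::is_sorted(v.begin(), v.end()));
    make_sorted_unique(batch);
    // insert() returns an iterator to the first appended element.
    auto mid = v.insert(v.end(), batch.begin(), batch.end());
    std::inplace_merge(v.begin(), mid, v.end());        // merge the two sorted runs
    v.erase(std::unique(v.begin(), v.end()), v.end());  // drop keys already present
}

// Lookup by binary search on the sorted vector.
bool contains(const std::vector<Key>& v, const Key& k) {
    return std::binary_search(v.begin(), v.end(), k);
}
```

Each add_batch call touches the whole vector, so with many small batches (the question mentions batches of about 0.01% of the input) it may be cheaper to collect everything first and call make_sorted_unique once.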