When to choose std::vector over std::map for key-value data?

2022-12-28 03:30 问答作者：

Considering the positive effect of caching and data locality when searching in pri开发者_如何转开发mary memory, I tend to use std::vector<> with std::pair<>-like key-value items and perform linear searches for both, if I know that the total amount of key-value items will never be "too large" to severely impact performance.

Lately I've been in lots of situations where I know beforehand that I will have huge amounts of key-value items and have therefore opted for std::map<> from the beginning.

I'd like to know how you make your decisions for the proper container in situations like the ones described above.

Do you

always use std::vector<> (or similar)?
always use std::map<> (or similar)?
have a gut feeling for where in the item-count range one is preferable over the other?
something entirely different?

Thanks!

I only rarely use std::vector with a linear search (except in conjunction with binary searching as described below). I suppose for a small enough amount of data it would be better, but with that little data it's unlikely that anything is going to provide a huge advantage.

Depending on usage pattern, a binary search on an std::vector can make sense though. A std::map works well when you need to update the data regularly during use. In quite a few cases, however, you load up some data and then you use the data -- but after you've loaded the data, it mostly remains static (i.e., it changes very little, if at all).

In this case, it can make a lot of sense to load the data into a vector, sort it if necessary, and then do binary searches on the data (e.g. std::lower_bound, std::equal_range). This gives pretty much the best of both worlds -- low-complexity binary searches and good cache usage from high locality of reference (i.e., the vector is contiguous, as opposed to the linked structure of a std::map). The shortcoming, of course, is that insertions and deletions are slow -- but this is one time I have used your original idea -- store newly inserted data separately until it reaches some limit, and only then sort it in with the rest of the data, so a single search consists of a binary search of the main body of the data, followed by a linear search of the (small amount) of newly inserted data.

I would never make the choice solely on (possibly bogus) "efficiency" grounds, but always on what I am actually going to do with the container. Do I want to store duplicates? Is insertion order important? Will I sometimes want to search for the value not the key? Those kind of things.

Have you considered using sorted data structures? They tend to offer logarithmic searches and inserts - a reasonable trade-off. Personally I don't have any hard and fast rules other than liking maps for the ability to key on a human-readable/understandable value.

Of course there's plenty of discussion as well on the efficiency of maps vs. lists/vectors (sorted and unsorted) - if your key is a string that's 10,000 characters, it can take longer to do a string compare than to search through a list of just a few items, so you want to make sure that you can efficiently compare keys as well.

I almost always prefer to use map (or unordered_map, when a hash container makes more sense) vs. a vector.

That being said, I think your reasoning is backwards. I would tend to use a vector only when there are huge amounts of data, since a vector will be a smaller memory footprint.

With the right kinds of datasets, you can load a vector and then sort it and binary_search it with a smaller footprint and similar performance characteristics to a map, especially if the dataset is stable after load.

Why are you not taking unordered_map into account?

When to choose std::vector over std::map for key-value data?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？