Difference in performance between map and unordered_map in c++

2022-12-21 06:35 问答作者：

I have a simple requirement, i need a map of type . however i need fastest theoretically possible retrieval time.

i used both map and the new proposed unordered_map from tr1 i f开发者_高级运维ound that at least while parsing a file and creating the map, by inserting an element at at time.

map took only 2 minutes while unordered_map took 5 mins.

As i it is going to be part of a code to be executed on Hadoop cluster and will contain ~100 million entries, i need smallest possible retrieval time.

Also another helpful information: currently the data (keys) which is being inserted is range of integers from 1,2,... to ~10 million.

I can also impose user to specify max value and to use order as above, will that significantly effect my implementation? (i heard map is based on rb trees and inserting in increasing order leads to better performance (or worst?) )

here is the code

map<int,int> Label // this is being changed to unordered_map  
fstream LabelFile("Labels.txt");  


// Creating the map from the Label.txt  
if (LabelFile.is_open())  
{  
    while (! LabelFile.eof() )  
    {             
        getline (LabelFile,inputLine);  
        try  
        {  
            curnode=inputLine.substr(0,inputLine.find_first_of("\t"));  
            nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);  
            Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());  
        }  
        catch(char* strerr)  
        {  
            failed=true;  
            break;  
        }  
    }  
    LabelFile.close(); 
}

Tentative Solution: After review of comments and answers, i believe a Dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks

Insertion for unordered_map should be O(1) and retrieval should be roughly O(1), (its essentially a hash-table).

Your timings as a result are way OFF, or there is something WRONG with your implementation or usage of unordered_map.

You need to provide some more information, and possibly how you are using the container.

As per section 6.3 of n1836 the complexities for insertion/retreival are given:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf

One issue you should consider is that your implementation may need to continually be rehashing the structure, as you say you have 100mil+ items. In that case when instantiating the container, if you have a rough idea about how many "unique" elements will be inserted into the container, you can pass that in as a parameter to the constructor and the container will be instantiated accordingly with a bucket-table of appropriate size.

The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule is to double the number of cells each when the table exceeds it's load factor. So from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically

Label.reserve(expected_number_of_entries / Label.max_load_factor());

Dividing by the max_load_factor is to account for the empty cells that are necessary for the hash table to operate.

unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).

Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.

Edit: yes, a vector is basically a dynamic array.

Edit2: The code you've added some some problems. Your while (! LabelFile.eof() ) is broken. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:

while (LabelFile >> node >> label)
    Label[node] = label;

继续阅读：data-structures stl tr1

Difference in performance between map and unordered_map in c++

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？