How can I build a lookup table in C++?
I am a complete novice in C++. I am trying to read a file and build a lookup table (more like a hashtable just to check the existence of a string value). The file has about 300 thousand entries that I will use to build a lookup table. And after this, I will be performing some 1 million lookups on this. What is the most efficient way of doing this? Is it 开发者_如何学JAVAthe map (google's first result) or is there a better structure for this purpose?
Based on the scenario, you probably also want to look at Tries
What you need is TRIE data structure. The dictionary is implemented widely using this data structure. Moreover it has O(n) lookup time where n is the length of the string and occupies less space. Trie has the abilities to quickly search for, insert, and delete entries.
map
has log(n)
lookups, but you can achieve O(1)
with a hash table, as you suggested. It looks like STL implements one, called hash_map.
C++ std::map
is not a hash table, but you could use it for a lookup table if you wanted.
Its performance characteristics as guaranteed by the C++ standard are:
- O(log n) for searching for an element
- O(log n) for inserting a new element
- O(log n) for removing an element
There will definitely be memory overhead because the std::map
is generally implemented with trees (and quite possibly a red-black tree), and pointers will be kept for each node in the map.
For better performance characteristics, you might want to look into Google's Sparsehash
Try: http://en.wikipedia.org/wiki/Unordered_map_%28C%2B%2B%29
In general hash tables are good, but if you want "the most efficient way" you'll have to provide more details.
If you want to check just the existance of a string value set
is suffiecient as you don't have any key-value pairs. See here for documentation.
If your biggest concern is look up time (and it sounds like it is) strongly consider a hashmap. The amortized look up time is O(1) which is notably better than a regular map at O(log n).
If you have a very good hash function (no collision on your dataset) and you just need to check if entry exists or not, you try a bitset (say from http://bmagic.sourceforge.net/)
i believe it can reduce memory requirements and it's very fast.
精彩评论