How to use a hash_map with case insensitive unicode string for key?
I'm very new to STL, and pretty new to C++ in general. I'm trying to get the equivalent of a .NET Dictionary<string, value>(StringComparer.OrdinalIgnoreCase)
but in C++. This is roughly what I'm trying:
stdext::hash_map<LPCWSTR, SomeStruct> someMap;
someMap.insert(stdext::pair<LPCWSTR, SomeStruct>(L"a string", struct));
someMap.find(L"a string")
someMap.find(L"A STRING")
The trouble is, neither find operation usually works (it returns someMap.end()
). It seems to sometimes work, but most of the time it doesn't. I'm guessing that the hash function the hash_map is using is hashing the memory address of the string instead of the content of the string itself, and it's almost certainly not case insensitive.
How can I get a dictionary-like structure that uses ca开发者_运维技巧se-insensitive keys and can store my custom struct?
The hash_map documentation you link to indicates that you can supply your own traits class as a third template parameter. This must satisfy the same interface as hash_compare.
Scanning the docs, I think that what you have to do is this, which basically replaces the use of StringComparer.OrdinalIgnoreCase
you had in your Dictionary:
struct my_hash_compare {
const size_t bucket_size = 4;
const size_t min_buckets = 8;
size_t operator()(const LPCWSTR &Key) const {
// implement a case-insensitive hash function here,
// or find something in the Windows libraries.
}
bool operator()(const LPCWSTR &Key1, const LPCWSTR &Key2) const {
// implement a case-insensitive comparison function here
return _wcsicmp(Key1, Key2) < 0;
// or something like that. There's warnings about
// locale plastered all over this function's docs.
}
};
I'm worried though that the docs say that the comparison function has to be a total order, not a strict weak order as is usual for sorted containers in the C++ standard libraries. If MS really means a total order, then the hash_map might rely on it being consistent with operator==
. That is, they might require that if my_hash_compare()(a,b)
is false, and my_hash_compare()(b,a)
is false, then a == b
. Obviously that's not true for what I've written, in which case you're out of luck.
As an alternative, which in any case is probably more efficient, you could push all the keys to a common case before using them in the map. A case-insensitive comparison is more costly than a regular string comparison. There's some Unicode gotcha to do with that which I can never quite remember, though. Maybe you have to convert -> lowercase -> uppercase, instead of just -> uppercase, or something like that, in order to avoid some nasty cases in certain languages or with titlecase characters. Anyone?
Also as other people said, you might not really want LPCWSTR as your key. This will store pointers in the map, which means that anyone who inserts a string has to ensure that the data it points to remains valid as long as it's in the hash_map. It's often better in the long run for hash_map to keep a copy of the key string passed to insert
, in which case you should use wstring
as the key.
There was some great information given here. I gathered bits and pieces from the answers and put this one together:
#include "stdafx.h"
#include "atlbase.h"
#include <map>
#include <wchar.h>
typedef std::pair<std::wstring, int> MyPair;
struct key_comparer
{
bool operator()(std::wstring a, std::wstring b) const
{
return _wcsicmp(a.c_str(), b.c_str()) < 0;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
std::map<std::wstring, int, key_comparer> mymap;
mymap.insert(MyPair(L"GHI",3));
mymap.insert(MyPair(L"DEF",2));
mymap.insert(MyPair(L"ABC",1));
std::map<std::wstring, int, key_comparer>::iterator iter;
iter = mymap.find(L"def");
if (iter == mymap.end()) {
printf("No match.\n");
} else {
printf("match: %i\n", iter->second);
}
return 0;
}
If you use an std::map
instead of the non-standard hash_map
, you can set the comparison function to be used when doing the binary search:
// Function object for case insensitive comparison
struct case_insensitive_compare
{
case_insensitive_compare() {}
// Function objects overloader operator()
// When used as a comparer, it should function as operator<(a,b)
bool operator()(const std::string& a, const std::string& b) const
{
return to_lower(a) < to_lower(b);
}
std::string to_lower(const std::string& a) const
{
std::string s(a);
std::for_each(s.begin(), s.end(), char_to_lower);
return s;
}
void char_to_lower(char& c) const
{
if (c >= 'A' && c <= 'Z')
c += ('a' - 'A');
}
};
// ...
std::map<std::string, std::string, case_insensitive_compare> someMap;
someMap["foo"] = "Hello, world!";
std::cout << someMap["FOO"] << endl; // Hello, world!
LPCWSTR is a pointer to a null-terminated array of unicode characters and probably not what you want in this case. Use the wstring
specialization of basic_string
instead.
For case-insensitivity, you would need to convert the keys to all upper case or all lower case before you insert and search. At least I don't think you can do it any other way.
精彩评论