String literal to basic_string<unsigned char>
When it comes to internationalization & Unicode, I'm an idiot American programmer. Here's the deal.
#include <string>
using namespace std;
typedef basic_string<unsigned char> ustring;
int main()
{
    static const ustring my_str = "Hello, UTF-8!"; // <== error here
    return 0;
}
This emits a not-unexpected complaint:
cannot convert from 'const char [14]' to 'std::basic_string<_Elem>'
Maybe I've had the wrong portion of coffee today. How do I fix this? Can I keep the basic structure:
ustring something = {insert magic incantation here};
?
Narrow string literals are defined to be arrays of const char, and there are no unsigned string literals[1], so you'll have to cast:
ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");
Of course you can put that long thing into an inline function:
inline const unsigned char *uc_str(const char *s) {
    return reinterpret_cast<const unsigned char*>(s);
}
ustring s = uc_str("Hello, UTF-8");
Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.
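For example, a plain std::string holds the UTF-8 bytes without any casting; here is a minimal sketch (assuming the compiler stores the literal as UTF-8, e.g. the GCC/Clang default or MSVC with /utf-8):
#include <iostream>
#include <string>
int main()
{
    // std::string neither knows nor cares that these bytes are UTF-8;
    // it simply stores them.
    std::string greeting = "Grüße"; // 'ü' and 'ß' are two bytes each in UTF-8
    // size() counts bytes, not code points: prints 7, not 5.
    std::cout << greeting.size() << '\n';
    return 0;
}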
[1] Unless char is unsigned, but whether it is or not is implementation-defined, blah, blah.
Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.
A few helper functions to the rescue:
// Convert a std::string in whatever encoding it holds, byte for byte.
inline ustring convert(const std::string& sys_enc) {
    return ustring( sys_enc.begin(), sys_enc.end() );
}

// Convert a char array of known size; copies all N bytes,
// including any terminating '\0' the array contains.
template< std::size_t N >
inline ustring convert(const char (&array)[N]) {
    return ustring( array, array + N );
}

// Convert a null-terminated C string.
inline ustring convert(const char* pstr) {
    return ustring( reinterpret_cast<const ustring::value_type*>(pstr) );
}
Of course, none of these re-encode anything; they just copy bytes, so they fail silently (and fatally) as soon as the string to convert contains anything other than ASCII.
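A quick usage sketch of those helpers (assuming the ustring typedef from the question and the convert overloads above are in scope):
int main()
{
    std::string narrow = "Hello, UTF-8!";
    ustring u1 = convert(narrow);          // from a std::string
    ustring u2 = convert("Hello, UTF-8!"); // from a string literal
    // ustring u3 = narrow;   // error: no implicit std::string -> ustring conversion
    // std::string s = u1;    // error: no implicit ustring -> std::string conversion
    return 0;
}
This is exactly the point of using a separate type: the commented-out lines refuse to compile, so accidental mixing of encodings is caught at compile time rather than at run time.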