Unknown meta-character in C/C++ string literal?
I created a new project with the following code segment:
char* strange = "(Strange??)";
cout << strange << endl;
resulting in the following output:
(Strange]
Thus translating '??)' -> ']'
Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen. Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.
Anyone have an explanation?
- search : 'question mark, question mark, close brace' c c++ string literal
What you're seeing is called a trigraph.
In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.
GCC ignores trigraphs by default because hardly anyone uses them intentionally. Enable them with the -trigraphs option (-ansi also turns them on), or tell the compiler to warn you about them with the -Wtrigraphs option.
Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
Easy way to avoid the trigraph surprise: split a "??" string literal in two:
char* strange = "(Strange??)";
char* strange2 = "(Strange?" "?)";
/* ^^^ no punctuation */
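A minimal sketch of why the split works, assuming nothing beyond the standard library. The function names `split_literal` and `expected_text` are made up for illustration. Adjacent literals are only joined in a later translation phase, so by then trigraph replacement is already over.

```cpp
#include <string>

// Adjacent string literals are concatenated in translation phase 6, long
// after trigraph replacement (phase 1), so the two question marks never
// sit next to each other while trigraphs are being recognized.
std::string split_literal() {
    return "(Strange?" "?)";
}

// Reference value assembled character by character, so no trigraph can
// form in the source text regardless of compiler settings.
std::string expected_text() {
    std::string s = "(Strange";
    s += '?';
    s += '?';
    s += ')';
    return s;
}
```

Comparing `split_literal()` with `expected_text()` should give equal strings whether or not the compiler has trigraphs enabled.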
Edit: GCC has an option to warn about trigraphs: -Wtrigraphs (also enabled with -Wall).
Quotes from the Standard
5.2.1.1 Trigraph sequences
1 Before any other processing takes place, each occurrence of one of the following sequences of three characters (called trigraph sequences) is replaced with the corresponding single character.
    ??= #    ??) ]    ??! |
    ??( [    ??' ^    ??> }
    ??/ \    ??< {    ??- ~
No other trigraph sequences exist. Each ? that does not begin one of the trigraphs listed above is not changed.
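To make the quoted rule concrete, here is a rough runtime model of the replacement table. It is only a sketch: `trigraph_replacement` and `replace_trigraphs` are hypothetical helper names, and a real compiler does this on the source text during translation, not on strings at run time.

```cpp
#include <cstddef>
#include <string>

// Returns the replacement for a trigraph ending in `third`, or '\0' if
// "question mark, question mark, third" is not one of the nine trigraphs.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case ')':  return ']';
        case '!':  return '|';
        case '(':  return '[';
        case '\'': return '^';
        case '>':  return '}';
        case '/':  return '\\';
        case '<':  return '{';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: applies the table to `text` from left to right,
// following the description of trigraph replacement in the quote.
std::string replace_trigraphs(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char r = trigraph_replacement(text[i + 2]);
            if (r != '\0') {
                out += r;
                i += 2;  // skip the remaining two characters of the trigraph
                continue;
            }
        }
        out += text[i];  // a '?' that does not begin a trigraph is kept
    }
    return out;
}
```

Feeding it the question's text (written with an escaped question mark so the example itself is safe) gives the surprising output from the question: `replace_trigraphs("(Strange?\?)")` yields `(Strange]`.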
5.1.1.2 Translation phases
1 The precedence among the syntax rules of translation is specified by the following phases.
    1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
It's a Trigraph!
??) is a trigraph.
That's trigraph support. You can prevent trigraph interpretation by escaping any of the characters:
char* strange = "(Strange?\?)";
It's a trigraph.
Trigraphs are the reason. The talk about C in the article also applies to C++.
As mentioned several times, you're being bitten by a trigraph. See this previous SO question for more information:
- Purpose of Trigraph sequences in C++?
You can fix the problem by using the '\?' escape sequence for the '?' character:
char* strange = "(Strange\?\?)";
In fact, this is the reason for that escape sequence, which is somewhat mysterious if you're unaware of those damn trigraphs.
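As a small sketch of the escape approach (the function names here are invented for the example): both spellings below should produce the same eleven characters, because the backslash keeps two question marks from ever sitting side by side in the source.

```cpp
#include <string>

// Each '?' is written as the \? escape, so the source never contains two
// adjacent question marks that could start a trigraph.
std::string escaped_literal() {
    return "(Strange\?\?)";
}

// Escaping only the second '?' is already enough to break the sequence.
std::string half_escaped_literal() {
    return "(Strange?\?)";
}
```

Either function can be streamed to `cout` as in the question, and the closing parenthesis should survive with trigraphs on or off.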
While trying to cross-compile on GCC it picked my sequence up as a trigraph:
So all I need to do now is figure out how to disable this in projects by default since I can only see it creating problems for me. (I'm using a US keyboard layout anyway)
The default behavior on GCC is to ignore but give a warning, which is much more sane and is indeed what Visual Studio 2010 will adopt as the standard as far as I know.