
Unknown meta-character in C/C++ string literal?

I created a new project with the following code segment:

char* strange = "(Strange??)";
cout << strange << endl;

resulting in the following output:

(Strange]

Thus translating '??)' -> ']'

Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen. Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.

Anyone have an explanation?

  • search : 'question mark, question mark, close brace' c c++ string literal


What you're seeing is called a trigraph.

In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.

GCC ignores trigraphs by default because hardly anyone uses them intentionally. Enable them with the -trigraphs option (the -ansi option and the strict ISO -std options imply it), or tell the compiler to warn you about them with the -Wtrigraphs option.

Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
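If it is unclear which mode a particular build is in, a small compile-time probe can tell you. This is only a sketch: probe_length and trigraphs_enabled are hypothetical names made up for this example, and it assumes a compiler that accepts C++11 constexpr functions. Compilers that ignore the trigraph may print a warning for the probe literal.

```cpp
#include <cstddef>

// Hypothetical helper for this sketch: length of a probe literal that
// spells question mark, question mark, close parenthesis.
// With trigraphs enabled the literal collapses to "]" (length 1);
// with trigraphs ignored it keeps all three characters (length 3).
constexpr std::size_t probe_length()
{
    return sizeof("??)") - 1;  // sizeof counts the terminating NUL
}

// True when the compiler replaced the trigraph in the probe literal.
constexpr bool trigraphs_enabled()
{
    return probe_length() == 1;
}
```

Printing trigraphs_enabled() from a test program, once with and once without -trigraphs, shows whether the option is actually reaching the compiler.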


Easy way to avoid the trigraph surprise: split a "??" string literal in two:

char* strange = "(Strange??)";
char* strange2 = "(Strange?" "?)";
/*                         ^^^ no punctuation */
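A minimal sketch of why the split works, assuming nothing beyond standard C++: trigraph replacement happens in the first translation phase, while adjacent string literals are only concatenated in a later phase, so the two question marks are never next to the parenthesis when trigraphs are looked for. The function name split_literal is made up for this example.

```cpp
// Hypothetical helper for this sketch. The two adjacent literals are
// joined into one string by the compiler, but only after trigraph
// replacement has already run, so no trigraph is ever formed.
inline const char* split_literal()
{
    return "(Strange?" "?)";
}
```

The returned string is 11 characters long and ends in two question marks followed by the closing parenthesis, whether or not trigraphs are enabled.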

Edit
gcc has an option to warn about trigraphs: -Wtrigraphs (enabled with -Wall also)
end edit

Quotes from the Standard

    5.2.1.1 Trigraph sequences
1   Before any other processing takes place, each occurrence of one of the
    following sequences of three characters (called trigraph sequences)
    is replaced with the corresponding single character.
           ??=      #               ??)      ]               ??!      |
           ??(      [               ??'      ^               ??>      }
           ??/      \               ??<      {               ??-      ~
    No other trigraph sequences exist. Each ? that does not begin one of
    the trigraphs listed above is not changed.

    5.1.1.2 Translation phases
1   The precedence among the syntax rules of translation is specified by
    the following phases.
         1.   Physical source file multibyte characters are mapped, in an
              implementation-defined manner, to the source character set
              (introducing new-line characters for end-of-line indicators)
              if necessary. Trigraph sequences are replaced by corresponding
              single-character internal representations.
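To make the quoted rule concrete, here is a rough model of the replacement step applied to a run-time string. It is an illustration of the table above, not how any particular compiler implements it; trigraph_replacement and replace_trigraphs are hypothetical names invented for this sketch. The source deliberately never spells a trigraph itself, so it compiles the same way with trigraphs on or off.

```cpp
#include <cstddef>
#include <string>

// Maps the third character of a candidate trigraph to its replacement,
// or returns '\0' if that character does not complete a trigraph.
inline char trigraph_replacement(char third)
{
    switch (third) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Scans left to right and replaces each trigraph sequence with its
// single-character equivalent. A question mark that does not begin one
// of the nine trigraphs is copied through unchanged.
inline std::string replace_trigraphs(const std::string& source)
{
    std::string result;
    result.reserve(source.size());
    std::size_t i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char replacement = trigraph_replacement(source[i + 2]);
            if (replacement != '\0') {
                result.push_back(replacement);
                i += 3;  // consume all three characters of the trigraph
                continue;
            }
        }
        result.push_back(source[i]);
        ++i;
    }
    return result;
}
```

Calling replace_trigraphs("(Strange?\?)") returns "(Strange]", which is the transformation the question ran into.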


It's a Trigraph!


??) is a trigraph.


That's trigraph support. You can prevent trigraph interpretation by escaping either of the question marks:

char* strange = "(Strange?\?)";


It's a trigraph.


Trigraphs are the reason. The talk about C in the article also applies to C++


As mentioned several times, you're being bitten by a trigraph. See this previous SO question for more information:

  • Purpose of Trigraph sequences in C++?

You can fix the problem by using the '\?' escape sequence for the '?' character:

char* strange = "(Strange\?\?)";

In fact, this is the reason for that escape sequence, which is somewhat mysterious if you're unaware of those damn trigraphs.
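A small sketch of the escape in action, assuming a C++11 compiler for static_assert; the function name escaped_literal is made up for this example. The escape sequence denotes exactly the same character as a bare question mark, so the resulting string is unchanged, but the source no longer contains two adjacent question marks for the trigraph pass to find.

```cpp
// The escaped and unescaped spellings denote the same character.
static_assert('\?' == '?', "\\? is just a question mark");

// Hypothetical helper for this sketch: the literal contains the two
// question marks followed by ')' at run time, but the source text has a
// backslash between them, so no trigraph is ever recognized.
inline const char* escaped_literal()
{
    return "(Strange\?\?)";
}
```

Escaping only one of the two question marks is enough to break up the sequence; escaping both, as here, just makes the intent more obvious to the next reader.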


While trying to cross-compile on GCC it picked my sequence up as a trigraph:

So all I need to do now is figure out how to disable this in projects by default since I can only see it creating problems for me. (I'm using a US keyboard layout anyway)

The default behavior on GCC is to ignore but give a warning, which is much more sane and is indeed what Visual Studio 2010 will adopt as the standard as far as I know.
