Are trigraph substitutions reverted when a raw string is created through concatenation?
It's pretty common to use macros and token concatenation to switch between wide and narrow strings at compile time.
#define _T(x) L##x
const wchar_t *wide1 = _T("hello");
const wchar_t *wide2 = L"hello";
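For context, here is a minimal sketch of how such a switch is often wired up. The names USE_WIDE, TXT and text_char are hypothetical and chosen for illustration; they are not taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// USE_WIDE is a hypothetical build flag (e.g. passed as -DUSE_WIDE).
#ifdef USE_WIDE
  typedef wchar_t text_char;
  #define TXT(x) L##x   // pastes the L prefix onto the string literal token
#else
  typedef char text_char;
  #define TXT(x) x      // leaves the narrow literal untouched
#endif

// The same source line yields a wide or a narrow literal depending on the flag.
const text_char *greeting = TXT("hello");

// Number of characters in greeting, using the function that matches its type.
std::size_t greeting_length() {
#ifdef USE_WIDE
    return std::wcslen(greeting);
#else
    return std::strlen(greeting);
#endif
}

// Small runtime self-check: five characters either way.
void check_greeting() {
    assert(greeting_length() == 5);
}
```

The paste happens in translation phase 4, after the string literal has already been lexed as a single preprocessing token in phase 3.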
And in C++11 it should be valid to concoct a similar thing with raw strings:
#define RAW(x) R##x
const char *raw1 = RAW("(Hello)");
const char *raw2 = R"(Hello)";
Since macro expansion and token concatenation happen before escape sequences are processed, this should prevent escape sequences from being replaced in the quoted string.
But how does this apply to trigraphs? Are raw strings formed through concatenation with normal strings still subject to having their trigraph substitutions reverted?
const char *trigraph = RAW("(??=)"); // Is this "#" or "??="?
No, the trigraph is not reverted in your example.
[lex.phases]p1 identifies three phases of translation relevant to your question:
1. Trigraph sequences are replaced by corresponding single-character internal representations.
3. The source file is decomposed into preprocessing tokens.
4. Macro invocations are expanded.
Phase 1 is defined by [lex.trigraph]p1. At this stage, your code is translated to const char *trigraph = RAW("(#)").
Phase 3 is defined by [lex.pptoken]. This is the stage where trigraphs are reverted in raw string literals. Paragraph 3 says:
If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted.
That is not the case in your example: the string literal is preceded by RAW(, not by R, so this rule does not apply and the trigraph is not reverted. Your code is transformed into the preprocessing-token sequence: const char * trigraph = RAW ( "(#)" )
Finally, in phase 4, the RAW macro is expanded and the token-paste occurs, resulting in the following sequence of preprocessing-tokens: const char * trigraph = R"(#)". The r-char-sequence of the string literal comprises a #. Phase 3 has already occurred, and there is no other point at which reversion of trigraphs occurs.
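The phase 3 rule quoted above can be checked with a raw literal written directly in the source. This is only a sketch: it deliberately does not use the RAW macro, and literal_length, raw_len and plain_len are hypothetical names. The ordinary literal's length depends on whether the compiler has trigraphs enabled (for example, GCC needs -trigraphs or a strict pre-C++17 -std mode), so only the raw literal's length is pinned down.

```cpp
#include <cassert>
#include <cstddef>

// Length of a string literal, excluding the terminating null character.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) { return N - 1; }

// Written directly as a raw literal: the lexer sees R" in phase 3, so any
// phase 1 replacement between the quotes is reverted and all three source
// characters survive.
constexpr std::size_t raw_len = literal_length(R"(??=)");

// Ordinary literal: one character if trigraphs are enabled, three if not.
constexpr std::size_t plain_len = literal_length("??=");

static_assert(raw_len == 3, "raw literal keeps its three source characters");
static_assert(plain_len == 1 || plain_len == 3,
              "depends on whether trigraphs are enabled");

// Runtime version of the same checks.
void check_trigraph_lengths() {
    assert(raw_len == 3);
    assert(plain_len == 1 || plain_len == 3);
}
```

In the macro case from the question, the literal is lexed as an ordinary string literal before the R is pasted on, so it follows the plain_len behaviour rather than the raw_len one.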
Trigraph substitution happens before macro processing.
UPD: Please disregard this. I hadn't realized that C++0x reverts trigraphs in raw string literals.
UPD2: 2.5/3 describes the process of forming raw-string-literal preprocessing tokens. Trigraph reversal is a part of this process. There are no raw-string-literals which are not preprocessing tokens. So the answer to your question seems to be yes.