When did C++ compilers start considering more than two hex digits in string literal character escapes?
I've got a (generated) literal string in C++ that may contain characters that need to be escaped using the \x notation. For example:
char foo[] = "\xABEcho";
However, g++ (version 4.1.2 if it matters) throws an error:
test.cpp:1: error: hex escape sequence out of range
The compiler appears to be considering the Ec characters as part of the preceding hex number (because they look like hex digits). Since a four-digit hex number won't fit in a char, an error is raised. Obviously for a wide string literal L"\xABEcho" the first character would be U+ABEC, followed by L"ho".
It seems this has changed sometime in the past couple of decades and I never noticed. I'm almost certain that old C compilers would only consider two hex digits after \x, and not look any further.
I can think of one workaround for this:
char foo[] = "\xAB""Echo";
but that's a bit ugly. So I have three questions:
When did this change?
Why doesn't the compiler accept >2-digit hex escapes only for wide string literals?
Is there a workaround that's less awkward than the above?
GCC is only following the standard. #877: "Each [...] hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence."
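As a quick illustration of that rule (my own example, not from the standard text): the escape keeps consuming characters until it hits something that is not a hex digit, so whether a literal compiles depends entirely on what follows the escape.
char ok[]  = "\xABgood";   // 'g' is not a hex digit, so the escape is just 0xAB
char bad[] = "\xABEcho";   // 'E' and 'c' are hex digits, so the escape becomes 0xABEC
                           // and g++ reports "hex escape sequence out of range"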
I have found answers to my questions:
C++ has always been this way (I checked Stroustrup 3rd edition; I didn't have an earlier one). K&R 1st edition did not mention \x at all (the only character escapes available at that time were octal). K&R 2nd edition states:
'\xhh' where hh is one or more hexadecimal digits (0...9, a...f, A...F).
So it appears this behaviour has been around since ANSI C.
While it might be possible for the compiler to accept >2-digit hex escapes only in wide string literals, this would unnecessarily complicate the grammar.
There is indeed a less awkward workaround:
char foo[] = "\u00ABEcho";
The \u escape always takes exactly four hex digits.
Update: The use of \u isn't quite applicable in all situations, because most ASCII characters are (for some reason) not permitted to be specified using \u. Here's a snippet from GCC:
/* The standard permits $, @ and ` to be specified as UCNs.  We use
   hex escapes so that this also works with EBCDIC hosts.  */
else if ((result < 0xa0
          && (result != 0x24 && result != 0x40 && result != 0x60))
         || (result & 0x80000000)
         || (result >= 0xD800 && result <= 0xDFFF))
  {
    cpp_error (pfile, CPP_DL_ERROR,
               "%.*s is not a valid universal character",
               (int) (str - base), base);
    result = 1;
  }
I'm pretty sure that C++ has always been this way. In any case, CHAR_BIT may be greater than 8, in which case '\xABE' or '\xABEc' could be valid.
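A quick sketch of that point (my own example, assuming a hypothetical wide-char platform): 0xABE needs 12 bits and 0xABEC needs 16, so those escapes only fit in a char where CHAR_BIT is at least that large.
#include <climits>
#include <cstdio>

int main() {
    // On common platforms CHAR_BIT is 8 and UCHAR_MAX is 255, so '\xABE' (0xABE)
    // cannot fit in a char; on a hypothetical CHAR_BIT >= 12 platform it could.
    std::printf("CHAR_BIT = %d, UCHAR_MAX = %u\n", CHAR_BIT, (unsigned)UCHAR_MAX);
    return 0;
}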
I solved this by escaping the following character with \xnn as well. Unfortunately, you have to keep doing this for as long as the following characters are valid hex digits (0...9, a...f, A...F). For example, "\xnneceg" becomes "\xnn\x65\x63\x65g".
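Applied to the string from the question (my own restatement of this technique), that gives:
char foo[] = "\xAB\x45\x63ho";   // 'E' (0x45) and 'c' (0x63) written as escapes;
                                 // 'h' is not a hex digit, so "ho" can stay as-is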
These are wide-character literals.
char foo[] = "\x00ABEcho";
Might be better.
Here's some information; it's not about gcc, but it still seems to apply:
http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.pgmgd.doc/cpprog624.htm
That link includes the important line:
Specifying \xnn in a wchar_t string literal is equivalent to specifying \x00nn.
This may also be helpful:
http://www.gnu.org/s/hello/manual/libc/Extended-Char-Intro.html#Extended-Char-Intro
I also ran into this problem. I found that I could add a space after the second hex digit and then remove it again by following the space with a backspace '\b'. Not exactly desirable, but it seemed to work:
"Julius C\xE6sar the conqueror of the fran\xE7 \bais"