What is the purpose of the h and hh modifiers for printf?
Aside from %hn and %hhn (where the h or hh specifies the size of the pointed-to object), what is the point of the h and hh modifiers for printf format specifiers?
Due to the default promotions which the standard requires to be applied for variadic functions, it is impossible to pass arguments of type char or short (or any signed/unsigned variants thereof) to printf.
According to 7.19.6.1(7), the h modifier:

Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
If the argument was actually of type short or unsigned short, then promotion to int followed by a conversion back to short or unsigned short will yield the same value as promotion to int without any conversion back. Thus, for arguments of type short or unsigned short, %d, %u, etc. should give identical results to %hd, %hu, etc. (and likewise for char types and hh).
As far as I can tell, the only situation where the h or hh modifier could possibly be useful is when the argument passed is an int outside the range of short or unsigned short, e.g.

printf("%hu", 0x10000);

but my understanding is that passing the wrong type like this results in undefined behavior anyway, so you could not expect it to print 0.
One real-world case I've seen is code like this:

char c = 0xf0;
printf("%hhx", c);

where the author expects it to print f0 despite the implementation having a plain char type that's signed (in which case, printf("%x", c) would print fffffff0 or similar). But is this expectation warranted?
(Note: what's going on is that the original type was char, which gets promoted to int and converted back to unsigned char instead of char, thus changing the value that gets printed. But does the standard specify this behavior, or is it an implementation detail that broken software might be relying on?)
One possible reason: for symmetry with the use of those modifiers in the formatted input functions? I know it wouldn't be strictly necessary, but maybe there was value seen for that?
Although they don't mention the importance of symmetry for the "h" and "hh" modifiers in the C99 Rationale document, the committee does mention it as a consideration for why the "%p" conversion specifier is supported for fscanf() (even though that wasn't new for C99 - "%p" support is in C90):

Input pointer conversion with %p was added to C89, although it is obviously risky, for symmetry with fprintf.
In the section on fprintf(), the C99 rationale document does discuss that "hh" was added, but merely refers the reader to the fscanf() section:

The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
I know it's a tenuous thread, but I'm speculating anyway, so I figured I'd give whatever argument there might be.
Also, for completeness, the "h" modifier was in the original C89 standard - presumably it was included because of widespread existing use, even if there was no technical requirement to use the modifier.
In %...x mode, all values are interpreted as unsigned. Negative numbers are therefore printed as their unsigned conversions. In 2's complement arithmetic, which most processors use, there is no difference in bit patterns between a signed negative number and its positive unsigned equivalent, which is defined by modulus arithmetic (adding the maximum value for the field plus one to the negative number, according to the C99 standard). Lots of software - especially the debugging code most likely to use %x - makes the silent assumption that the bit representation of a signed negative value and its unsigned cast is the same, which is only true on a 2's complement machine.

The mechanics of this cast are such that hexadecimal representations of a value always imply, possibly inaccurately, that the number has been rendered in 2's complement, as long as it didn't hit an edge condition where the different integer representations have different ranges. This even holds true for arithmetic representations where the value 0 is not represented with the binary pattern of all 0s.
A negative short displayed as an unsigned long in hexadecimal will therefore, on any machine, be padded with f, due to implicit sign extension in the promotion, which printf will print. The value is the same, but it is truly visually misleading as to the size of the field, implying a significant amount of range that simply isn't present.

%hx truncates the displayed representation to avoid this padding, exactly as you concluded from your real-world use case.
The behavior of printf is undefined when passed an int outside the range of short that should be printed as a short, but the easiest implementation by far simply discards the high bits by a raw downcast, so while the spec doesn't require any specific behavior, pretty much any sane implementation is going to just perform the truncation. There are generally better ways to do that, though.
If printf isn't padding values or displaying unsigned representations of signed values, %h isn't very useful.
The only use I can think of is for passing an unsigned short or unsigned char and using the %x conversion specifier. You cannot simply use a bare %x - the value may be promoted to int rather than unsigned int, and then you have undefined behaviour.

Your alternatives are either to explicitly cast the argument to unsigned, or to use %hx / %hhx with a bare argument.
The variadic arguments to printf() et al are automatically promoted using the default conversions, so any short or char values are promoted to int when passed to the function.

In the absence of the h or hh modifiers, you would have to mask the values passed to get the correct behaviour reliably. With the modifiers, you no longer have to mask the values; the printf() implementation does the job properly.
Specifically, for the format %hx, the code inside printf() can do something like:
va_list args;
va_start(args, format);
...
int i = va_arg(args, int);
unsigned short s = (unsigned short)i;
...print s correctly, as 4 hex digits maximum
...even on a machine with 64-bit `int`!
I'm blithely assuming that short is a 16-bit quantity; the standard does not actually guarantee that, of course.
I found it useful to avoid casting when formatting unsigned chars to hex:
sprintf_s(tmpBuf, 3, "%2.2hhx", *(CEKey + i));
It's a minor coding convenience, and looks cleaner than multiple casts (IMO).
Another place it's handy is the snprintf size check. GCC 7 added a size check when using snprintf, so this will fail:

char arr[4];
char x = 'r';
snprintf(arr, sizeof(arr), "%d", x);

so it forces you to use a bigger char array when using %d to format a char.

Here is a commit that shows those fixes: instead of increasing the char array size, they changed %d to %h. This also gives a more accurate description:
https://github.com/Mellanox/libvma/commit/b5cb1e34a04b40427d195b14763e462a0a705d23#diff-6258d0a11a435aa372068037fe161d24
I agree with you that it is not strictly necessary, and by that reason alone it is no good in a C library function :)

It might be "nice" for the symmetry of the different flags, but it is mostly counter-productive because it hides the "conversion to int" rule.