What is the correct type for array indexes in C?
Which type should be used for array indexes in C99? It has to work on LP32, ILP32, ILP64, LP64, LLP64 and more. It doesn't have to be a C89 type.
I have found 5 candidates:
- size_t
- ptrdiff_t
- intptr_t / uintptr_t
- int_fast*_t / uint_fast*_t
- int_least*_t / uint_least*_t
Here is some simple code to better illustrate the problem. What is the best type for i and j in these two particular loops? If there is a good reason, two different types are fine too.
for (i=0; i<imax; i++) {
    do_something(a[i]);
}
/* jmin can be less than 0 */
for (j=jmin; j<jmax; j++) {
    do_something(a[j]);
}
P.S. In the first version of the question I had forgotten about negative indexes.
P.P.S. I am not going to write a C99 compiler. However any answer from a compiler programmer would be very valuable for me.
Similar question:
- size_t vs. uintptr_t. The context of this question is different though.
I think you should use ptrdiff_t for the following reasons:
- Indices can be negative. Therefore, as a general statement, all unsigned types, including size_t, are unsuitable.
- The type of p2 - p1 is ptrdiff_t. If i == p2 - p1, then you should be able to get p2 back by p2 == p1 + i. Notice that *(p + i) is equivalent to p[i].
- As another indication for this "general index type", the type of the index that's used by overload resolution when the builtin operator[] (for example, on a pointer) competes against a user-provided operator[] (for example vector's) is exactly that (http://eel.is/c++draft/over.built#16):
> For every cv-qualified or cv-unqualified object type T there exist candidate operator functions of the form
T* operator+(T*, std::ptrdiff_t);
T& operator[](T*, std::ptrdiff_t);
T* operator-(T*, std::ptrdiff_t);
T* operator+(std::ptrdiff_t, T*);
T& operator[](std::ptrdiff_t, T*);
EDIT: If you have a really big array or a pointer to a really big memory portion, then my "general index type" doesn't cut it, because it is then not guaranteed that you can subtract the first element's address from the last element's address. @Ciro's answer should be used then: https://stackoverflow.com/a/31090426/34509. Personally, I try to avoid unsigned types because they cannot represent negative edge cases (loop end values when iterating backwards, for example), but this is a kind of religious debate (I'm not alone in that camp, though). In cases where using an unsigned type is required, I must put my religion aside, of course.
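To make the pointer-difference argument concrete, here is a minimal sketch. The helper names index_of and element_at are made up for illustration; the only assumption is that both pointers refer to elements of the same array.

```c
#include <stddef.h>

/* Returns the index of p2 relative to p1.
   Both pointers must point into the same array (hypothetical helper). */
ptrdiff_t index_of(const int *p1, const int *p2)
{
    return p2 - p1;   /* the type of this expression is ptrdiff_t */
}

/* Reads the element i positions away from base; i may be negative
   as long as base + i stays inside the array (hypothetical helper). */
int element_at(const int *base, ptrdiff_t i)
{
    return base[i];   /* equivalent to *(base + i) */
}
```

With int a[5] and mid = &a[2], index_of(mid, a) is -2 and element_at(mid, -2) reads a[0], so the signed difference and the index round-trip as described.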
I almost always use size_t for array indices/loop counters. Sure, there are some special instances where you may want signed offsets, but in general using a signed type has a lot of problems:
The biggest risk is that if you're passed a huge size/offset by a caller treating things as unsigned (or if you read it from a wrongly-trusted file), you may interpret it as a negative number and fail to catch that it's out of bounds. For instance, with a signed offset the following will write somewhere it shouldn't:
if (offset<size) array[offset]=foo; else error();
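A small sketch of that failure mode, assuming the offset arrives as a long in one case and as a size_t in the other. The function names are invented for this example.

```c
#include <stddef.h>

/* Signed check: a negative offset compares less than size,
   so it wrongly passes the bounds test. */
int in_bounds_signed(long offset, long size)
{
    return offset < size;
}

/* Unsigned check: a negative value converted to size_t becomes a huge
   number, so the same single comparison rejects it. */
int in_bounds_unsigned(size_t offset, size_t size)
{
    return offset < size;
}
```

Here in_bounds_signed(-1, 16) reports the offset as valid, while in_bounds_unsigned((size_t)-1, 16) rejects it. A signed check needs an extra offset >= 0 test to be safe.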
Another problem is the possibility of undefined behavior with signed integer overflow. Whether you use unsigned or signed arithmetic, there are overflow issues to be aware of and check for, but personally I find the unsigned behavior a lot easier to deal with.
Yet another reason to use unsigned arithmetic (in general) - sometimes I'm using indices as offsets into a bit array and I want to use %8 and /8 or %32 and /32. With signed types, these will be actual division operations. With unsigned, the expected bitwise-and/bitshift operations can be generated.
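As an illustration of the bit-array case, here is a rough sketch with an unsigned index. The names bit_set and bit_test are assumptions, not from any library. Because i is unsigned, i / 8 and i % 8 have the same result as i >> 3 and i & 7. C99 defines signed division as truncating toward zero, so -1 / 8 is 0 and -1 % 8 is -1, which is why a signed index cannot be reduced to a plain shift and mask.

```c
#include <stddef.h>

/* Sets bit i in a bit array stored as unsigned char, 8 bits per element. */
void bit_set(unsigned char *bits, size_t i)
{
    bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Returns 1 if bit i is set, 0 otherwise. */
int bit_test(const unsigned char *bits, size_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}
```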
Since the type of sizeof(array) (and malloc's argument) is size_t, and the array can't hold more elements than its size, it follows that size_t can be used for the array's index.
EDIT
This analysis is for 0-based arrays, which is the common case. ptrdiff_t will work in any case, but it's a little strange for an index variable to have a pointer-difference type.
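For the 0-based case, a minimal sketch might look like this. The function name sum is just for illustration; the element count is assumed to come from sizeof, which already yields a size_t.

```c
#include <stddef.h>

/* Sums n doubles. The count n would typically be computed as
   sizeof a / sizeof a[0], so it and the index share the type size_t. */
double sum(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```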
size_t
If you start at 0, use size_t because that type must be able to index any array:
- sizeof returns it, so it is not valid for an array to have more elements than size_t can represent
- malloc takes it as argument, as mentioned by Amnon
If you start below zero, then shift to start at zero, and use size_t, which is guaranteed to work because of the reasons above. So replace:
for (j = jmin; j < jmax; j++) {
    do_something(a[j]);
}
with:
int *b = &a[jmin];
for (size_t i = 0; i < (jmax - jmin); i++) {
    do_something(b[i]);
}
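Here is a self-contained sketch of the same shift. It assumes jmin <= jmax and that a + jmin still points into the same array object. The function name sum_range is hypothetical, and summing stands in for do_something.

```c
#include <stddef.h>

/* Sums a[jmin] .. a[jmax - 1]; jmin may be negative relative to a.
   Assumes jmin <= jmax and that the whole range lies inside one array. */
long sum_range(const int *a, ptrdiff_t jmin, ptrdiff_t jmax)
{
    const int *b = a + jmin;               /* shifted base, so b[0] is a[jmin] */
    size_t count = (size_t)(jmax - jmin);  /* non-negative by assumption */
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += b[i];
    }
    return total;
}
```

Computing the count once as a size_t also avoids comparing an unsigned i against a signed difference in the loop condition.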
Why not to use:
- ptrdiff_t: the maximum value this represents may be smaller than the maximum value of size_t. This is mentioned at cppref, and the possibility of undefined behavior if the array is too large is suggested at C99 6.5.6/9:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.
Out of curiosity, intptr_t might also be larger than size_t on a segmented memory architecture: https://stackoverflow.com/a/1464194/895245
GCC also imposes further limits on the maximum size of static array objects: What is the maximum size of an array in C?
- uintptr_t: I'm not sure. So I'd just use size_t because I'm more sure :-)
See also:
- C++ version of this question: Type of array index in C++
My choice: ptrdiff_t
Many have voted for ptrdiff_t, but some have said that it is strange to index using a pointer difference type. To me, it makes perfect sense: the array index is the difference from the origin pointer.
Some have also said that size_t is right because that is designed to hold the size. However, as some have commented: this is the size in bytes, and so can generally hold values several times greater than the maximum possible array index.
I use unsigned int (though I prefer the shorthand unsigned).
In C99, unsigned int is guaranteed to be able to index any portable array. Only arrays of 65'535 bytes or smaller are guaranteed to be supported, and the maximum unsigned int value is at least 65'535.
From the public WG14 N1256 draft of the C99 standard:
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)
(...)
- 65535 bytes in an object (in a hosted environment only)
(...)
5.2.4.2 Numerical limits
An implementation is required to document all the limits specified in this subclause, which are specified in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>.
5.2.4.2.1 Sizes of integer types <limits.h>
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
(...)
- maximum value for an object of type unsigned int
UINT_MAX 65535 // 2^16 - 1
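The two limits quoted above can be tied together in a short sketch. The macro C99_MIN_OBJECT_BYTES and the function fits_guaranteed_limit are names invented here, not part of any standard header; only UINT_MAX comes from <limits.h>.

```c
#include <limits.h>

/* C99 requires UINT_MAX >= 65535, so this should never fire
   on a conforming implementation. */
#if UINT_MAX < 65535
#error "unsigned int is narrower than C99 allows"
#endif

/* Object size in bytes that a hosted C99 implementation must support
   (hypothetical macro name). */
#define C99_MIN_OBJECT_BYTES 65535u

/* Returns 1 if an array of n elements of elem_size bytes each stays
   within that guaranteed limit, 0 otherwise. */
int fits_guaranteed_limit(unsigned n, unsigned elem_size)
{
    return elem_size != 0 && n <= C99_MIN_OBJECT_BYTES / elem_size;
}
```

Any array that passes this check has at most 65535 elements, so every valid index fits in an unsigned int.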
In C89, the maximum portable array size is actually only 32'767 bytes, so even a signed int will do, which has a maximum value of at least 32'767 (Appendix A.4).
From §2.2.4 of a C89 draft:
2.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)
(...)
- 32767 bytes in an object (in a hosted environment only)
(...)
2.2.4.2 Numerical limits
A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.
"Sizes of integral types <limits.h>"
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
(...)
- maximum value for an object of type int
INT_MAX +32767
If you know the maximum length of your array in advance, you can use
- int_fast*_t / uint_fast*_t
- int_least*_t / uint_least*_t
In all other cases I would recommend using size_t or ptrdiff_t, depending on whether you want to allow negative indexes.
Using intptr_t / uintptr_t would also be safe, but they have slightly different semantics.
In your situation, I would use ptrdiff_t. It's not just that indices can be negative. You might want to count down to zero, in which case unsigned types yield a nasty, subtle bug:
for(size_t i=5; i>=0; i--) {
    printf("danger, this loops forever\n");
}
That won't happen if you use ptrdiff_t or any other suitable signed type. On POSIX systems, you can use ssize_t.
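For comparison, here is a sketch of two count-down loops that do terminate: one with a signed index, and one using a common idiom for unsigned indexes that tests before decrementing. The function names are made up for this example.

```c
#include <stddef.h>

/* Copies src into dst in reverse order using a signed index.
   i >= 0 becomes false once i reaches -1, so the loop ends. */
void reverse_copy_signed(int *dst, const int *src, ptrdiff_t n)
{
    for (ptrdiff_t i = n - 1; i >= 0; i--) {
        dst[n - 1 - i] = src[i];
    }
}

/* Same result with an unsigned index: compare first, then decrement,
   so i is never tested after wrapping around. */
void reverse_copy_unsigned(int *dst, const int *src, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        dst[n - 1 - i] = src[i];
    }
}
```

Both versions handle n == 0 without touching either array.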
Personally, I often just use int, even though it is arguably not the Correct Thing To Do.
I usually use size_t for array offsets, but if you want negative array indexing, use int. It is able to address the maximum-sized array guaranteed by C89 (32767 bytes).
If you want to access arrays of the maximum size guaranteed by C99 (65535 bytes), use unsigned.
See previous revisions for accessing arrays allowed, but not guaranteed, by C.