What is the correct type for array indexes in C?

What type should be used for an array index in C99? It has to work on LP32, ILP32, ILP64, LP64, LLP64 and more. It doesn't have to be a C89 type.

I have found 5 candidates:

  • size_t
  • ptrdiff_t
  • intptr_t / uintptr_t
  • int_fast*_t / uint_fast*_t
  • int_least*_t / uint_least*_t

Here is some simple code to better illustrate the problem. What is the best type for i and j in these two particular loops? If there is a good reason, two different types are fine too.

for (i=0; i<imax; i++) {
        do_something(a[i]);
}
/* jmin can be less than 0 */
for (j=jmin; j<jmax; j++) {
        do_something(a[j]);
}

P.S. In the first version of the question I had forgotten about negative indexes.

P.P.S. I am not going to write a C99 compiler. However, any answer from a compiler programmer would be very valuable to me.


I think you should use ptrdiff_t for the following reasons

  • Indices can be negative. Therefore for a general statement, all unsigned types, including size_t, are unsuitable.
  • The type of p2 - p1 is ptrdiff_t. If i == p2 - p1, then you should be able to get p2 back by p2 == p1 + i. Notice that *(p + i) is equivalent to p[i].
  • As another indication for this "general index type", the type of the index that's used by overload resolution when the builtin operator[] (for example, on a pointer) competes against a user-provided operator[] (for example, vector's) is exactly that (http://eel.is/c++draft/over.built#16):

    For every cv-qualified or cv-unqualified object type T there exist candidate operator functions of the form

    T*      operator+(T*, std::ptrdiff_t);
    T&      operator[](T*, std::ptrdiff_t);
    T*      operator-(T*, std::ptrdiff_t);
    T*      operator+(std::ptrdiff_t, T*);
    T&      operator[](std::ptrdiff_t, T*);
    

EDIT: If you have a really big array or a pointer to a really big memory portion, then my "general index type" doesn't cut it, because it is then no longer guaranteed that you can subtract the first element's address from the last element's address. @Ciro's answer should be used then: https://stackoverflow.com/a/31090426/34509 . Personally I try to avoid unsigned types because of their inability to represent negative edge cases (loop end-values when iterating backwards, for example), but this is a kind of religious debate (I'm not alone in that camp, though). In cases where using an unsigned type is required, I must put my religion aside, of course.
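The round-trip property from the second bullet above (i == p2 - p1, then p2 == p1 + i) can be sketched as follows. The helper names index_of and element_at are made up for this illustration, and the sketch assumes both pointers refer to elements of the same array, as the standard requires for pointer subtraction.

```c
#include <stddef.h>

/* Hypothetical helper: index of p relative to base.
   Both pointers must point into (or one past) the same array. */
ptrdiff_t index_of(const int *base, const int *p)
{
    return p - base;            /* the type of p - base is ptrdiff_t */
}

/* Hypothetical helper: recover the pointer from the index.
   base[i] is defined as *(base + i), so i may be negative as long as
   the resulting element still lies inside the same array. */
const int *element_at(const int *base, ptrdiff_t i)
{
    return base + i;
}
```

With int a[5] and const int *mid = &a[2], index_of(mid, &a[0]) is -2 and element_at(mid, -2) is &a[0] again.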


I almost always use size_t for array indices/loop counters. Sure there are some special instances where you may want signed offsets, but in general using a signed type has a lot of problems:

The biggest risk is that if you're passed a huge size/offset by a caller treating things as unsigned (or if you read it from a wrongly-trusted file), you may interpret it as a negative number and fail to catch that it's out of bounds. For instance, the check if (offset<size) array[offset]=foo; else error(); will write somewhere it shouldn't when offset is negative.
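A minimal sketch of that failure mode, assuming two made-up helper functions that perform the same single comparison with a signed and an unsigned type:

```c
#include <stddef.h>

/* Signed version of the check: a negative offset satisfies offset < size,
   so the caller would go on to write before the start of the array. */
int in_bounds_signed(long offset, long size)
{
    return offset < size;
}

/* Unsigned version: a value that would be negative when read as signed
   is a huge number here, so the same single comparison rejects it. */
int in_bounds_unsigned(size_t offset, size_t size)
{
    return offset < size;
}
```

For example, in_bounds_signed(-1, 16) accepts the offset, while in_bounds_unsigned((size_t)-1, 16) rejects it. A signed check can be made safe too, but it needs an explicit offset >= 0 test in addition.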

Another problem is the possibility of undefined behavior with signed integer overflow. Whether you use unsigned or signed arithmetic, there are overflow issues to be aware of and check for, but personally I find the unsigned behavior a lot easier to deal with.

Yet another reason to use unsigned arithmetic (in general): sometimes I'm using indices as offsets into a bit array and I want to use %8 and /8 or %32 and /32. With signed types, the compiler has to preserve rounding toward zero for negative values, so these cannot become a plain mask and shift. With unsigned types, the expected bitwise-and/bit-shift operations can be generated.
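The bit-array use case could look roughly like this. The function names bit_set and bit_test are invented for the example, and it stores 8 bits per unsigned char (CHAR_BIT is at least 8, so this is always possible).

```c
#include <stddef.h>

/* Set bit number i in a bit array stored 8 bits per unsigned char.
   With an unsigned index, i / 8 and i % 8 can compile to a shift and a mask. */
void bit_set(unsigned char *bits, size_t i)
{
    bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Return 1 if bit number i is set, 0 otherwise. */
int bit_test(const unsigned char *bits, size_t i)
{
    return (int)((bits[i / 8] >> (i % 8)) & 1u);
}
```

Whether the compiler really emits a shift and a mask depends on the compiler and optimization level, so treat that part as an expectation rather than a guarantee.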


Since the type of sizeof(array) (and malloc's argument) is size_t, and the array can't hold more elements than its size, it follows that size_t can be used for the array's index.

EDIT: This analysis is for 0-based arrays, which is the common case. ptrdiff_t will work in any case, but it's a little strange for an index variable to have a pointer-difference type.
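As a small sketch of that argument: the element count computed from sizeof already has type size_t and can never exceed the size in bytes, so a size_t counter covers every valid 0-based index. ARRAY_LEN and sum_first are names made up for this example.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer); the result has type size_t
   and is never larger than sizeof (arr), the size in bytes. */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof (arr)[0])

/* Sum the first n elements, indexing with size_t. */
long sum_first(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Typical use would be sum_first(a, ARRAY_LEN(a)) for an array a declared in the same scope.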


size_t

If you start at 0, use size_t because that type must be able to index any array:

  • sizeof returns it, so it is not valid for an array to have more than SIZE_MAX elements
  • malloc takes it as argument, as mentioned by Amnon

If you start below zero, then shift to start at zero, and use size_t, which is guaranteed to work because of the reasons above. So replace:

for (j = jmin; j < jmax; j++) {
    do_something(a[j]);
}

with:

int *b = &a[jmin];
for (size_t i = 0; i < (jmax - jmin); i++) {
    do_something(b[i]);
}
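A self-contained version of the same shift, as a sketch: sum_range is a hypothetical function, and it assumes jmin <= jmax and that a points far enough into a larger array for every a[j] in the range to exist.

```c
#include <stddef.h>

/* Sum a[jmin] .. a[jmax - 1], where jmin may be negative.
   Assumes jmin <= jmax and that all of those elements exist. */
long sum_range(const int *a, ptrdiff_t jmin, ptrdiff_t jmax)
{
    const int *b = a + jmin;            /* b[0] is a[jmin] */
    size_t n = (size_t)(jmax - jmin);   /* non-negative by assumption */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += b[i];
    }
    return total;
}
```

Converting the count to size_t once, before the loop, also avoids comparing a signed expression against the unsigned counter on every iteration.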

Why not use:

  • ptrdiff_t: the maximum value this represents may be smaller than the maximum value of size_t.

    This is mentioned at cppreference, and the possibility of undefined behavior if the array is too large is suggested at C99 6.5.6/9:

    When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

    As a curiosity, intptr_t might also be larger than size_t on a segmented memory architecture: https://stackoverflow.com/a/1464194/895245

    GCC also imposes further limits on the maximum size of static array objects: What is the maximum size of an array in C?

  • uintptr_t: I'm not sure. So I'd just use size_t because I'm more sure :-)
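The ptrdiff_t concern can be checked at run time with a small guard. This is only a sketch: fits_in_ptrdiff is a made-up name, and the comparison goes through uintmax_t so that it does not depend on which of the two limits is larger on a given implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if an element count n is also representable as a ptrdiff_t,
   i.e. if indexing or pointer subtraction over n elements cannot exceed
   PTRDIFF_MAX. Both sides are widened to uintmax_t before comparing. */
int fits_in_ptrdiff(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```

On common platforms where the two types have the same width, SIZE_MAX is about twice PTRDIFF_MAX, so fits_in_ptrdiff(SIZE_MAX) would be expected to return 0 there.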

See also:

  • C++ version of this question: Type of array index in C++


My choice: ptrdiff_t

Many have voted for ptrdiff_t, but some have said that it is strange to index using a pointer difference type. To me, it makes perfect sense: the array index is the difference from the origin pointer.

Some have also said that size_t is right because that is designed to hold the size. However, as some have commented: this is the size in bytes, and so can generally hold values several times greater than the maximum possible array index.


I use unsigned int (though I prefer the shorthand unsigned).

In C99, unsigned int is guaranteed to be able to index any portable array. Only arrays of 65'535 bytes or smaller are guaranteed to be supported, and the maximum unsigned int value is at least 65'535.

From the public WG14 N1256 draft of the C99 standard:

5.2.4.1 Translation limits

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)

(...)

  • 65535 bytes in an object (in a hosted environment only)

(...)

5.2.4.2 Numerical limits

An implementation is required to document all the limits specified in this subclause, which are specified in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>.

5.2.4.2.1 Sizes of integer types <limits.h>

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

(...)

  • maximum value for an object of type unsigned int UINT_MAX 65535 // 2^16 - 1

In C89, the maximum portable array size is actually only 32'767 bytes, so even a signed int will do, which has a maximum value of at least 32'767 (Appendix A.4).

From §2.2.4 of a C89 draft:

2.2.4.1 Translation limits

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)

(...)

  • 32767 bytes in an object (in a hosted environment only)

(...)

2.2.4.2 Numerical limits

A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.

"Sizes of integral types <limits.h>"

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

(...)

  • maximum value for an object of type int INT_MAX +32767
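Put side by side, the quoted minimums can be expressed as checks against <limits.h>. The macro and function names below are invented for this sketch. Both functions return 1 on every conforming implementation, which is exactly the guarantee this answer relies on.

```c
#include <limits.h>

/* Minimum object sizes, in bytes, that a hosted implementation must
   support according to the translation limits quoted above. */
#define C89_MIN_OBJECT_BYTES 32767
#define C99_MIN_OBJECT_BYTES 65535u

/* int can count every byte of the largest object C89 guarantees. */
int int_covers_c89(void)
{
    return INT_MAX >= C89_MIN_OBJECT_BYTES;
}

/* unsigned int can count every byte of the largest object C99 guarantees. */
int unsigned_covers_c99(void)
{
    return UINT_MAX >= C99_MIN_OBJECT_BYTES;
}
```

Note that this only covers the sizes every implementation must support. Real implementations usually allow far larger objects, which these types may not be able to index.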


If you know the maximum length of your array in advance you can use

  • int_fast*_t / uint_fast*_t
  • int_least*_t / uint_least*_t

In all other cases I would recommend using

  • size_t

or

  • ptrdiff_t

depending on whether you want to allow negative indexes.

Using

  • intptr_t / uintptr_t

would also be safe, but these types have slightly different semantics.
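For the known-maximum-length case, a sketch might look like this. TABLE_LEN is a hypothetical fixed bound chosen for the example; since it fits in 16 bits, uint_fast16_t is wide enough for the counter.

```c
#include <stdint.h>

/* Hypothetical fixed upper bound; it fits in 16 bits. */
#define TABLE_LEN 1000

/* Sum a table of exactly TABLE_LEN elements. uint_fast16_t is the type the
   implementation considers fastest among those with at least 16 bits. */
long table_sum(const int *table)
{
    long total = 0;
    for (uint_fast16_t i = 0; i < TABLE_LEN; i++) {
        total += table[i];
    }
    return total;
}
```

The choice only stays safe while the bound holds, so if TABLE_LEN could ever grow past 65535 the counter type would need to change with it.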


In your situation, I would use ptrdiff_t. It's not just that indices can be negative. You might want to count down to zero, in which case unsigned types yield a nasty, subtle bug:

/* i >= 0 is always true for an unsigned type, so i wraps around at zero */
for (size_t i = 5; i >= 0; i--) {
  printf("danger, this loops forever\n");
}

That won't happen if you use ptrdiff_t or any other suitable signed type. On POSIX systems, you can use ssize_t.
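Two ways of counting down safely, as a sketch with made-up function names: one keeps the unsigned counter but moves the decrement into the loop test, the other uses a signed counter as suggested here.

```c
#include <stddef.h>

/* Visit indices n-1 .. 0 with an unsigned counter. The comparison happens
   before the decrement, so the body never sees a wrapped-around value. */
long sum_backward_unsigned(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; ) {
        total += a[i];
    }
    return total;
}

/* The same traversal with a signed counter: i >= 0 can become false. */
long sum_backward_signed(const int *a, ptrdiff_t n)
{
    long total = 0;
    for (ptrdiff_t i = n - 1; i >= 0; i--) {
        total += a[i];
    }
    return total;
}
```

Both versions also behave correctly for n == 0, where the loop body is never entered.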

Personally, I often just use int, even though it is arguably not the Correct Thing To Do.


I usually use size_t for array offsets, but if you want negative array indexing, use int. It is able to address the maximum-sized array guaranteed by C89 (32767 bytes).

If you want to access arrays of the maximum size guaranteed by C99 (65535 bytes), use unsigned.

See previous revisions for accessing arrays allowed, but not guaranteed, by C.
