Why is int rather than unsigned int used for C and C++ for loops?
This is a rather silly question but why is int
commonly used instead of unsigned int
when defining a for loop for an array in C or C++?
for(int i;i<arraySize;i++){}
for(unsigned int i;i<arraySize;i++){}
I recognize the benefits of using int
when doing something other than array indexing and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it all together and use a different type such as开发者_Python百科 size_t
?
Using int
is more correct from a logical point of view for indexing an array.
unsigned
semantic in C and C++ doesn't really mean "not negative" but it's more like "bitmask" or "modulo integer".
To understand why unsigned
is not a good type for a "non-negative" number please consider these totally absurd statements:
- Adding a possibly negative integer to a non-negative integer you get a non-negative integer
- The difference of two non-negative integers is always a non-negative integer
- Multiplying a non-negative integer by a negative integer you get a non-negative result
Obviously none of the above phrases make any sense... but it's how C and C++ unsigned
semantic indeed works.
Actually using an unsigned
type for the size of containers is a design mistake of C++ and unfortunately we're now doomed to use this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative" but the name is irrelevant and what counts is the semantic... and unsigned
is very far from "non-negative".
For this reason when coding most loops on vectors my personally preferred form is:
for (int i=0,n=v.size(); i<n; i++) {
...
}
(of course assuming the size of the vector is not changing during the iteration and that I actually need the index in the body as otherwise the for (auto& x : v)...
is better).
This running away from unsigned
as soon as possible and using plain integers has the advantage of avoiding the traps that are a consequence of unsigned size_t
design mistake. For example consider:
// draw lines connecting the dots
for (size_t i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
the code above will have problems if the pts
vector is empty because pts.size()-1
is a huge nonsense number in that case. Dealing with expressions where a < b-1
is not the same as a+1 < b
even for commonly used values is like dancing in a minefield.
Historically the justification for having size_t
unsigned is for being able to use the extra bit for the values, e.g. being able to have 65535 elements in arrays instead of just 32767 on 16-bit platforms. In my opinion even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now then 65535 won't be enough for long anyway).
Unsigned values are great and very useful, but NOT for representing container size or for indexes; for size and index regular signed integers work much better because the semantic is what you would expect.
Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
This is a more general phenomenon, often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable over the primitive integer types. E.g everything that is a "size" should just be typed as size_t
. If you use the semantic types systematically for your application variables, loop variables come much easier with these types, too.
And I have seen several bugs that where difficult to detect that came from using int
or so. Code that all of a sudden crashed on large matrixes and stuff like that. Just coding correctly with correct types avoids that.
It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t
is the right type.
Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int
would be a perfectly reasonable index type. Likewise, int
would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int
: if you use i%2
in your loop body to treat even/odd indices differently, i%2
is a lot more expensive when i
is signed than when i
is unsigned...
Not much difference. One benefit of int
is it being signed. Thus int i < 0
makes sense, while unsigned i < 0
doesn't much.
If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
And yes, it is less to write :-)
Using int
to index an array is legacy, but still widely adopted. int
is just a generic number type and does not correspond to the addressing capabilities of the platform. In case it happens to be shorter or longer than that, you may encounter strange results when trying to index a very large array that goes beyond.
On modern platforms, off_t
, ptrdiff_t
and size_t
guarantee much more portability.
Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.
So, if you want to write bullet-proof, portable and context-sensible code, you can do it at the expense of a few keystrokes.
GCC even supports a typeof
extension which relieves you from typing the same typename all over the place:
typeof(arraySize) i;
for (i = 0; i < arraySize; i++) {
...
}
Then, if you change the type of arraySize
, the type of i
changes automatically.
It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:
size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
...
}
While if they're just doing something 10 times, you'll probably still see int
:
for (int i = 0; i < 10; i++) {
...
}
I use int
cause it requires less physical typing and it doesn't matter - they take up the same amount of space, and unless your array has a few billion elements you won't overflow if you're not using a 16-bit compiler, which I'm usually not.
Because unless you have an array with size bigger than two gigabyts of type char
, or 4 gigabytes of type short
or 8 gigabytes of type int
etc, it doesn't really matter if the variable is signed or not.
So, why type more when you can type less?
Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the unsigned range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
Consider the following simple example:
int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
do_something;
If max
happens to be a negative value, say -1, the -1
will be regarded as UINT_MAX
(when two integers with the sam rank but different sign-ness are compared, the signed one will be treated as an unsigned one). On the other hand, the following code would not have this issue:
int max = some_user_input;
for(int i = 0; i < max; ++i)
do_something;
Give a negative max
input, the loop will be safely skipped.
Using a signed int
is - in most cases - a mistake that could easily result in potential bugs as well as undefined behavior.
Using size_t
matches the system's word size (64 bits on 64 bit systems and 32 bits on 32 bit systems), always allowing for the correct range for the loop and minimizing the risk of an integer overflow.
The int
recommendation comes to solve an issue where reverse for
loops were often written incorrectly by unexperienced programmers (of course, int
might not be in the correct range for the loop):
/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
--i; /* note that this is not part of the `for` statement */
/* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0) */
for (size_t i = count - 1; i > 0; --i) {
/* i might have overflowed and undefined behavior occurs */
}
In general, signed and unsigned variables shouldn't be mixed together, so at times using an int
in unavoidable. However, the correct type for a for
loop is as a rule size_t
.
There's a nice talk about this misconception that signed variables are better than unsigned variables, you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
TL;DR;: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred almost in all cases and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
i.e., how much INT_MIN * -1
? (the pre-processor will protect you, but without it you're in a jam).
P.S.
As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if
statement.
When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand. i.e.:
// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
for (size_t i=0; i < pts.size()-1; i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
}
// or test against i + 1 : which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
// or start i as 1 : but note that `-` is slower than `+`
for (size_t i = 1; i < pts.size(); i++) { // then loop
drawLine(pts[i - 1], pts[i]);
}
精彩评论