Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
I just found a comment in this answer saying that using `iostream::eof` in a loop condition is "almost certainly wrong". I generally use something like `while (cin >> n)`, which I guess implicitly checks for EOF.
Why is checking for eof explicitly using `while (!cin.eof())` wrong?
How is it different from using `scanf("...", ...) != EOF` in C (which I often use with no problems)?
Because `iostream::eof` will only return `true` after reading the end of the stream. It does not indicate that the next read will be the end of the stream.
Consider this (and assume the next read will be at the end of the stream):
while(!inStream.eof()){
    int data;
    // yay, not end of stream yet, now read ...
    inStream >> data;
    // oh crap, now we read the end and *only* now the eof bit will be set (as well as the fail bit)
    // do stuff with (now uninitialized) data
}
Against this:
int data;
while(inStream >> data){
    // when we land here, we can be sure that the read was successful.
    // if it wasn't, the returned stream from operator>> would be converted to false
    // and the loop wouldn't even be entered
    // do stuff with correctly initialized data (hopefully)
}
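A minimal sketch of the two loops side by side, using `std::istringstream` as a stand-in for the input stream. The helper names `processed_with_eof_check` and `processed_with_read_check` are hypothetical and only exist for this illustration; each returns the values the loop body would have "processed". `data` is initialized to 0 only so that the sketch has defined behavior when the read fails.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the eof()-guarded loop. Note that it never terminates
// on malformed input such as "a", so only feed it well-formed numbers.
std::vector<int> processed_with_eof_check(const std::string& text) {
    std::istringstream inStream(text);
    std::vector<int> processed;
    while (!inStream.eof()) {
        int data = 0;               // initialized only to keep the sketch well-defined
        inStream >> data;           // may fail, but the body carries on regardless
        processed.push_back(data);  // stands in for "do stuff with data"
    }
    return processed;
}

// Hypothetical helper: the loop that tests the read itself.
std::vector<int> processed_with_read_check(const std::string& text) {
    std::istringstream inStream(text);
    std::vector<int> processed;
    int data = 0;
    while (inStream >> data) {
        processed.push_back(data);  // only reached after a successful read
    }
    return processed;
}
```

For the input `"1 2 3\n"`, the first helper should return four values (the three numbers plus one bogus entry from the failed read), while the second returns exactly the three numbers.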
And on your second question: Because
if(scanf("...",...)!=EOF)
is the same as
if(!(inStream >> data).eof())
and not the same as
if(!inStream.eof())
    inStream >> data;
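As a side note on the C idiom, here is a small sketch of what the `scanf` family actually returns. It uses `std::sscanf` on a string instead of `scanf` on standard input so that it is self-contained; `scan_one_int` is a hypothetical helper name. The point it illustrates is that `EOF` is only returned when the input ends before any conversion, while a matching failure returns 0, which a `!= EOF` test would treat as success.

```cpp
#include <cstdio>

// Hypothetical helper: returns the raw result of trying to parse one int.
//   1   -> one item converted
//   0   -> matching failure (input present, but not a number)
//   EOF -> input ended before any conversion
int scan_one_int(const char* text) {
    int value = 0;
    return std::sscanf(text, "%d", &value);
}
```

Under these assumptions, `scan_one_int("abc") != EOF` is true even though nothing was read, which is why comparing against the expected number of converted items is the stricter test.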
Bottom-line top: With proper handling of white space, the following is how `eof` can be used (and can even be more reliable than `fail()` for error checking):
while( !(in>>std::ws).eof() ) {
    int data;
    in >> data;
    if ( in.fail() ) /* Handle with 'break' or 'throw' */;
    // Now use data
}
(Thanks Tony D for the suggestion to highlight the answer. See his comment below for an example to why this is more robust.)
The main argument against using `eof()` seems to be missing an important subtlety about the role of white space. My proposition is that checking `eof()` explicitly is not only not "always wrong" (which seems to be the overriding opinion in this and similar Stack Overflow questions), but that with proper handling of white space it provides cleaner and more reliable error handling, and is always a correct solution (although not necessarily the tersest).
To summarize, what is being suggested as the "proper" termination and read order is the following:
int data;
while(in >> data) { /* ... */ }
// Which is equivalent to
while( !(in >> data).fail() ) { /* ... */ }
The failure due to a read attempt beyond eof is taken as the termination condition. This means that there is no easy way to distinguish between a successful stream and one that really fails for reasons other than eof. Take the following streams:
1 2 3 4 5<eof>
1 2 a 3 4 5<eof>
a<eof>
`while (in >> data)` terminates with `failbit` set for all three inputs. In the first, `eofbit` is also set; in the second and third it is not, because extraction stops at `a` before reaching the end of the stream. So past the loop one needs extra logic to distinguish a proper input (first) from improper ones (second and third).
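A small sketch for inspecting which state bits are set once `while (in >> data)` has ended. `EndState` and `state_after_read_loop` are hypothetical names introduced for this illustration, and `std::istringstream` stands in for the real stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state after the read loop has finished.
struct EndState {
    bool fail;  // failbit (or badbit) set
    bool eof;   // eofbit set
};

// Hypothetical helper: runs the usual read loop and reports the final state.
EndState state_after_read_loop(const std::string& text) {
    std::istringstream in(text);
    int data = 0;
    while (in >> data) {
        // values are ignored; only the final state matters here
    }
    return EndState{in.fail(), in.eof()};
}
```

With a conforming standard library, `"1 2 3 4 5"` should end with both bits set, while `"1 2 a 3 4 5"` and `"a"` should end with only `failbit` set, since extraction stops at the `a` without reaching the end.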
Whereas, take the following:
while( !in.eof() )
{
    int data;
    in >> data;
    if ( in.fail() ) /* Handle with break or throw */;
    // Now use data
}
Here, `in.fail()` verifies that, as long as there is something to read, it is the correct one. Its purpose is not to be a mere while-loop terminator.
So far so good, but what happens if there is trailing space in the stream, which sounds like the major concern against `eof()` as a terminator?
We don't need to surrender our error handling; just eat up the white space:
while( !in.eof() )
{
    int data;
    in >> data >> std::ws; // Eat white space with 'std::ws'
    if ( in.fail() ) /* Handle with 'break' or 'throw' */;
    // Now use data
}
`std::ws` skips any (zero or more) trailing white space in the stream and, if it reaches the end of the stream while doing so, sets the `eofbit` and not the `failbit`. So, `in.fail()` works as expected, as long as there is at least one data item to read. If all-blank streams are also acceptable, then the correct form is:
while( !(in>>std::ws).eof() )
{
    int data;
    in >> data;
    if ( in.fail() ) /* Handle with 'break' or 'throw' */;
    /* This will never fire if the eof is reached cleanly */
    // Now use data
}
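A runnable sketch of this last form, with `break` chosen as the failure handling. `ReadResult` and `read_ints` are hypothetical names, and `std::istringstream` stands in for `in`; the `clean` flag records whether the end of the stream was reached without a failed extraction.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ReadResult {
    std::vector<int> values;  // everything read before the end or the first error
    bool clean;               // true if no extraction failed
};

// Hypothetical helper: reads ints using the whitespace-eating eof() loop.
ReadResult read_ints(const std::string& text) {
    std::istringstream in(text);
    ReadResult result{{}, true};
    while (!(in >> std::ws).eof()) {  // skip white space, then test for the end
        int data = 0;
        in >> data;
        if (in.fail()) {              // something was there, but it was not an int
            result.clean = false;
            break;
        }
        result.values.push_back(data);
    }
    return result;
}
```

In this sketch, `"1 2 3 4 5"`, `"1 2 3 \n"` and an all-blank input all come back with `clean == true`, whereas `"1 2 a 3 4 5"` stops after two values with `clean == false`.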
Summary: A properly constructed `while (!eof)` is not only possible and not wrong, but it allows data to be localized within scope and provides a cleaner separation of error checking from business as usual. That being said, `while (!fail)` is inarguably a more common and terse idiom, and may be preferred in simple (single data item per read) scenarios.
Because if programmers don't write `while (stream >> n)`, they possibly write this:
while(!stream.eof())
{
    stream >> n;
    //some work on n;
}
Here the problem is that you cannot do some work on `n` without first checking whether the stream read was successful, because if it was unsuccessful, your work on `n` would produce an undesired result.
The whole point is that `eofbit`, `badbit`, or `failbit` are set after an attempt is made to read from the stream. So if `stream >> n` fails, then `eofbit`, `badbit`, or `failbit` is set immediately, so it's more idiomatic to write `while (stream >> n)`, because the returned object `stream` converts to `false` if there was some failure in reading from the stream, and consequently the loop stops. It converts to `true` if the read was successful, and the loop continues.
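To make the conversion concrete, here is a tiny sketch. `read_succeeded` is a hypothetical helper, and the explicit `static_cast<bool>` spells out what a loop condition does implicitly with the stream returned by `operator>>`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reports whether a single extraction of an int succeeded.
bool read_succeeded(const std::string& text) {
    std::istringstream stream(text);
    int n = 0;
    // operator>> returns `stream`; converted to bool it yields !stream.fail().
    return static_cast<bool>(stream >> n);
}
```

Note that `read_succeeded("7")` is still true even though reading the `7` runs into the end of the input: the conversion looks at `failbit` and `badbit`, not at `eofbit`.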
The other answers have explained why the logic is wrong in `while (!stream.eof())` and how to fix it. I want to focus on something different:
why is checking for eof explicitly using `iostream::eof` wrong?
In general terms, checking for `eof` only is wrong because stream extraction (`>>`) can fail without hitting the end of the file. If you have e.g. `int n; cin >> n;` and the stream contains `hello`, then `h` is not a valid digit, so extraction will fail without reaching the end of the input.
This issue, combined with the general logic error of checking the stream state before attempting to read from it, which means for N input items the loop will run N+1 times, leads to the following symptoms:
- If the stream is empty, the loop will run once. `>>` will fail (there is no input to be read) and all variables that were supposed to be set (by `stream >> x`) are actually uninitialized. This leads to garbage data being processed, which can manifest as nonsensical results (often huge numbers). (If your standard library conforms to C++11, things are a bit different now: a failed `>>` now sets numeric variables to `0` instead of leaving them uninitialized (except for `char`s).)
- If the stream is not empty, the loop will run again after the last valid input. Since in the last iteration all `>>` operations fail, variables are likely to keep their value from the previous iteration. This can manifest as "the last line is printed twice" or "the last input record is processed twice". (This should manifest a bit differently since C++11 (see above): now you get a "phantom record" of zeroes instead of a repeated last line.)
- If the stream contains malformed data but you only check for `.eof`, you end up with an infinite loop. `>>` will fail to extract any data from the stream, so the loop spins in place without ever reaching the end.
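The three symptoms can be observed with a sketch like the following. `eof_loop_iterations` is a hypothetical helper that runs the `eof()`-only loop with an artificial iteration cap (the `cap` parameter is an assumption added here so that the endless case terminates), and `std::istringstream` stands in for the real stream.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: runs the eof()-only loop at most `cap` times and
// returns the iteration count, so an endless loop shows up as `cap`.
std::size_t eof_loop_iterations(const std::string& text, std::size_t cap) {
    std::istringstream stream(text);
    std::size_t iterations = 0;
    int n = 0;
    while (!stream.eof() && iterations < cap) {
        stream >> n;   // result deliberately not checked, as in the faulty idiom
        ++iterations;  // stands in for processing n
    }
    return iterations;
}
```

With a cap of 100, an empty input should give 1 iteration, `"1 2 3\n"` should give 4 (three values plus one extra pass), and `"1 x 2\n"` should hit the cap because the stream stays stuck in front of the `x`.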
To recap: The solution is to test the success of the `>>` operation itself, not to use a separate `.eof()` method: `while (stream >> n >> m) { ... }`, just as in C you test the success of the `scanf` call itself: `while (scanf("%d%d", &n, &m) == 2) { ... }`.
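A short sketch of the recommended form for two values per record. `read_pairs` is a hypothetical helper and `std::istringstream` stands in for the real stream; the loop body only runs when both `n` and `m` were extracted.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects complete (n, m) records from the text.
std::vector<std::pair<int, int>> read_pairs(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::pair<int, int>> pairs;
    int n = 0;
    int m = 0;
    while (stream >> n >> m) {    // false as soon as either extraction fails
        pairs.emplace_back(n, m);
    }
    return pairs;
}
```

An incomplete trailing record, as in `"1 2 3 4 5"`, or a malformed one, as in `"1 2 x 4"`, simply ends the loop without being processed.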
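The important thing to remember is that `inFile.eof()` doesn’t become `true` until after an attempted read fails because you’ve reached the end of the file. So, in this example, you’ll get an error: the loop body can run one more time after the last value and call `process(x)` on a value that was never read.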
while (!inFile.eof()){
    inFile >> x;
    process(x);
}
The way to make this loop correct is to combine reading and checking into a single operation, like so
while (inFile >> x)
    process(x);
By convention, `operator>>` returns the stream we read from, and a Boolean test on a stream returns `false` when the stream fails (such as on reaching end of file).
So this gives us the correct sequence:
- read
- test whether the read succeeds
- if and only if the test succeeds, process what we’ve read
If you happen to encounter some other problem that prevents you from reading from the file correctly, you will not be able to reach `eof()` as such. For example, let’s look at something like this:
int x;
while (!inFile.eof()) {
    inFile >> x;
    process(x);
}
Let us trace through the working of the above code with an example:
- Assume the contents of the file are `'1', '2', '3', 'a', 'b'`.
- The loop will read the 1, 2, and 3 correctly.
- Then it’ll get to `a`.
- When it tries to extract `a` as an int, it’ll fail.
- The stream is now in a failed state; until or unless we `clear` the stream, all attempts at reading from it will fail.
- But when we test for `eof()`, it’ll return `false`, because we’re not at the end of the file: there’s still `a` waiting to be read.
- The loop will keep trying to read from the file, and fail every time, so it never reaches the end of the file.
- So, the loop above will run forever.
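The trace mentions that the stream stays failed until it is cleared. As a hedged illustration of that point only, here is one possible way to recover and keep going: clear the state and discard the offending token. `read_ints_skipping_bad_tokens` is a hypothetical helper, skipping a whole white-space-delimited token is a design choice made for this sketch (not something the answer prescribes), and out-of-range numbers are not handled specially.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every int it can, discarding tokens that fail to parse.
std::vector<int> read_ints_skipping_bad_tokens(const std::string& text) {
    std::istringstream inFile(text);
    std::vector<int> values;
    int x = 0;
    while (true) {
        if (inFile >> x) {           // read succeeded: keep the value
            values.push_back(x);
            continue;
        }
        if (inFile.eof() || inFile.bad()) {
            break;                   // nothing more to read, or an unrecoverable error
        }
        inFile.clear();              // leave the failed state ...
        std::string junk;
        inFile >> junk;              // ... and discard the token that could not be parsed
    }
    return values;
}
```

For the file contents from the trace (`1 2 3 a b`), this returns the three numbers and then stops, instead of spinning forever in front of the `a`.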
But if we use a loop like this, we will get the required output.
while (inFile >> x)
    process(x);
In this case, the stream will convert to `false` not only in case of end of file, but also in case of a failed conversion, such as the `a` that we can’t read as an integer.