
getline() sets failbit and skips last line

I'm using std::getline() to enumerate the lines in a file, and it's mostly working. It has left me curious, however: std::getline() is skipping the very last line in my file, but only if it's blank. Using this minimal example:

#include <iostream>
#include <string>

int main()
{
        std::string line;
        while(std::getline(std::cin, line))
                std::cout << "Line: “" << line << "”\n";
        return 0;
}

If I feed it this:

Line A
Line B
Line C

I get those lines back at me. But this:

Line A
Line B
Line C
[* this line is present but blank, i.e., the file ends with: "...B\nLine C\n" *]

(I unfortunately can't have a blank line in SO's little code box thing...) So, first file has three lines ( ["Line A", "Line B", "Line C"] ), second file has four ( ["Line A", "Line B", "Line C", ""] )

This to me seems wrong - I have a four line file, and enumerating it with getline() leaves me with 3. What's really got me scratching my head is that this is exactly what the standard says it should do. (21.3.7.9)

Even Python has similar behaviour (but it gives me the newlines too; C++ chops them off). Is this some weird thing where C++ expects lines to be terminated, not separated, by '\n', and I'm feeding it something different?

Edit

Clearly, I need to expand a bit here. I've met up with two philosophies of determining what a "line" in a file is:

  • Lines are terminated by newlines - Dominant in systems such as Linux, and editors like vim. Possible to have a slightly "odd" file by not having a final '\n' (a "noeol" in vim). Impossible to have a blank line at the end of a file.
  • Lines are separated by newlines - Dominant in just about every Windows editor I've ever come across. Every file is valid, and it's possible to have the last line be blank.

Of course, YMMV as to what a newline is.

I've always treated these as two completely different schools of thought. One earlier point I tried to make was to ask if the C++ standard was explicitly or merely implicitly following the first.

Thus, getting back to the question at hand, the second example, which can be thought of as "A\nB\nC\n" has four lines, following the separated philosophy. Now, does C++ explicitly follow a terminated philosophy, or is this just the way the standard is? (They don't record much reasoning in standards...) I'm hesitant to say it was explicit, since it's a bit painful to tell if you have what vim calls a "noeol" file with C++. (Python, for example, leaves the newlines in, so you can tell that way)

Since everything in Windows follows the separated philosophy, I'm looking for something a bit deeper than "Both examples have 3 lines."

(Out of curiosity, where does Mac fall: terminated or separated?)


The C++ standard has this to say about getline:

C++ 2003, 21.3.7.9/5

[getline(is, str, delim)] … extracts characters from is … until any of the following occurs:

  • end-of-file occurs on the input sequence …
  • c == delim [N.b. default delim is '\n'] for the next available input character c (in which case, c is extracted but not appended)
  • str.max_size() characters are stored

Bracketed editorial comments added.

To put it in your vernacular, getline treats '\n' as a terminator, not a separator.
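To make the two conventions concrete, here is a rough sketch that reads the same text both ways. lines_terminated is just the getline loop from the question; lines_separated is a hypothetical hand-rolled split on '\n', written only for comparison and not something the standard library provides:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Terminator convention: exactly what a std::getline loop yields.
std::vector<std::string> lines_terminated(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> out;
    std::string line;
    while (std::getline(in, line))
        out.push_back(line);
    return out;
}

// Separator convention (hypothetical helper): split on every '\n', so a
// trailing '\n' produces one final empty line.
std::vector<std::string> lines_separated(const std::string& text)
{
    std::vector<std::string> out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find('\n', start);
        if (pos == std::string::npos) {
            out.push_back(text.substr(start)); // remainder after the last '\n'
            return out;
        }
        out.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}
```

For "Line A\nLine B\nLine C\n", lines_terminated gives three strings and lines_separated gives four, the last one empty. Without the final '\n', both give three.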


I count only three lines in both your data sets. The first data set is simply missing a line ending character which is present in the second data set.

Your editor represents an empty line after 'Line C' for convenience. If you pipe its contents through wc -l you will find it says 3.
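wc -l counts newline characters, not lines as an editor displays them. A small sketch of the same count in C++, assuming the file contents are already in a std::string (count_newlines is a hypothetical name used here for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts '\n' characters, which is the quantity wc -l reports.
std::size_t count_newlines(const std::string& text)
{
    return static_cast<std::size_t>(
        std::count(text.begin(), text.end(), '\n'));
}
```

With "Line A\nLine B\nLine C\n" this returns 3. If the final '\n' is missing it returns 2, even though a getline loop still produces three lines.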


When you say the last line is blank, what do you mean? If you mean that the second-to-last line ends with a carriage return/line feed, then you don't technically have a last line, and getline() is behaving as I would expect it to.

Consider your example:

Line A
Line B
Line C

This is actually three lines that end in \r\n, and the third line's \r\n is what puts the cursor on the 4th line. There isn't actually a 4th line.
