
How to detect line endings across text files from different OS?

In C, I usually read text files one character at a time (e.g. in the loop of an FSM, tokenizing and parsing at the same time). Unfortunately, operating systems use different conventions to mark the end of a line, e.g. Unix ("\n"), classic Mac OS ("\r") and DOS/Windows ("\r\n").

Hence my question: how do I properly detect line endings across text files from different operating systems?

My current approach is to treat '\r' as '\n' and ignore empty lines. Unfortunately, this approach only works as long as empty lines don't change the semantics of the underlying text.

I wouldn't want to "detect" the line ending style for each file, and I certainly don't want solutions based on #ifdef or other kinds of conditional compilation. Are there any valid solutions left?


I normally don't recommend reading a file one char at a time, but for your case I would suggest you "peek" ahead one char and use the following logic...

if c == '\r'
    p = peek
    if p == '\n'
        read next c

You can't really trust that all files use one particular convention, or even that a single file follows the same convention throughout, so you should code for all cases. Here that means: if you see a \r, the next char might be a \n, and if it is, consume it and move on.
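One way the peek could look in C is sketched below. It assumes the file was opened in binary mode ("rb") so the C library does no newline translation of its own, and it relies on the single character of pushback that ungetc guarantees. The name read_normalized is made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character from fp, reporting
 * any of "\r\n", a lone '\r', or a lone '\n' as a single '\n'.
 * Returns EOF at end of input, like getc. */
int read_normalized(FILE *fp)
{
    int c = getc(fp);
    if (c == '\r') {
        int p = getc(fp);           /* peek at the next character */
        if (p != '\n' && p != EOF)
            ungetc(p, fp);          /* not part of "\r\n": put it back */
        return '\n';                /* the '\r' counts as a newline either way */
    }
    return c;
}
```

A character loop would then call read_normalized(fp) wherever it called getc(fp) before. Empty lines survive, because "\r\n\r\n" comes out as two '\n' characters rather than four.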


Unfortunately, a file can have mixed line endings if it's been passed around, or edited with editors that allow you to specify the line ending, or for any number of other similar reasons. Determining "the" line ending style for a file could be a matter of taking a vote -- whichever style terminates the most lines wins.
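If a vote is ever wanted, it might be sketched as follows, assuming the file contents are already in a memory buffer. The names eol_counts, count_line_endings and dominant_line_ending are invented for this example, and resolving ties in favour of "\n" is an arbitrary choice.

```c
#include <stddef.h>

/* Hypothetical tally of how many lines end in each style. */
struct eol_counts { size_t lf, cr, crlf; };

struct eol_counts count_line_endings(const char *buf, size_t len)
{
    struct eol_counts n = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                n.crlf++;
                i++;                /* skip the '\n' of the pair */
            } else {
                n.cr++;
            }
        } else if (buf[i] == '\n') {
            n.lf++;
        }
    }
    return n;
}

/* Returns the style with the most votes; ties go to "\n" (arbitrary). */
const char *dominant_line_ending(struct eol_counts n)
{
    if (n.lf >= n.crlf && n.lf >= n.cr)
        return "\n";
    return n.crlf >= n.cr ? "\r\n" : "\r";
}
```

This only tells you which style is most common. A parser that has to cope with mixed files still needs to accept all three styles while reading.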

What I've done is

  1. Treat \r as a newline. If the next char is \n, discard it. (If the next char is not \n, the \r still counts as a newline.)

  2. Treat \n as a newline, unless you're throwing it away because of (1).
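These two rules fit a character-at-a-time FSM without any peeking if you remember whether the previous char was \r. A minimal sketch, assuming your loop can simply skip a character on request; eol_state, eol_feed and the EOL_SKIP sentinel are hypothetical names for this example:

```c
#include <stdbool.h>

#define EOL_SKIP (-2)   /* hypothetical sentinel: "emit nothing for this char" */

struct eol_state { bool after_cr; };   /* initialise to { false } */

/* Feed one input character; returns the character to process,
 * or EOL_SKIP if it should be dropped. */
int eol_feed(struct eol_state *s, int c)
{
    if (c == '\r') {                    /* rule 1: \r is always a newline */
        s->after_cr = true;
        return '\n';
    }
    if (c == '\n' && s->after_cr) {     /* rule 1: \n right after \r is discarded */
        s->after_cr = false;
        return EOL_SKIP;
    }
    s->after_cr = false;
    return c;                           /* rule 2: lone \n, or any other char */
}
```

Because the state is a single flag, this works the same whether the input arrives from getc, a socket, or a buffer split at arbitrary positions, and blank lines are kept.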


My usual approach is to treat '\n' as the line terminator, and if the previous character was '\r', remove it (usually I end up overwriting one or the other with '\0'). If you also want to support legacy Mac text files ('\r'-only newlines), you can instead treat a lone '\r', a lone '\n', or the pair "\r\n" each as a single line break.
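The overwrite-with-'\0' idea might look like this when the whole file is already in a writable buffer. It is a sketch only: split_lines and its parameters are made up here, and it assumes the caller left room for one extra byte at buf[len].

```c
#include <stddef.h>

/* Hypothetical helper: splits buf in place into NUL-terminated lines,
 * accepting "\r\n", lone '\r' and lone '\n' as line breaks.
 * Stores up to max_lines pointers in lines[] and returns how many.
 * buf[len] must be writable (room for a final terminator). */
size_t split_lines(char *buf, size_t len, char **lines, size_t max_lines)
{
    size_t count = 0, start = 0;
    for (size_t i = 0; i < len && count < max_lines; i++) {
        if (buf[i] != '\r' && buf[i] != '\n')
            continue;
        int crlf = (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n');
        buf[i] = '\0';                  /* terminate the current line */
        lines[count++] = buf + start;
        if (crlf)
            i++;                        /* step over the '\n' of "\r\n" */
        start = i + 1;
    }
    if (start < len && count < max_lines) {
        buf[len] = '\0';                /* last line had no terminator */
        lines[count++] = buf + start;
    }
    return count;
}
```

Empty lines come back as empty strings, so text where blank lines carry meaning is preserved.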
