How can I detect DOS line breaks in a file?

2022-12-29 13:12 问答作者：

I have a bunch of files. Some are Unix line endings, many are DOS. I'd like to test each file to see if if is dos formatted, before I switch t开发者_运维知识库he line endings.

How would I do this? Is there a flag I can test for? Something similar?

Python can automatically detect what newline convention is used in a file, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects:

f = open('myfile.txt', 'U')
f.readline()  # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)

This gives the newline ending of the first line (Unix, DOS, etc.), if any.

As John M. pointed out, if by any chance you have a pathological file that uses more than one newline coding, f.newlines is a tuple with all the newline codings found so far, after reading many lines.

Reference: http://docs.python.org/2/library/functions.html#open

If you just want to convert a file, you can simply do:

with open('myfile.txt', 'U') as infile:
    text = infile.read()  # Automatic ("Universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)  # Writes newlines for the platform running the program

You could search the string for \r\n. That's DOS style line ending.

EDIT: Take a look at this

(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:

print open('myfile.txt', 'U').read()

That is, Python's "universal" file reader will automatically use all the different end of line markers, translating them to "\n".

http://docs.python.org/library/functions.html#open

(Thanks handle!)

As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:

if "\r\n" in open("/path/file.txt","rb").read():
    print "DOS line endings found"

Edit: simplified as per John Machin's comment (no need to use regular expressions).

dos linebreaks are \r\n, unix only \n. So just search for \r\n.

Using grep & bash:

grep -c -m 1 $'\r$' file

echo $'\r\n\r\n' | grep -c $'\r$'     # test

echo $'\r\n\r\n' | grep -c -m 1 $'\r$'

You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and less memory consuming when you have larger text files, but it does not detect mixed newline endings.

In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the context of a text file without changing its newline representation.

def get_newline(filename):
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'

继续阅读：bash file line-breaks line-endings python

How can I detect DOS line breaks in a file?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？