Specifying chars in Python

2022-12-14 03:09

I need a function that iterates over all the lines in a file.

Here's what I have so far:

def LineFeed(file):
    ret = ""
    for byte in file:
        ret = ret + str(byte)
        if str(byte) == '\r':
            yield ret
            ret = ""

All the lines in the file end with \r (not \n), and I'm reading it in "rb" mode (I have to read this file as binary). The generator never yields anything. Maybe there's a problem with the comparison? I'm just not sure how you represent a byte/char in Python.

I'm getting the idea that if you loop over an "rb" file with for, it still iterates over lines, not bytes. How can I iterate over bytes? My problem is that I don't have standard line endings. My file is also filled with 0x00 bytes, and I would like to get rid of them all, so I think I need a second generator function. How could I implement that? I just don't know how to represent the 0x00 byte (the NULL char) in Python.

I think you are confused about what "for x in file" does. Assuming you got your handle with something like "file = open(file_name)", byte in this case will be an entire line, not a single character. So you only reach the yield when an entire line consists of a single carriage return. Try changing "byte" to "line" and iterating over that with a second loop.
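To see this behaviour directly, here is a minimal sketch. It assumes Python 3 and uses an in-memory io.BytesIO object with made-up sample data as a stand-in for a file opened in "rb" mode.

```python
import io

# In-memory stand-in for a file opened in "rb" mode (hypothetical sample data).
fake_file = io.BytesIO(b"first\rsecond\rthird\r")

# Iterating a binary file splits on b"\n" only, so a file whose lines end
# in b"\r" comes back as one single chunk.
chunks = list(fake_file)
print(len(chunks))   # how many "lines" the for loop would see
print(chunks[0])     # the whole content, carriage returns included

# In Python 3, indexing or iterating a bytes object gives integers,
# so comparing an element against the string '\r' can never be true.
first = chunks[0][0]
print(first == ord("f"), first == "f")
```

Because none of the chunks equals a lone carriage return, a comparison like the one in the question never succeeds.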

Perhaps if you were to explain what this file represents, why it has lots of '\x00' bytes, and why you think you need to read it in binary mode, we could help you with your underlying problem.

Otherwise, try the following code; it avoids any dependence on (or interference from) your operating system's line-ending convention.

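# Read the whole file as bytes and split on carriage returns.
# The separator is a bytes literal because the file is opened in binary mode.
with open("the_file", "rb") as f:
    lines = f.read().split(b"\r")
for line in lines:
    process(line)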

Edit: the ASCII NUL (not "NULL") byte is "\x00".
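As a small illustration of that representation, here is a sketch assuming Python 3, where data read in binary mode is a bytes object and the NUL byte is written b"\x00". The helper name strip_nuls and the sample data are made up for the example.

```python
NUL = b"\x00"  # a bytes object holding the single byte with value 0

def strip_nuls(data):
    """Return a copy of data with every NUL byte removed."""
    return data.replace(NUL, b"")

sample = b"a\x00b\x00\x00c\r"   # hypothetical content padded with NUL bytes
cleaned = strip_nuls(sample)
print(cleaned)
```

Removing the NUL bytes with replace on the whole buffer avoids writing a second generator just for filtering.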

If you're in control of how you open the file, I'd recommend opening it with universal newlines: \r isn't recognized as a line terminator if you just use 'rb' mode, but it is if you use 'Urb'.

This will only work if you aren't including \n as well as \r in your binary file somewhere, since the distinction between \r and \n is lost when using universal newlines.

Assuming you want your yielded lines to still be \r terminated:

NUL = '\x00'
def lines_without_nulls(path):
    with open(path, 'Urb') as f:
        for line in f:
            yield line.replace(NUL, '').replace('\n', '\r')
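The 'U' mode flag is a Python 2 feature and is no longer accepted by open() in Python 3.11 and later. A rough equivalent that stays entirely in bytes is sketched below, assuming Python 3 and a file small enough to read into memory. It relies on bytes.splitlines, which also treats \n and \r\n as line boundaries, so the same caveat about mixed line endings applies. The function name split_cr_lines is made up for this example.

```python
def split_cr_lines(data):
    """Split a bytes buffer into lines, dropping NUL bytes first.

    keepends=True leaves the original terminator (here b"\r") on each line.
    """
    cleaned = data.replace(b"\x00", b"")
    return cleaned.splitlines(keepends=True)

# Hypothetical sample: two \r-terminated lines padded with NUL bytes.
result = split_cr_lines(b"a\x00b\rc\x00d\r")
print(result)
```

With a real file, the buffer would come from reading the file opened in 'rb' mode.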

So, your problem is iterating over the lines of a file open in binary mode that use '\r' as a line separator. Since the file is in binary mode, you cannot use the universal newline feature, and it turns out that '\r' is not interpreted as a line separator in binary mode.

Reading a file char by char is a terribly inefficient thing to do in Python, but here's how you could iterate over your lines:

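def cr_lines(the_file):
    # the_file must be opened in binary mode, so read() returns bytes
    line = []
    while True:
        byte = the_file.read(1)
        if not byte:
            break  # end of file
        line.append(byte)
        if byte == b'\r':
            # Line complete: join the collected bytes once and yield them
            yield b''.join(line)
            line = []
    # Yield whatever follows the final carriage return, if anything
    if line:
        yield b''.join(line)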

To be more efficient, you would need to read bigger chunks of data and handle buffering in your iterator. Keep in mind that you could get strange bugs if you seek while iterating. Preventing those bugs would require a subclass of file so you can purge the buffer on seek.
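A chunked version might look like the following sketch. It assumes Python 3 and a file object opened in binary mode; the name cr_lines_chunked and the default chunk size are made up for illustration, and it does nothing to protect against seeking during iteration.

```python
import io

def cr_lines_chunked(the_file, chunk_size=4096):
    """Yield lines terminated by a carriage return, reading chunk_size bytes at a time."""
    pending = b""  # data read so far that has not yet been terminated
    while True:
        chunk = the_file.read(chunk_size)
        if not chunk:
            break  # end of file
        # Every piece except the last is a complete line; the last piece
        # is carried over until more data (or EOF) arrives.
        *complete, pending = (pending + chunk).split(b"\r")
        for line in complete:
            yield line + b"\r"
    if pending:
        yield pending  # trailing data with no final carriage return

# Usage with an in-memory stand-in for a real file (hypothetical data).
demo = list(cr_lines_chunked(io.BytesIO(b"ab\rcd\ref"), chunk_size=2))
print(demo)
```

The tiny chunk size in the usage line is only there to exercise lines that straddle chunk boundaries; a real file would use a much larger value.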

Note that the code collects the pieces in a list and joins them once with join(line). Accumulating a string with += performs badly and is a common mistake made by beginning programmers.

Edit:

Concatenating strings with string1 += string2 is slow. Try joining a list of strings instead.
ddaa is right: you shouldn't need the struct package if the binary file only contains ASCII. Also, my generator now returns the string after the final '\r', before EOF. With these two minor fixes, my code is suspiciously similar (practically identical) to this more recent answer.

Code snip:

def LineFeed(f):
    ret = []
    while True:
        oneByte = f.read(1)
        if not oneByte:
            break
        # Return everything up to, but not including, the carriage return
        if oneByte == b'\r':
            yield b''.join(ret)
            ret = []
        else:
            ret.append(oneByte)
    # Yield any data after the final carriage return, before EOF
    if ret:
        yield b''.join(ret)

if __name__ == '__main__':
    with open('filename', 'rb') as f:
        for something in LineFeed(f):
            doSomething(something)
