Why does takewhile() skip the first line?

2023-04-02 02:15 问答作者：

1
2
3
TAB
1
2
3
TAB

I want to read the lines between TAB as blocks.

import itertools

def block_generator(file):
    with open(file) as lines:
        for line in lines:
            block = list(itertools.takewhile(lambda x: x.rstrip('\n') != '\t',
                                             lines))
            yield block

I want to use it as such:

blocks = block_generator(myfile)
for block in blocks:
    do_something(block)

The blocks i get all start with the second line like [2,3] [2,3], why?

Here is another approach using groupby

from itertools import groupby
def block_generator(filename):
    with open(filename) as lines:
        for pred,block in groupby(lines, "\t\n".__ne__):
            if pred:
                yield block

Here you go, tested code. Uses while True: to loop, and lets itertools.takewhile() do everything with lines. When itertools.takewhile() reaches the end of input, it returns an iterator that does nothing except raise StopIteration, which list() simply turns into an empty list, so a simple if not block: test detects the empty list and breaks out of the loop.

import itertools

def not_tabline(line):
    return '\t' != line.rstrip('\n')

def block_generator(file):
    with open(file) as lines:
        while True:
            block = list(itertools.takewhile(not_tabline, lines))
            if not block:
                break
            yield block

for block in block_generator("test.txt"):
    print "BLOCK:"
    print block

As noted in a comment below, this has one flaw: if the input text has two lines in a row with just the tab character, this loop will stop processing without reading all the input text. And I cannot think of any way to handle this cleanly; it's really unfortunate that the iterator you get back from itertools.takewhile() uses StopIteration both as the marker for the end of a group and as what you get at end-of-file. To make it worse, I cannot find any way to ask a file iterator object whether it has reached end-of-file or not. And to make it even worse, itertools.takewhile() seems to advance the file iterator to end-of-file instantly; when I tried to rewrite the above to check on our progress using lines.tell() it was already at end-of-file after the first group.

I suggest using the itertools.groupby() solution. It's cleaner.

I think the problem is that you are taking lines in your lambda function rather than line. What is your expected output?

itertools.takewhile implicitly iterates over the lines of the file in order to grab chunks, but so does for line in lines:. Each time through the loop, a line is grabbed, thrown away (since there is no code that uses line), and then some more are blocked together.

继续阅读：generator io python

Why does takewhile() skip the first line?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？