Why does takewhile() skip the first line?
I have a file like this:
1
2
3
TAB
1
2
3
TAB
I want to read the lines between TAB as blocks.
import itertools
def block_generator(file):
    with open(file) as lines:
        for line in lines:
            block = list(itertools.takewhile(lambda x: x.rstrip('\n') != '\t',
                                             lines))
            yield block
I want to use it as such:
blocks = block_generator(myfile)
for block in blocks:
    do_something(block)
The blocks I get all start with the second line, like [2, 3] [2, 3]. Why?
Here is another approach using groupby
from itertools import groupby
def block_generator(filename):
    with open(filename) as lines:
        for pred, block in groupby(lines, "\t\n".__ne__):
            if pred:
                yield block
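One thing to watch with the groupby approach: each group it hands back is a lazy iterator over the same underlying file, so it has to be consumed before the next group is requested. A minimal sketch of that, assuming an in-memory io.StringIO stands in for the file and using a hypothetical helper name blocks_from_lines (neither is part of the answer above):

```python
import io
from itertools import groupby


def blocks_from_lines(lines):
    """Yield each run of non-separator lines as a list (hypothetical helper)."""
    # The key is False for a line that is exactly "\t\n" and True otherwise.
    for is_content, group in groupby(lines, "\t\n".__ne__):
        if is_content:
            # Materialize the group now; it becomes invalid once groupby advances.
            yield list(group)


# Stand-in for the file; note the two separator lines in a row.
sample = io.StringIO("1\n2\n3\n\t\n\t\n4\n5\n\t\n")
result = list(blocks_from_lines(sample))
```

Consecutive separator lines fall into a single False group here, so they do not cut the iteration short. A final separator line that lacks its trailing newline would not equal "\t\n" and would be treated as content.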
Here you go, tested code. It uses while True: to loop, and lets itertools.takewhile() do everything with lines. When itertools.takewhile() reaches the end of input, it returns an iterator that does nothing except raise StopIteration, which list() simply turns into an empty list, so a simple if not block: test detects the empty list and breaks out of the loop.
import itertools
def not_tabline(line):
    # True for every line except one holding only a tab character
    return '\t' != line.rstrip('\n')

def block_generator(file):
    with open(file) as lines:
        while True:
            block = list(itertools.takewhile(not_tabline, lines))
            if not block:
                # Empty list: end of input (or two tab lines in a row)
                break
            yield block

for block in block_generator("test.txt"):
    print("BLOCK:")
    print(block)
As noted in a comment below, this has one flaw: if the input text has two lines in a row with just the tab character, this loop will stop processing without reading all the input text. I cannot think of any way to handle this cleanly; it's really unfortunate that the iterator you get back from itertools.takewhile() uses StopIteration both as the marker for the end of a group and as what you get at end-of-file. To make it worse, I cannot find any way to ask a file iterator object whether it has reached end-of-file or not. And to make it even worse, itertools.takewhile() seems to advance the file iterator to end-of-file instantly; when I tried to rewrite the above to check on our progress using lines.tell(), it was already at end-of-file after the first group.
I suggest using the itertools.groupby() solution. It's cleaner.
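For the two-tab-lines-in-a-row case described above, one possible workaround is to drop takewhile() and accumulate lines in a plain loop, so an empty block and end-of-file are no longer confused. This is only a sketch: iter_blocks is a hypothetical name, io.StringIO stands in for the file, and yielding an empty list for back-to-back separators is an assumption about the desired behavior.

```python
import io


def iter_blocks(lines):
    """Yield lists of lines delimited by tab-only lines (hypothetical helper)."""
    block = []
    for line in lines:
        if line.rstrip("\n") == "\t":
            # Separator reached: emit what has been collected, even if empty.
            yield block
            block = []
        else:
            block.append(line)
    if block:
        # Trailing lines that were never closed by a separator.
        yield block


# Stand-in for the file, with two separator lines in a row.
sample = io.StringIO("1\n2\n\t\n\t\n3\n\t\n")
result = list(iter_blocks(sample))
```

A caller that does not want empty blocks can skip them with a simple if block: check.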
I think the problem is that you are taking lines in your lambda function rather than line. What is your expected output?
itertools.takewhile implicitly iterates over the lines of the file in order to grab chunks, but so does for line in lines:. Each time through the loop, a line is grabbed, thrown away (since there is no code that uses line), and then some more are blocked together.
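The effect can be reproduced without a file. The sketch below assumes an io.StringIO object holding the sample data from the question, and records the line taken by the for loop next to the block that takewhile() builds from the same iterator:

```python
import io
import itertools

# Stand-in for the file from the question; "\t" lines separate the blocks.
lines = io.StringIO("1\n2\n3\n\t\n1\n2\n3\n\t\n")

seen = []
for line in lines:  # the for loop consumes "1\n" and the original code never uses it
    # takewhile() pulls from the same iterator, so it starts at "2\n"
    block = list(itertools.takewhile(lambda x: x.rstrip("\n") != "\t", lines))
    seen.append((line, block))
```

Each pair in seen holds the line the for loop swallowed and the shortened block that followed it, which is why every block appears to start at the second line.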