Do all iterators cache? How about csv.reader?
We know that the following code loads its data line by line rather than reading it all into memory; i.e. once a line has been read, its memory can somehow be reclaimed:
def fileGen(file):
    for line in file:
        yield line

with open("somefile") as file:
    for line in fileGen(file):
        print line
But is there any way we could verify whether this is still true if we modify the definition of fileGen to the following?
import csv

def fileGen(file):
    for line in csv.reader(file):
        yield line
How could we know whether csv.reader will cache the data it has loaded? Thanks.
Regards, John
The most reliable way to find out what csv.reader is doing is to read the source. See _csv.c, lines 773 onwards. You'll see that the reader object has a pointer to the underlying iterator (typically a file iterator), and it calls PyIter_Next each time it needs another line. So it does not read ahead or otherwise cache the data it loads.
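For intuition, that behaviour can be sketched in Python. This is only a schematic rendering of what was just described, not the real implementation: the names (SketchReader, input_iter) are made up, and the parsing is grossly simplified (the real C code handles quoting, dialects, and fields that span multiple lines):

class SketchReader:
    """Illustrative only: roughly what the C reader does, in Python."""
    def __init__(self, f):
        self.input_iter = iter(f)  # keep a reference to the underlying iterator
    def __iter__(self):
        return self
    def next(self):
        line = next(self.input_iter)            # like PyIter_Next: fetch exactly one line
        return line.rstrip("\r\n").split(",")   # parse it (real parsing is much fancier)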
Another way to find out what csv.reader is doing is to make a mock file object that can report when it is being queried. For example:
class MockFile:
    """A fake file that reports each time a line is requested."""
    def __init__(self): self.line = 0
    def __iter__(self): return self
    def next(self):
        self.line += 1
        print "MockFile line", self.line
        return "line,{0}".format(self.line)
>>> r = csv.reader(MockFile())
>>> next(r)
MockFile line 1
['line', '1']
>>> next(r)
MockFile line 2
['line', '2']
This confirms what we learned from reading the csv source code: it only requests the next line from the underlying iterator when its own next method is called.
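The same mock answers the original question directly: wrapping csv.reader in the fileGen generator from the question adds no buffering of its own, so each row is still fetched on demand. The output below is what MockFile should print:

>>> def fileGen(f):
...     for row in csv.reader(f):
...         yield row
...
>>> g = fileGen(MockFile())
>>> next(g)
MockFile line 1
['line', '1']
>>> next(g)
MockFile line 2
['line', '2']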
John made it clear (see comments) that his concern is whether csv.reader keeps the lines alive, preventing them from being collected by Python's memory manager.
Again, you can either read the code (most reliable) or try an experiment. If you look at the implementation of Reader_iternext in _csv.c, you'll see that lineobj is the name given to the object returned by the underlying iterator, and there's a call to Py_DECREF(lineobj) on every path through the code. So csv.reader does not keep lineobj alive.
Here's an experiment to confirm that.
class FinalizableString(str):
    """A string that reports its deletion."""
    def __init__(self, s): self.s = s
    def __str__(self): return self.s
    def __del__(self): print "*** Deleting", self.s

class MockFile:
    """A fake file whose lines report when they are garbage-collected."""
    def __init__(self): self.line = 0
    def __iter__(self): return self
    def next(self):
        self.line += 1
        return FinalizableString("line,{0}".format(self.line))
>>> r = csv.reader(MockFile())
>>> next(r)
*** Deleting line,1
['line', '1']
>>> next(r)
*** Deleting line,2
['line', '2']
So you can see that csv.reader does not hang on to the objects it gets from its iterator, and if nothing else is keeping them alive, then they get garbage-collected in a timely fashion.
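In practical terms, this means you can process an arbitrarily large CSV file in constant memory. A minimal sketch, assuming a file called big.csv and a per-row function process_row of your own (both hypothetical):

import csv

with open("big.csv", "rb") as f:   # 'rb', as the Python 2 csv docs recommend
    for row in csv.reader(f):
        process_row(row)           # hypothetical per-row handler; only the
                                   # current row's strings are alive here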
I have a feeling that there's something more to this question that you're not telling us. Can you explain why you are worried about this?