Is there such a thing as "too many yield statements" in python?
If doing a directory listing and reading the files within, at what point does the performance of yield start to deteriorate, compared to returning a list of all the files in the directory?
Here I'm assuming one has enough RAM to return the (potentially huge) list.
PS I'm having problems inlining code in a comment, so I'll put some examples in here.
import glob

def list_dirs_list():
    # list version: builds the whole list in memory
    return glob.glob('/some/path/*')

def list_dirs_iter():
    # iterator version: yields matches one at a time
    return glob.iglob('/some/path/*')
Behind the scenes, both calls to glob use os.listdir, so it would seem they are equivalent performance-wise. But this Python doc seems to imply that glob.iglob is faster.
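A rough way to check this yourself (the path is a placeholder, and the result will depend on how iglob is implemented underneath, so treat it as a sketch rather than a definitive benchmark):

import glob
import timeit

pattern = '/some/path/*'   # placeholder; assumes at least one match

def grab_first_list():
    return glob.glob(pattern)[0]       # materialises the whole list first

def grab_first_iter():
    return next(glob.iglob(pattern))   # stops after the first match

print(timeit.timeit(grab_first_list, number=100))
print(timeit.timeit(grab_first_iter, number=100))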
There is no point at which further use of yield results in decreased performance. In fact, compared to assembling everything in a list, yield comes out further ahead the more elements there are.
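For example, here is a rough sketch of the two styles from the question (the pattern and the file handling are placeholders): the list version must hold every file's contents at once, while the generator version only ever holds one.

import glob

def read_files_list(pattern):
    # Reads every matching file before returning anything.
    return [open(name).read() for name in glob.glob(pattern)]

def read_files_iter(pattern):
    # Yields one file's contents at a time; memory use stays flat.
    for name in glob.iglob(pattern):
        with open(name) as f:
            yield f.read()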
It depends on how you're doing the directory listing. Most mechanisms in Python pull the entire directory listing into a list; if you're doing it that way, then even a single yield is a waste. If you're using opendir(3), then it's probably a random number, according to XKCD's definition of "random".
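For completeness, a sketch of a mechanism that does not pull the whole listing at once: os.scandir (Python 3.5+) iterates over the underlying directory reads lazily, so a generator layered on top of it genuinely pays off for huge directories.

import os

def iter_file_paths(path):
    # os.scandir yields directory entries lazily (Python 3.5+),
    # so the full listing is never held in memory at once.
    for entry in os.scandir(path):
        if entry.is_file():
            yield entry.path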
Using yield is functionally similar to writing a functor class, even from an implementation and performance perspective, except that the interpreter can probably resume a generator a little more quickly than it can invoke the __call__ method on a hand-written class, because the former is built into the generator's C implementation.
To hammer this home, the use and rough implementation of the following two are the same:
def generator_counter():
    i = 0
    while True:
        i += 1
        yield i

class functor_counter():
    def __init__(self):
        self.i = 0
    def __call__(self):
        self.i += 1
        return self.i
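Both produce the same stream of values; only the call syntax differs:

gen = generator_counter()
print(next(gen), next(gen))     # 1 2

counter = functor_counter()
print(counter(), counter())     # 1 2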
In Python 2.7, the definition of glob is

def glob(pathname):
    return list(iglob(pathname))

So, at least for this version, glob can never be faster than iglob.