How to clone a Python generator object?
Consider this scenario:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

walk = os.walk('/home')

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))
I know that this example is kinda redundant, but you should consider that we need to use the same walk
data more than once. I've a benchmark scenario and the use of same walk
data is mandatory to get helpful results.
I've tried walk2 = walk to clone it and use it in the second iteration, but it didn't work. The question is: how can I copy it? Is it even possible?
Thank you in advance.
You can use itertools.tee():
walk, walk2 = itertools.tee(walk)
Note that this might "need significant extra storage", as the documentation points out.
If you know you are going to iterate through the whole generator for every usage, you will probably get the best performance by unrolling the generator to a list and using the list multiple times.
walk = list(os.walk('/home'))
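As a minimal sketch of the difference between plain assignment, itertools.tee() and list(), the snippet below uses a small hypothetical generator, count_up(), in place of os.walk('/home') so that the result does not depend on the filesystem:

```python
import itertools


def count_up(n):
    """Hypothetical stand-in for os.walk(): yields 0..n-1 exactly once."""
    for i in range(n):
        yield i


# Plain assignment only creates a second name for the same generator.
gen = count_up(3)
alias = gen
first_pass = list(gen)      # consumes the generator
second_pass = list(alias)   # nothing is left to yield

# tee() returns independent iterators that buffer items as needed.
# The original generator should not be used after it is passed to tee().
a, b = itertools.tee(count_up(3))
tee_a = list(a)
tee_b = list(b)

# list() stores everything up front; the list can be iterated repeatedly.
items = list(count_up(3))
```

Here second_pass comes out empty because alias and gen are the same object, which is why walk2 = walk in the question does not produce a copy.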
Define a function:

def walk_home():
    for r in os.walk('/home'):
        yield r

Or even this:

def walk_home():
    return os.walk('/home')

Both are used like this:

for root, dirs, files in walk_home():
    for pathname in dirs+files:
        print(os.path.join(root, pathname))
This is a good use case for functools.partial() to make a quick generator factory:
from functools import partial
import os
walk_factory = partial(os.walk, '/home')
walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()
What functools.partial() does is hard to describe in plain words, but the snippet above is what it's for.
It partially fills in a function's parameters without executing that function. Consequently, it acts as a function/generator factory.
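To illustrate that each call to the partial object yields a fresh, independent generator, here is a small sketch. The generator function count_from() is a hypothetical stand-in for os.walk, chosen so the output is predictable:

```python
from functools import partial


def count_from(start, stop):
    """Hypothetical generator function standing in for os.walk()."""
    for i in range(start, stop):
        yield i


# partial() stores the arguments but does not call count_from() yet.
make_counter = partial(count_from, 10, 13)

# Each call runs count_from(10, 13) and returns a new generator object.
first = make_counter()
second = make_counter()

first_items = list(first)
second_items = list(second)   # unaffected by exhausting `first`
```

Note that with os.walk each call walks the filesystem again, so the results can differ between calls if files change in the meantime.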
This answer aims to extend/elaborate on what the other answers have expressed. The solution will necessarily vary depending on what exactly you aim to achieve.
If you want to iterate over the exact same result of os.walk multiple times, you will need to initialize a list from the os.walk iterable's items (i.e. walk = list(os.walk(path))).
If you must guarantee the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.
- It will not be possible to list() an iterable if the output is too large (e.g. attempting to list() an entire filesystem may freeze your computer).
- It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.
In the event that list() is not suitable, you will need to run your generator on demand. Note that generators are exhausted after each use, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass
The aforementioned design pattern will allow you to keep your code DRY.
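The same idea can be generalized beyond os.walk. The sketch below is one possible variation, not part of the pattern above: Reiterable is a hypothetical helper name, and it assumes the wrapped callable returns a new iterable every time it is called.

```python
class Reiterable:
    """Hypothetical helper: calls `factory` anew for every iteration."""

    def __init__(self, factory, *args, **kwargs):
        self.factory = factory
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        # A new iterator is created for every `for` loop or list() call.
        return iter(self.factory(*self.args, **self.kwargs))


# Toy factory so the result is predictable: squares of 0..n-1.
squares = Reiterable(lambda n: (i * i for i in range(n)), 4)
first_pass = list(squares)
second_pass = list(squares)   # works again, unlike a bare generator
```

With os.walk this would read walk = Reiterable(os.walk, '/home'), and every loop over walk would trigger a new directory traversal.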
This "Python Generator Listeners" code allows you to have many listeners on a single generator, like os.walk, and even have someone "chime in" later.

def walkme():
    return os.walk('/home')

m1 = Muxer(walkme)
m2 = Muxer(walkme)

Then m1 and m2 can even run in separate threads and process items at their leisure.
See: https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    Entry = namedtuple('Entry', 'genref listeners lock')

    # One shared entry per generator function, guarded by top_lock.
    already = {}
    top_lock = Lock()

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()

        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]

        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners

        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        # Hand the new item to every other listener.
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e

    def __del__(self):
        with self.top_lock:
            try:
                self.listeners.remove(self)
            except ValueError:
                pass
            if not self.listeners and self.func in self.already:
                del self.already[self.func]