itertools or hand-written generator: which is preferable?
I have a number of Python generators that I want to combine into a new generator. I can easily do this with a hand-written generator function using a bunch of yield statements.
On the other hand, the itertools module is made for things like this, and to me it seems as if the pythonic way to create the generator I need is to plug together various iterators from that itertools module.
However, in the problem at hand, it soon gets quite complicated: the generator needs to maintain a sort of state (e.g. whether the first or later items are being processed), the i-th output further depends on conditions on the i-th input items, and the various input lists have to be processed differently before they are joined into the generated list.
Because the composition of standard iterators that would solve my problem is nearly incomprehensible (due to the one-dimensional nature of written source code), I wonder whether there are any advantages to using standard itertools generators over hand-written generator functions, in basic and in more advanced cases. Actually, I think that in 90% of cases the hand-written versions are much easier to read, probably because of their more imperative style compared to the functional style of chaining iterators.
EDIT
In order to illustrate my problem, here is a (toy) example: Let a and b be two iterables of the same length (the input data). The items of a are integers; the items of b are iterables themselves, whose individual items are strings. The output should correspond to the output of the following generator function:
from itertools import *
def generator(a, b):
    first = True
    for i, s in izip(a, b):
        if first:
            yield "First line"
            first = False
        else:
            yield "Some later line"
        if i == 0:
            yield "The parameter vanishes."
        else:
            yield "The parameter is:"
            yield i
        yield "The strings are:"
        comma = False
        for t in s:
            if comma:
                yield ','
            else:
                comma = True
            yield t
If I write down the same program in functional style using generator expressions and the itertools module, I end up with something like:
from itertools import *
def generator2(a, b):
    return (z for i, s, c in izip(a, b, count())
            for y in (("First line" if c == 0 else "Some later line",),
                      ("The parameter vanishes.",) if i == 0
                      else ("The parameter is:", i),
                      ("The strings are:",),
                      islice((x for t in s for x in (',', t)), 1, None))
            for z in y)
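The least obvious piece of generator2 is probably the islice(...) expression that puts commas between the strings. A minimal sketch of that trick in isolation may help; the helper name join_with_commas is made up for illustration and does not appear in the original code:

```python
from itertools import islice

def join_with_commas(items):
    """Yield the items with ',' between them (hypothetical helper)."""
    # Emit a comma before every item ...
    with_leading_commas = (x for item in items for x in (',', item))
    # ... then skip the very first comma.
    return islice(with_leading_commas, 1, None)

print(list(join_with_commas("ab")))  # ['a', ',', 'b']
print(list(join_with_commas("")))    # []
```

This produces the same sequence as the comma flag in the hand-written loop, but the "skip the first separator" logic is hidden in the slice start rather than spelled out.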
EXAMPLE
>>> a, b = (1, 0, 2), ("ab", "cd", "ef")
>>> print([x for x in generator(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']
>>> print([x for x in generator2(a, b)])
['First line', 'The parameter is:', 1, 'The strings are:', 'a', ',', 'b', 'Some later line', 'The parameter vanishes.', 'The strings are:', 'c', ',', 'd', 'Some later line', 'The parameter is:', 2, 'The strings are:', 'e', ',', 'f']
This is possibly more elegant than my first solution, but it looks like a write-once-do-not-understand-later piece of code. I am wondering whether this way of writing my generator has enough advantages to justify it.
P.S.: I guess part of my problem with the functional solution is that, in order to minimize the number of keywords in Python, some keywords like "for", "if" and "else" have been recycled for use in expressions, so their placement in an expression takes some getting used to (the ordering in the generator expression z for x in a for y in x for z in y looks, at least to me, less natural than the ordering in the classic for loop: for x in a: for y in x: for z in y: yield z).
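To make the ordering point concrete, here is a small sketch with made-up sample data. It suggests that the for clauses of a generator expression are read in the same order as the nested loops, with only the yielded expression moved to the front; itertools.chain.from_iterable is shown as a third spelling of the same flattening:

```python
from itertools import chain

a = [["ab", "cd"], ["ef"]]  # sample data, chosen only for illustration

def nested_loops(a):
    # Classic imperative form: outermost loop first, yield last.
    for x in a:
        for y in x:
            for z in y:
                yield z

def as_genexp(a):
    # Same clause order as above; only "z" has moved to the front.
    return (z for x in a for y in x for z in y)

def as_chain(a):
    # Each chain.from_iterable call removes one level of nesting.
    return chain.from_iterable(chain.from_iterable(a))

print(list(nested_loops(a)))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(list(as_genexp(a)))     # ['a', 'b', 'c', 'd', 'e', 'f']
print(list(as_chain(a)))      # ['a', 'b', 'c', 'd', 'e', 'f']
```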
I did some profiling and the regular generator function is way faster than either your second generator or my implementation.
$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator1(a, b))'
10 loops, best of 3: 169 msec per loop
$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator2(a, b))'
10 loops, best of 3: 489 msec per loop
$ python -mtimeit -s'import gen; a, b = gen.make_test_case()' 'list(gen.generator3(a, b))'
10 loops, best of 3: 385 msec per loop
It also happens to be the most readable, so I think I'd go with that. That being said, I'll still post my solution because I think it's a cleaner example of the sort of functional programming you can do with itertools (though clearly still not optimal; I feel like it should be able to smoke the regular generator function. I'll hack on it).
import itertools as it

def generator3(parameters, strings):
    # replace strings with a generator of generators for the individual characters
    strings = (it.islice((char for string_char in string_ for char in (',', string_char)), 1, None)
               for string_ in strings)
    # interpolate strings with the notices
    strings = (it.chain(('The strings are:',), string_) for string_ in strings)
    # nest them in tuples so they're at the same level as the other generators
    separators = it.chain((('First line',),), it.cycle((('Some later line',),)))
    # replace the parameters with the appropriate tuples
    parameters = (('The parameter is:', p) if p else ('The parameter vanishes.',)
                  for p in parameters)
    # combine the separators, parameters and strings
    output = it.izip(separators, parameters, strings)
    # flatten it twice and return it
    output = it.chain.from_iterable(output)
    return it.chain.from_iterable(output)
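The last three steps (zip the pieces together, then flatten twice) can be tried on their own. The following is a simplified sketch with hard-coded sample tuples, not the full generator3; it assumes the built-in zip as a stand-in for it.izip so that it also runs on Python 3:

```python
import itertools as it

# One leading separator, then the same separator repeated forever.
separators = it.chain((('First line',),), it.cycle((('Some later line',),)))
# Sample data standing in for the processed parameters and strings.
parameters = [('The parameter is:', 1), ('The parameter vanishes.',)]
strings = [('a', ',', 'b'), ('c',)]

# zip stops at the shortest input, so the infinite separators are harmless.
rows = zip(separators, parameters, strings)

# First flatten: rows of tuples -> stream of tuples.
# Second flatten: stream of tuples -> stream of individual items.
flat = list(it.chain.from_iterable(it.chain.from_iterable(rows)))
print(flat)
# ['First line', 'The parameter is:', 1, 'a', ',', 'b',
#  'Some later line', 'The parameter vanishes.', 'c']
```

Wrapping every piece in a tuple is what makes the double flatten work: each row is a tuple of iterables, so two levels of nesting have to be removed.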
For reference, the test case is:
def make_test_case():
    a = [i % 100 for i in range(10000)]
    b = [('12345'*10)[:(i%50)+1] for i in range(10000)]
    return a, b
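A comparison of this kind can also be run from inside Python with the timeit module instead of the command line. The sketch below is only an illustration: it times two ways of flattening the strings rather than the full generators, and make_small_test_case, flatten_handwritten and flatten_itertools are hypothetical names, with the test case scaled down so it finishes quickly:

```python
import timeit
from itertools import chain

def make_small_test_case(n=1000):
    # Same shape as make_test_case(), but smaller (an assumption for speed).
    a = [i % 100 for i in range(n)]
    b = [('12345' * 10)[:(i % 50) + 1] for i in range(n)]
    return a, b

def flatten_handwritten(strings):
    # Imperative version: explicit nested loops.
    for s in strings:
        for ch in s:
            yield ch

def flatten_itertools(strings):
    # Functional version: let itertools do the looping.
    return chain.from_iterable(strings)

a, b = make_small_test_case()

# Both variants must agree before their timings are worth comparing.
assert list(flatten_handwritten(b)) == list(flatten_itertools(b))

loops = 20
for fn in (flatten_handwritten, flatten_itertools):
    # The minimum over several repeats is the usual "best of N" figure.
    best = min(timeit.repeat(lambda: list(fn(b)), number=loops, repeat=3))
    print("%s: %.2f msec per loop" % (fn.__name__, best / loops * 1000))
```

The absolute numbers depend on the machine and the Python version, so no expected output is given here.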