Python generator that groups another iterable into groups of N [duplicate]
I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:
x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]
gives
[(1,2,3),(4,5,6),(7,8,9),(0,)]
Does such a function exist in the standard library?
If it exists as part of the standard library, I can't seem to find it and I've run out of terms to search for. I could write my own, but I'd rather not.
When you want to group an iterator in chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):
import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))
yields
[[1, 2, 3], [4, 5, 6], [7]]
There is an explanation of how it works in the second half of this answer.
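The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns a value equal to the sentinel. A minimal sketch of that mechanism, with the lambda written out as a named function (take_chunk is a name chosen here only for illustration):

```python
import itertools

source = iter([1, 2, 3, 4, 5, 6, 7])

def take_chunk():
    # Pull up to 3 items from the shared iterator; returns [] once it is exhausted.
    return list(itertools.islice(source, 3))

# iter(callable, sentinel) keeps calling take_chunk() until it returns the sentinel [].
chunks = list(iter(take_chunk, []))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Because the empty list doubles as the stop signal, this only works for n >= 1.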
When you want to group an iterator in chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n). For example, in Python 2:
>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]
In Python 3, izip_longest was renamed zip_longest:
>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]
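The trick relies on [iterator]*n holding the same iterator object n times rather than n independent copies. A small Python 3 sketch that makes this visible (the variable names are illustrative only):

```python
from itertools import zip_longest

seq = [1, 2, 3, 4, 5, 6, 7]
it = iter(seq)

# [it] * 3 is a list holding the SAME iterator three times, not three copies.
args = [it] * 3
assert args[0] is args[1] is args[2]

# zip_longest pulls one item from each argument in turn, so every output tuple
# consumes three consecutive items from the single underlying iterator.
groups = list(zip_longest(*args, fillvalue='x'))
print(groups)  # [(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]
```

Passing a list instead of an iterator, as in zip_longest(*[seq]*3), would not group anything, since each argument would then be iterated from the start independently.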
When you want to group a sequence in chunks of n, you can use the chunks recipe:
def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):  # use xrange on Python 2
        yield seq[i:i + n]
Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).
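To illustrate the difference, here is a short sketch (written for Python 3, so it uses range) showing the chunks recipe working on a string and failing on a generator, which has no length:

```python
def chunks(seq, n):
    """Yield successive n-sized chunks from a sequence (needs len() and slicing)."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

# Slicing preserves the sequence type, so a string yields strings.
sliced = list(chunks('ABCDEFG', 3))
print(sliced)  # ['ABC', 'DEF', 'G']

gen = (c for c in 'ABCDEFG')  # a generator has no __len__
try:
    list(chunks(gen, 3))
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

For generators and other one-shot iterators, one of the islice-based approaches is the safer choice.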
See the grouper recipe in the docs for the itertools package:
from itertools import izip_longest  # Python 2; on Python 3 use zip_longest instead

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
(However, this is a duplicate of quite a few questions.)
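For readers on Python 3.12 or later, the itertools module itself provides batched, which yields tuples and leaves the last one short instead of padding it. The sketch below tries that import first; the fallback function is only a rough stand-in written for this example so the snippet also runs on older versions:

```python
import itertools

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        # Rough stand-in for older Pythons (an assumption of this sketch,
        # not the standard library's implementation).
        it = iter(iterable)
        return iter(lambda: tuple(itertools.islice(it, n)), ())

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
result = list(batched(x, 3))
print(result)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
```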
How about this one? It doesn't have a fill value though.
>>> import itertools
>>> def partition(itr, n):
...     i = iter(itr)
...     res = None
...     while True:
...         res = list(itertools.islice(i, 0, n))
...         if res == []:
...             break
...         yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
It wraps the original iterable in an iterator, which it consumes a little further with each successive slice. The only other way my tired brain could come up with was generating slice end-points with range.
Maybe I should change list() to tuple() so it better corresponds to your output.
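A sketch of that tuple() variant, assuming nothing else about the function changes, applied to the list from the question:

```python
import itertools

def partition(itr, n):
    i = iter(itr)
    while True:
        # tuple() instead of list(), so each chunk matches the requested output.
        res = tuple(itertools.islice(i, n))
        if not res:  # an empty tuple means the iterator is exhausted
            break
        yield res

result = list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], 3))
print(result)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
```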
This is a very common request in Python. Common enough that it made it into the boltons unified utility package. First off, there are extensive docs here. Furthermore, the module is designed and tested to only rely on the standard library (Python 2 and 3 compatible), meaning you can just download the file directly into your project.
# if you downloaded/embedded, try:
# from iterutils import chunked
# with `pip install boltons` use:
from boltons.iterutils import chunked
print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
There's an iterator/generator form for indefinite/long sequences as well:
from boltons.iterutils import chunked_iter
print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]
As you can see, you can also fill the final chunk with a value of your choosing. Finally, as the maintainer, I can assure you that, while the code has been downloaded/tested by thousands of developers, if you encounter any issues, you'll get the fastest support possible through the boltons GitHub Issues page. Hope this (and/or any of the other 150+ boltons recipes) helped!
I use the chunked function from the more_itertools package.
$ pip install more_itertools
$ python
>>> import more_itertools
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
This is a very old question, but I think it is useful to mention the following approach for the general case. Its main merit is that it only needs to iterate over the data once, so it will work with database cursors or other iterables that can only be consumed once. I also find it more readable.
def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out
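As a quick check of the single-pass claim, the sketch below feeds the function a generator that can only be read once. The fake_cursor function is a hypothetical stand-in for a database cursor, not a real driver API:

```python
def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out

def fake_cursor():
    # Hypothetical stand-in for a database cursor: rows can be read only once.
    for row_id in range(1, 8):
        yield ('row', row_id)

batches = list(chunks(3, fake_cursor()))
print([len(b) for b in batches])  # [3, 3, 1]
print(batches[-1])  # [('row', 7)]
```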
I know this has been answered several times, but I'm adding my solution, which should improve on the grouper recipe in general applicability (it works with both sequences and iterators), in readability (no invisible loop exit condition via a StopIteration exception), and in performance. It is most similar to the last answer by Svein.
import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1
    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)
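One thing worth keeping in mind with this approach is that each chunk is a lazy chain over the shared iterator, so chunks need to be consumed in order, one at a time. A short sketch of both the intended usage and the pitfall:

```python
import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1
    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)

# Consuming each chunk before asking for the next one works as expected.
in_order = [list(chunk) for chunk in chunkify(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the lazy chunks first and reading them later does not:
# the outer loop has already advanced the shared iterator.
deferred = [list(chunk) for chunk in list(chunkify(range(4), 2))]
print(deferred)  # [[0], [1], [2], [3]]
```

If the chunks may be stored or read out of order, materializing each one with list() or tuple() inside the generator avoids the problem, at the cost of the laziness.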
Here is a different solution which makes no use of itertools and, even though it has a couple more lines, it apparently outperforms the given answers when chunks are a lot shorter than the iterable length. However, for big chunks the other answers are much faster.
def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch
In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop
In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop
In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop
In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop
In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop
In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop
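The numbers above come from an IPython session and will differ between machines and Python versions. A sketch for reproducing the comparison with the standard timeit module, assuming grouper refers to the islice-based version from the first answer (the repeat count of 200 is an arbitrary choice):

```python
import itertools
import timeit

def batchiter(iterable, batch_size):
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch

def grouper(n, iterable):
    # Assumed to be the islice-based grouper from the first answer.
    iterable = iter(iterable)
    return iter(lambda: list(itertools.islice(iterable, n)), [])

for size in (3, 333):
    # Both approaches must agree before their timings are worth comparing.
    assert list(batchiter(range(1000), size)) == list(grouper(size, range(1000)))
    for func, call in ((batchiter, lambda: list(batchiter(range(1000), size))),
                       (grouper, lambda: list(grouper(size, range(1000))))):
        seconds = timeit.timeit(call, number=200)
        print(f"{func.__name__:>9} size={size:>3}: {seconds / 200 * 1e6:.1f} us per call")
```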
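import itertools

def grouper(iterable, n):
    iterable = iter(iterable)  # also accept lists and other non-iterator iterables
    for first in iterable:  # ends cleanly; a bare next() here raises RuntimeError on Python 3.7+ (PEP 479)
        yield itertools.chain((first,), itertools.islice(iterable, n - 1))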