Python generator that groups another iterable into groups of N [duplicate]

2023-01-20 16:41 问答作者：

This question already has answers here: Iterate an iterator by chunks (of n) in Python? 开发者_JAVA百科 (13 answers) Closed 4 months ago.

I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:

x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]

gives

[(1,2,3),(4,5,6),(7,8,9),(0)]

Does such a function exist in the standard library?

If it exists as part of the standard library, I can't seem to find it and I've run out of terms to search for. I could write my own, but I'd rather not.

When you want to group an iterator in chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):

import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))

yields

[[1, 2, 3], [4, 5, 6], [7]]

There is an explanation of how it works in the second half of this answer.

When you want to group an iterator in chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n):

For example, in Python2:

>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

In Python3, what was izip_longest is now renamed zip_longest:

>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

When you want to group a sequence in chunks of n you can use the chunks recipe:

def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in xrange(0, len(seq), n):
        yield seq[i:i + n]

Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).

See the grouper recipe in the docs for the itertools package

def grouper(n, iterable, fillvalue=None):
  "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
  args = [iter(iterable)] * n
  return izip_longest(fillvalue=fillvalue, *args)

(However, this is a duplicate of quite a few questions.)

How about this one? It doesn't have a fill value though.

>>> def partition(itr, n):
...     i = iter(itr)
...     res = None
...     while True:
...             res = list(itertools.islice(i, 0, n))
...             if res == []:
...                     break
...             yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>

It utilizes a copy of the original iterable, which it exhausts for each successive splice. The only other way my tired brain could come up with was generating splice end-points with range.

Maybe I should change list() to tuple() so it better corresponds to your output.

This is a very common request in Python. Common enough that it made it into the boltons unified utility package. First off, there are extensive docs here. Furthermore, the module is designed and tested to only rely on the standard library (Python 2 and 3 compatible), meaning you can just download the file directly into your project.

# if you downloaded/embedded, try:
# from iterutils import chunked

# with `pip install boltons` use:

from boltons.iterutils import chunked 

print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

There's an iterator/generator form for indefinite/long sequences as well:

print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

As you can see, you can also fill the sequence with a value of your choosing, as well. Finally, as the maintainer, I can assure you that, while the code has been downloaded/tested by thousands of developers, if you encounter any issues, you'll get the fastest support possible through the boltons GitHub Issues page. Hope this (and/or any of the other 150+ boltons recipes) helped!

I use the chunked function from the more_itertools package.

$ pip install more_itertools
$ python
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.more.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]

This is a very old quesiton, but I think it is useful to mention the following approach for the general case. Its main merit is that it only needs to iterate over the data once, so it will work with database cursors or other sequences that can only be used once. I also find it more readable.

def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out

I know this has been answered several times but I'm adding my solution which should improve in both, general applicability to sequences and iterators, readability (no invisible loop exit condition by StopIteration exception) and performance when compared to the grouper recipe. It is most similar to the last answer by Svein.

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)

Here is a different solution which makes no use of itertools and, even though it has a couple more lines, it apparently outperforms the given answers when chunks are a lot shorter than the iterable lenght. However, for big chunks the other answers are much faster.

def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop

In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop

In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop

In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop

In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop

In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop

    def grouper(iterable, n):
        while True:
            yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))

继续阅读：generator python std

Python generator that groups another iterable into groups of N [duplicate]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？