Summing Consecutive Ranges Pythonically

2022-12-10 07:40 Q&A author:

I have a sumranges() function, which counts all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:

def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     j == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print(sumranges(nums))
7

As you can see, it returns the number of ranges of consecutive numbers within the tuple, that is: len(((1, 2, 3, 4), (1,), (5, 6), (19, 20), (24,), (29,), (400,))) = 7. The tuples are always sorted.

My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?
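For comparison, the approach described above (walk each subtuple and count a number whenever it is not the previous number + 1) can be written as a plain loop with no index arithmetic. This is a minimal sketch; the name count_ranges is hypothetical, and it assumes each subtuple is sorted, as stated in the question.

```python
def count_ranges(nums):
    """Count runs of consecutive integers across all subtuples (hypothetical helper)."""
    total = 0
    for t in nums:
        prev = None  # no previous number at the start of each subtuple
        for x in t:
            # A new range starts at the first number or right after a gap.
            if prev is None or x != prev + 1:
                total += 1
            prev = x
    return total


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(count_ranges(nums))  # prints 7
```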

Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.

The benchmarking code is as follows, using a sample size of 100K:

from time import time
from random import randrange
nums = [sorted(set(randrange(1, 10) for _ in range(10))) for
        _ in range(100000)]

for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print(', '.join([func.__name__, str(result), str(end - start) + ' s']))
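Timing a single call with time() can be noisy. A sketch of the same kind of benchmark using the standard timeit module, taking the best of several repeats, might look like this. The two candidate functions (by_set and by_groupby, hypothetical names) are stand-ins for the set and groupby answers below, and the sample size is reduced for speed.

```python
import timeit
from itertools import groupby
from random import randrange


def by_set(nums):
    # one distinct (value - index) key per run
    return sum(len(set(x - i for i, x in enumerate(t))) for t in nums)


def by_groupby(nums):
    # one group per run of identical (value - index) keys
    return sum(1 for t in nums for _ in groupby(x - i for i, x in enumerate(t)))


nums = [sorted(set(randrange(1, 10) for _ in range(10))) for _ in range(10000)]

results = {}
for func in (by_set, by_groupby):
    # best of 5 repeats, one call per repeat
    best = min(timeit.repeat(lambda: func(nums), repeat=5, number=1))
    results[func.__name__] = func(nums)
    print(f"{func.__name__}, {results[func.__name__]}, {best:.4f} s")
```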

Results are as follows. Actual answer shown to verify that all functions return the correct answer:

sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s

RedGlyph does edge out the others in terms of speed, but the simplest answer is probably Ferdinand's, and it probably wins as the most Pythonic.

My 2 cents:

>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7

It's basically the same idea as described in Alex's post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.
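To see why the set trick works: within a run of consecutive numbers, value minus index stays constant, and it strictly increases at every gap, so each run contributes exactly one distinct key to the set. A small sketch that exposes those keys, assuming strictly increasing integers (duplicates would break the one-key-per-run property); range_keys is a hypothetical name:

```python
def range_keys(t):
    """Map each number to (value - index); one distinct key per run (hypothetical helper).

    Assumes t is strictly increasing, as in the question.
    """
    return {x - i for i, x in enumerate(t)}


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
for t in nums:
    print(t, sorted(range_keys(t)))
# (1, 2, 3, 4) [1]
# (1, 5, 6) [1, 4]
# (19, 20, 24, 29, 400) [19, 22, 26, 396]

print(sum(len(range_keys(t)) for t in nums))  # prints 7
```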

Consider:

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print(flat)
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print(sum(1 for tu in flat for _ in itertools.groupby(tu)))
7

We "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and count the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.
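The same value-minus-index key can also be passed to groupby directly, which recovers the ranges themselves rather than only counting them. This is a sketch; consecutive_runs is a hypothetical name, and it again assumes sorted, strictly increasing input.

```python
from itertools import groupby


def consecutive_runs(t):
    """Yield each run of consecutive numbers in t as a tuple (hypothetical helper)."""
    # Items are (index, value) pairs; value - index is constant within a run.
    for _, group in groupby(enumerate(t), key=lambda p: p[1] - p[0]):
        yield tuple(x for _, x in group)


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
runs = [run for t in nums for run in consecutive_runs(t)]
print(runs)
# [(1, 2, 3, 4), (1,), (5, 6), (19, 20), (24,), (29,), (400,)]
print(len(runs))  # prints 7
```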

Just to show something closer to your original code:

def sumranges(nums):
    return sum( (1 for i in nums
                   for j, v in enumerate(i)
                   if j == 0 or v != i[j-1] + 1) )

The idea here was to:

avoid building intermediate lists and use a generator instead, which saves some resources
avoid using indices when you have already selected a subelement (i and v above).

The remaining sum() is still necessary with my example though.
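Taking the "avoid indices" idea one step further, each number can be paired with its successor via zip, so no index arithmetic is needed at all. A minimal sketch (sumranges_zip is a hypothetical name): each non-empty subtuple contributes one range, plus one more for every gap between neighbours.

```python
def sumranges_zip(nums):
    """Count ranges using neighbour pairs instead of indices (hypothetical helper)."""
    return sum(
        1 + sum(1 for a, b in zip(t, t[1:]) if b != a + 1)  # first range, plus one per gap
        for t in nums
        if t  # empty subtuples contain no ranges
    )


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(sumranges_zip(nums))  # prints 7
```

Slicing t[1:] makes a shallow copy of each subtuple; zip(t, itertools.islice(t, 1, None)) would avoid that copy if it ever matters.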

Here's my attempt:

def ranges(ls):
    for l in ls:
        if l:
            yield 1  # the first number of each tuple starts a range
        # compare each number with its successor
        for (a, b) in zip(l, l[1:]):
            if b != a + 1:
                yield 1  # a gap means a new range starts at b

'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print(sum(ranges(nums)))
7
'''

It looks at the numbers pairwise: the first number of each tuple starts a range, and every pair that is not consecutive marks the start of another one. Each time a range starts, it yields 1.

This could probably be put together in a more compact form, but I think clarity would suffer:

def pairs(seq):
    for i in range(1,len(seq)):
        yield (seq[i-1], seq[i])

def isadjacent(pair):
    return pair[0]+1 == pair[1]

def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])

def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print(sumranges(nums))   # prints 7

You could probably do this better with an IntervalSet class: you would scan through your numbers once to build the IntervalSet, then just take the number of intervals it contains.
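A minimal sketch of that idea, with a hypothetical build_intervals helper standing in for an IntervalSet class (there is no such class in the standard library): scan each sorted tuple once, extending the current interval when the next number is adjacent and opening a new one otherwise. The answer is then the number of intervals.

```python
def build_intervals(seq):
    """Return a list of (start, end) pairs covering sorted seq (hypothetical helper)."""
    intervals = []
    for x in seq:
        if intervals and x == intervals[-1][1] + 1:
            # x extends the current interval by one
            intervals[-1] = (intervals[-1][0], x)
        else:
            # first number or a gap: open a new interval
            intervals.append((x, x))
    return intervals


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print([build_intervals(t) for t in nums])
# [[(1, 4)], [(1, 1), (5, 6)], [(19, 20), (24, 24), (29, 29), (400, 400)]]
print(sum(len(build_intervals(t)) for t in nums))  # prints 7
```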

Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.

There is a formula for this: the sum of the first n numbers is 1 + 2 + ... + n = n(n+1)/2. The sum of the numbers from i to j inclusive is then j(j+1)/2 - (i-1)i/2, which simplifies to (i + j)(j - i + 1)/2. It might not be Pythonic, but it is what I would use.
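A quick check of the closed form against a direct sum (sum_from_to is a hypothetical name; this illustrates the formula only, for integers i <= j):

```python
def sum_from_to(i, j):
    """Sum of the integers i..j inclusive, via (i + j)(j - i + 1) / 2."""
    # (i + j) + (j - i + 1) = 2j + 1 is odd, so one factor is even
    # and the integer division below is exact.
    return (i + j) * (j - i + 1) // 2


# Compare against the built-in sum over a small grid of ranges.
for i in range(-5, 6):
    for j in range(i, 12):
        assert sum_from_to(i, j) == sum(range(i, j + 1))

print(sum_from_to(1, 4))    # prints 10
print(sum_from_to(19, 20))  # prints 39
```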
