开发者

Summing Consecutive Ranges Pythonically

I have a sumranges() function, which sums all the ranges of consecutive numbers found in a tuple of tuples. To illustrate:

def sumranges(nums):
    return sum([sum([1 for j in range(len(nums[i])) if
                     nums[i][j] == 0 or
                     nums[i][j - 1] + 1 != nums[i][j]]) for
                i in range(len(nums))])

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sumranges(nums)
7

As you can see, it returns the number of ranges of consecutive digits within the tuple, that is: len((1, 2, 3, 4), (1), (5, 6), (19, 20), (24), (29), (400)) = 7. The tuples are always ordered.

My problem is that my sumranges() is terrible. I hate looking at it. I'm currently just iterating through the tuple and each subtuple, assigning a 1 if the number is not (1 + previous number), and summing the total. I feel like I am missing a much easier way to accomplish my stated objective. Does anyone know a more pythonic way to do this?

Edit: I have benchmarked all the answers given thus far. Thanks to all of you for your answers.

The benchmarking code is as follows, using a sample size of 100K:

from time import time
from random import randrange
nums = [sorted(list(set(randrange(1, 10) for i in range(10)))) for
        j in range(100000)]

for func in sumranges, alex, matt, redglyph, ephemient, ferdinand:
    start = time()
    result = func(nums)
    end = time()
    print ', '.join([func.__name__, str(result), str(end - start) + ' s'])

Results are as follows. Actual answer shown to verify that all functions return the correct answer:

sumranges, 250281, 0.54171204567 s
alex, 250281, 0.531121015549 s
matt, 250281, 0.843333005905 s
redglyph, 250281, 0.366822004318 s
ephemient, 250281, 0.805964946747 s
ferdinand, 250281, 0.405596971512 s

RedGlyph does edge out in terms 开发者_JAVA百科of speed, but the simplest answer is probably Ferdinand's, and probably wins for most pythonic.


My 2 cents:

>>> sum(len(set(x - i for i, x in enumerate(t))) for t in nums)
7

It's basically the same idea as descriped in Alex' post, but using a set instead of itertools.groupby, resulting in a shorter expression. Since sets are implemented in C and len() of a set runs in constant time, this should also be pretty fast.


Consider:

>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> flat = [[(x - i) for i, x in enumerate(tu)] for tu in nums]
>>> print flat
[[1, 1, 1, 1], [1, 4, 4], [19, 19, 22, 26, 396]]
>>> import itertools
>>> print sum(1 for tu in flat for _ in itertools.groupby(tu))
7
>>> 

we "flatten" the "increasing ramps" of interest by subtracting the index from the value, turning them into consecutive "runs" of identical values; then we identify and could the "runs" with the precious itertools.groupby. This seems to be a pretty elegant (and speedy) solution to your problem.


Just to show something closer to your original code:

def sumranges(nums):
    return sum( (1 for i in nums
                   for j, v in enumerate(i)
                   if j == 0 or v != i[j-1] + 1) )

The idea here was to:

  • avoid building intermediate lists but use a generator instead, it will save some resources
  • avoid using indices when you already have selected a subelement (i and v above).

The remaining sum() is still necessary with my example though.


Here's my attempt:

def ranges(ls):
    for l in ls:
        consec = False
        for (a,b) in zip(l, l[1:]+(None,)):
            if b == a+1:
                consec = True
            if b is not None and b != a+1:
                consec = False
            if consec:
                yield 1

'''
>>> nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
>>> print sum(ranges(nums))
7
'''

It looks at the numbers pairwise, checking if they are a consecutive pair (unless it's at the last element of the list). Each time there's a consecutive pair of numbers it yields 1.


This could probably be put together in a more compact form, but I think clarity would suffer:

def pairs(seq):
    for i in range(1,len(seq)):
        yield (seq[i-1], seq[i])

def isadjacent(pair):
    return pair[0]+1 == pair[1]

def sumrange(seq):
    return 1 + sum([1 for pair in pairs(seq) if not isadjacent(pair)])

def sumranges(nums):
    return sum([sumrange(seq) for seq in nums])


nums = ((1, 2, 3, 4), (1, 5, 6), (19, 20, 24, 29, 400))
print sumranges(nums)   # prints 7


You could probably do this better if you had an IntervalSet class because then you would scan through your ranges to build your IntervalSet, then just use the count of set members.

Some tasks don't always lend themselves to neat code, particularly if you need to write the code for performance.


There is a formula for this, the sum of the first n numbers, 1+ 2+ ... + n = n(n+1) / 2 . Then if you want to have the sum of i-j then it is (j(j+1)/2) - (i(i+1)/2) this I am sure simplifies but you can work that out. It might not be pythonic but it is what I would use.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜