开发者

Stopping a Reduce() operation mid way. Functional way of doing partial running sum

I have been doing some functional programming and had a question. Perhaps I might be missing something but is there any way to stop a "reduce()" function midway? Lets say when I reach a certain condition? The idea somehow seems anti functional. I haven't seen any such option in python or F#,

As an example, lets say I have a list such as [1,2,3,4,5]. I want to sum the elements in this list until the sum is not greater than some number (lets say 8), and return/mark/store/identify somehow, the numbe开发者_如何学编程r of elements I have actually added.

If we looked at python for example for I might try something like

reduce(lambda a,b : a if a + b > 8 else a + b, input)

This gives me the right answer 6, but how do I find that I had added 3 elements to get here. There is no counter as such. I can't do assignments inside lambdas. I think F# has the same situation.

I know I can use a for loop or use a function that can store state etc. But what would be the functional way of doing/thinking about this. Reduce() wants to run until the end, but somewhere along this line of processing, we either want to stop it (because we don't care about processing the rest of the elements) or at least make a note of the place where we stopped caring.


Reduce is often used in combination with map. Google for example has developed a map-reduce framework for querying their databases and this map-reduce pattern is now used in several other projects (e.g. CouchDB, Hadoop, etc).

First, you need to map the input variables [2, 1, 3, 4, 5] to something like:

[(1, 2), (1, 1), (1, 3), (1, 4), (1, 5)]

In that case, x[0] will represent the number of the elements to get the sum x[1]. Of course, the number of elements is 1 at the beginning for each single element.

The next thing then, is to operate on those tuples:

reduce(
    lambda a, b: a if a[1] + b[1] > 8 else (a[0] + b[0], a[1] + b[1]),
    map(lambda x: (1, x), input))

This will return (3, 6), meaning the partial sum is 6 using 3 elements.

I hope you got the idea behind map-reduce-algorithms.

Regards,
Christoph


I agree with JaredPar that writing your own recursive function that behaves similarly to fold, but allows you to stop the computation earlier is the best approach. The way I would write it is a bit more general (so that you can use the function for any situation where you need folding that can stop earlier):

// Generalized 'fold' function that allws you to stop the execution earlier
// The function 'f' has a type 'State -> 'T -> Option<'State>
// By returning 'None' we can stop the execution (and return the 
// current state), by returning Some(newState), we continue folding
let rec foldStop f state input = 
  match input with
  | x::xs -> 
      match f state x with
      | None -> state
      | Some(newState) -> foldStop f newState xs
  | [] -> state

// Example that stops folding after state is larger than 10
foldStop (fun st n -> if st > 10 then None else Some(st + n)) 0 [ 1 .. 10 ]

This is a very general function and you can use it for all similar scenarios. The nice thing about writing it is that you will never need to write similar explicit recursion again (because you can just use foldStop once you have it).

Note that you can use foldStop to implement fold by always wrapping the result of the accumulation function in 'Some' (so it is more general):

let fold f state input = 
  foldStop (fun st n -> Some(f st n)) state input


Let's imagine Python had two functions, ireduce (similar to reduce but it would yield intermediate values; it's called scanl in some languages) and ilast (get last item of an iterable):

from itertools import takewhile
from operator import add
xs = [1, 2, 3, 4, 5]
pair = ilast(enumerate(takewhile(lambda x: x < 8, ireduce(add, xs, 0))))
# (3, 6)

In Haskell:

last $ zip [0..] (takeWhile (< 8) (scanl (+) 0 xs))


I think that the 'most functional' way to do this is probably via lazy evaluation. If you're in a lazy language like Haskell, or in an eager language but using a lazy list data structure (like LazyList in the F# PowerPack), you can create e.g. a 'scan' of the running sums, and then leave it in the hands of the consumer of the list to decide how much she wants/needs to evaluate.

Or, you know, write a simple recursive function, like @JaredPar's answer. For some reason I often get a mental block on that, preventing me from noticing that "not everything has to be a fold, you can in fact write your own recursive functions" :)


Try the following

let sumUntil list stopAfter = 
    let rec inner list sum = 
        if sum >= stopAfter then sum
        else 
            match list with
            | [] -> sum
            | h::t-> inner t (sum + h)
    inner list 0    

F# interactive result

> sumUntil [1;2;3;4;5] 8;;
val it : int = 10


This is a function that implements that functional program:

>>> def limited_reduce(reducer, pred, lst):
...  i = 0
...  y = lst[0]
...  while pred(y) and i < len(lst):
...    i += 1
...    y = reducer(lst[i], y)
...  return (i, y)

or recursively:

>>> def limited_reduce(reducer, pred, lst):
...   def helper(i, accum, rest):
...     if not rest or not pred(accum): return (i, accum)
...     return helper(i+1, reducer(rest[0], accum), rest[1:])
...   return helper(0, lst[0], lst[1:])

There's probably a way to clean it up a bit, but you would use it like this:

>>>> limited_reduce(lambda x,y: x+y, lambda r: r < 6, [1,2,1,3,2])
(3, 7)


I think this does what you are after, using functions built-in to the F# Seq module:

let answer =
    [1; 2; 3; 4; 5]
    |> Seq.scan (fun (count,sum) x -> (count+1, sum + x) ) (0,0)
    |> Seq.find (fun (_,x) -> x > 8)

The "scan" function is similar to "fold", but returns a sequence containing intermediate (and final) states, rather than just the final state. In this case, the state is a tuple containing a count and sum of items thus far processed, starting with (0,0). This gets computed and fed, one at a time, into the "find" function, which returns the first element which matches the supplied condition (v>8), in this case (4,10).

The only issue you'd need to handle with the above is the case where the "find" condition is never satisfied, in which case a KeyNotFoundException is thrown. You could use "tryFind" which returns an option value. However, I can't see a graceful way to return the last element computed if no earlier state matches the condition, short of pre-computing the length of the sequence:

let xs = [1; 2; 3; 4; 5]
let len = Seq.length xs
let answer =
    xs
    |> Seq.scan (fun (count,acc) v -> (count+1, v + acc) ) (0,0)
    |> Seq.find (fun (count,v) -> v > 99 || count = len)


Another functional approch could be using a "continution"-based version of reduce/fold:

let rec foldC fn acc cont = function
    | []      -> acc
    | x :: xs -> fn x acc (fun acc -> foldC fn acc cont xs) 

Call with 'id' (fun x -> x) as 'initial continuation':

foldC (fun x sum c -> 
           if (sum + x) > 8 
           then sum 
           else c (sum + x))
      0
      (fun x -> x) 
      [1; 2; 3; 4; 5]

And you will get your '6'.

Note that this version of foldC is not tail recursive - or otherwise recommended - thought...


If you want to avoid carrying out unnecessary computations (which you still will with a map-reduce algorithm), write your own reduce and catch StopIteration:

from functools import reduce as _reduce

def stop_iter(rv=None):
    raise StopIteration(rv)

def reduce(*args):
    try: return _reduce(*args)
    except StopIteration as e: return e.args[0]

Then, write a step function that wraps the return value in a call to stop_iter when you reach a certain condition. Using your original lambda:

reduce(lambda a, b : stop_iter(a) if a + b > 8 else a + b, input)

Similar to Duncan's answer, but allows you to use lambdas (no manually raising exceptions).


The only way to get out of the builtin reduce part way through is to throw an exception. Fortunately it's not hard to get the desired result this way:

def interruptible_reduce(fn, *args):
    try:
        return reduce(fn, *args)
    except StopIteration, e:
        return e.args[0]

def reducefn(a, b):
    total = a[1] + b[1]
    if total > 8:
        raise StopIteration(a)
    return (a[0]+b[0], total)

input = [2, 1, 3, 4, 5]

>>> from itertools import imap
>>> interruptible_reduce(reducefn, imap(lambda x: (1,x), input))
(3, 6)


I know you're specifically interested in python, but I thought I would chime in with respect to how Clojure accomplishes this, since it solves the problem quite elegantly and directly.

Clojure has a reduced function that returns a version of whatever it is passed, such that this version will immediately terminate within a call to reduce. This makes it trivially simple to do something like this:

(reduce (fn [a v]
          (if (< a 100) 
            (+ a v)
            (reduced a)))
        (range 20))
;; => 105

This returns the first sum which is greater than or equal to a hundred, or the greatest sum reached if none exceeds. And it's worth noting that it does this without consuming/iterating through the entirety of the collection being reduced over, which could be very large or even infinite lazy sequence. Moreover, this has a definite advantage over applying some filter operation first, as you can have your termination condition dependent on the value being constructed, not just by individual values in the collection being reduced.

You mention this idea seems somehow "anit-functional". This might seem to be the case in python, where it's unclear how you would accomplish it without resorting to some messy outside state (or at best an alternate version of reduce). However, this works cleanly and functionally (even purely so) in Clojure because it's been baked into the language. The key is that reduce knows to look for reduced values, and objects can carry that information around with them (either as a wrapped value of as metadata; not sure which actually...).

It's certainly a handy feature I've been glad to have when I've needed it.


First, in F#. What's the first triangle number greater than 100?

> [1..1000] |> Seq.scan (+) 0 |> Seq.find (fun x -> x > 100);;
val it : int = 105

Note that Seq.scan is lazy, so triangle numbers beyond the solution are never calculated.

To find the ordinal of the solution, we exchange find for findIndex

> [1..1000] |> Seq.scan (+) 0 |> Seq.findIndex (fun x -> x > 100);;
val it : int = 14

In Python, the analogue of F#'s List.scan is itertools.accumulate, introduced Python 3.2 (2011).

>>> from itertools import accumulate
>>> next(x for x in accumulate(range(0,1000)) if x > 100)
105
>>> next(i for (i,x) in enumerate(accumulate(range(0,1000))) if x > 100)
14


Here is a slight variation of Stephen's code, using foldl instead of foldr (I hope) and not requiring a sequence:

#!/usr/bin/env python

import operator
import functools

def limited_reduce(op, it, start, pred):
    if not pred(start):
        return 0, start
    for i, x in enumerate(it):
        y = op(start, x)
        if pred(y):
            start = y
        else:
            break
    return i, start

print limited_reduce(operator.add, xrange(1, 6), 0,
                     functools.partial(operator.gt, 8))
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜