Stopping a Reduce() operation mid way. Functional way of doing partial running sum

2023-01-05 02:03 问答作者：

I have been doing some functional programming and had a question. Perhaps I might be missing something but is there any way to stop a "reduce()" function midway? Lets say when I reach a certain condition? The idea somehow seems anti functional. I haven't seen any such option in python or F#,

As an example, lets say I have a list such as [1,2,3,4,5]. I want to sum the elements in this list until the sum is not greater than some number (lets say 8), and return/mark/store/identify somehow, the numbe开发者_如何学编程r of elements I have actually added.

If we looked at python for example for I might try something like

reduce(lambda a,b : a if a + b > 8 else a + b, input)

This gives me the right answer 6, but how do I find that I had added 3 elements to get here. There is no counter as such. I can't do assignments inside lambdas. I think F# has the same situation.

I know I can use a for loop or use a function that can store state etc. But what would be the functional way of doing/thinking about this. Reduce() wants to run until the end, but somewhere along this line of processing, we either want to stop it (because we don't care about processing the rest of the elements) or at least make a note of the place where we stopped caring.

Reduce is often used in combination with map. Google for example has developed a map-reduce framework for querying their databases and this map-reduce pattern is now used in several other projects (e.g. CouchDB, Hadoop, etc).

First, you need to map the input variables [2, 1, 3, 4, 5] to something like:

[(1, 2), (1, 1), (1, 3), (1, 4), (1, 5)]

In that case, x[0] will represent the number of the elements to get the sum x[1]. Of course, the number of elements is 1 at the beginning for each single element.

The next thing then, is to operate on those tuples:

reduce(
    lambda a, b: a if a[1] + b[1] > 8 else (a[0] + b[0], a[1] + b[1]),
    map(lambda x: (1, x), input))

This will return (3, 6), meaning the partial sum is 6 using 3 elements.

I hope you got the idea behind map-reduce-algorithms.

Regards,
Christoph

I agree with JaredPar that writing your own recursive function that behaves similarly to fold, but allows you to stop the computation earlier is the best approach. The way I would write it is a bit more general (so that you can use the function for any situation where you need folding that can stop earlier):

// Generalized 'fold' function that allws you to stop the execution earlier
// The function 'f' has a type 'State -> 'T -> Option<'State>
// By returning 'None' we can stop the execution (and return the 
// current state), by returning Some(newState), we continue folding
let rec foldStop f state input = 
  match input with
  | x::xs -> 
      match f state x with
      | None -> state
      | Some(newState) -> foldStop f newState xs
  | [] -> state

// Example that stops folding after state is larger than 10
foldStop (fun st n -> if st > 10 then None else Some(st + n)) 0 [ 1 .. 10 ]

This is a very general function and you can use it for all similar scenarios. The nice thing about writing it is that you will never need to write similar explicit recursion again (because you can just use foldStop once you have it).

Note that you can use foldStop to implement fold by always wrapping the result of the accumulation function in 'Some' (so it is more general):

let fold f state input = 
  foldStop (fun st n -> Some(f st n)) state input

Let's imagine Python had two functions, ireduce (similar to reduce but it would yield intermediate values; it's called scanl in some languages) and ilast (get last item of an iterable):

from itertools import takewhile
from operator import add
xs = [1, 2, 3, 4, 5]
pair = ilast(enumerate(takewhile(lambda x: x < 8, ireduce(add, xs, 0))))
# (3, 6)

In Haskell:

last $ zip [0..] (takeWhile (< 8) (scanl (+) 0 xs))

I think that the 'most functional' way to do this is probably via lazy evaluation. If you're in a lazy language like Haskell, or in an eager language but using a lazy list data structure (like LazyList in the F# PowerPack), you can create e.g. a 'scan' of the running sums, and then leave it in the hands of the consumer of the list to decide how much she wants/needs to evaluate.

Or, you know, write a simple recursive function, like @JaredPar's answer. For some reason I often get a mental block on that, preventing me from noticing that "not everything has to be a fold, you can in fact write your own recursive functions" :)

Try the following

let sumUntil list stopAfter = 
    let rec inner list sum = 
        if sum >= stopAfter then sum
        else 
            match list with
            | [] -> sum
            | h::t-> inner t (sum + h)
    inner list 0

F# interactive result

> sumUntil [1;2;3;4;5] 8;;
val it : int = 10

This is a function that implements that functional program:

>>> def limited_reduce(reducer, pred, lst):
...  i = 0
...  y = lst[0]
...  while pred(y) and i < len(lst):
...    i += 1
...    y = reducer(lst[i], y)
...  return (i, y)

or recursively:

>>> def limited_reduce(reducer, pred, lst):
...   def helper(i, accum, rest):
...     if not rest or not pred(accum): return (i, accum)
...     return helper(i+1, reducer(rest[0], accum), rest[1:])
...   return helper(0, lst[0], lst[1:])

There's probably a way to clean it up a bit, but you would use it like this:

>>>> limited_reduce(lambda x,y: x+y, lambda r: r < 6, [1,2,1,3,2])
(3, 7)

I think this does what you are after, using functions built-in to the F# Seq module:

let answer =
    [1; 2; 3; 4; 5]
    |> Seq.scan (fun (count,sum) x -> (count+1, sum + x) ) (0,0)
    |> Seq.find (fun (_,x) -> x > 8)

The "scan" function is similar to "fold", but returns a sequence containing intermediate (and final) states, rather than just the final state. In this case, the state is a tuple containing a count and sum of items thus far processed, starting with (0,0). This gets computed and fed, one at a time, into the "find" function, which returns the first element which matches the supplied condition (v>8), in this case (4,10).

The only issue you'd need to handle with the above is the case where the "find" condition is never satisfied, in which case a KeyNotFoundException is thrown. You could use "tryFind" which returns an option value. However, I can't see a graceful way to return the last element computed if no earlier state matches the condition, short of pre-computing the length of the sequence:

let xs = [1; 2; 3; 4; 5]
let len = Seq.length xs
let answer =
    xs
    |> Seq.scan (fun (count,acc) v -> (count+1, v + acc) ) (0,0)
    |> Seq.find (fun (count,v) -> v > 99 || count = len)

Another functional approch could be using a "continution"-based version of reduce/fold:

let rec foldC fn acc cont = function
    | []      -> acc
    | x :: xs -> fn x acc (fun acc -> foldC fn acc cont xs)

Call with 'id' (fun x -> x) as 'initial continuation':

foldC (fun x sum c -> 
           if (sum + x) > 8 
           then sum 
           else c (sum + x))
      0
      (fun x -> x) 
      [1; 2; 3; 4; 5]

And you will get your '6'.

Note that this version of foldC is not tail recursive - or otherwise recommended - thought...

If you want to avoid carrying out unnecessary computations (which you still will with a map-reduce algorithm), write your own reduce and catch StopIteration:

from functools import reduce as _reduce

def stop_iter(rv=None):
    raise StopIteration(rv)

def reduce(*args):
    try: return _reduce(*args)
    except StopIteration as e: return e.args[0]

Then, write a step function that wraps the return value in a call to stop_iter when you reach a certain condition. Using your original lambda:

reduce(lambda a, b : stop_iter(a) if a + b > 8 else a + b, input)

Similar to Duncan's answer, but allows you to use lambdas (no manually raising exceptions).

The only way to get out of the builtin reduce part way through is to throw an exception. Fortunately it's not hard to get the desired result this way:

def interruptible_reduce(fn, *args):
    try:
        return reduce(fn, *args)
    except StopIteration, e:
        return e.args[0]

def reducefn(a, b):
    total = a[1] + b[1]
    if total > 8:
        raise StopIteration(a)
    return (a[0]+b[0], total)

input = [2, 1, 3, 4, 5]

>>> from itertools import imap
>>> interruptible_reduce(reducefn, imap(lambda x: (1,x), input))
(3, 6)

I know you're specifically interested in python, but I thought I would chime in with respect to how Clojure accomplishes this, since it solves the problem quite elegantly and directly.

Clojure has a reduced function that returns a version of whatever it is passed, such that this version will immediately terminate within a call to reduce. This makes it trivially simple to do something like this:

(reduce (fn [a v]
          (if (< a 100) 
            (+ a v)
            (reduced a)))
        (range 20))
;; => 105

This returns the first sum which is greater than or equal to a hundred, or the greatest sum reached if none exceeds. And it's worth noting that it does this without consuming/iterating through the entirety of the collection being reduced over, which could be very large or even infinite lazy sequence. Moreover, this has a definite advantage over applying some filter operation first, as you can have your termination condition dependent on the value being constructed, not just by individual values in the collection being reduced.

You mention this idea seems somehow "anit-functional". This might seem to be the case in python, where it's unclear how you would accomplish it without resorting to some messy outside state (or at best an alternate version of reduce). However, this works cleanly and functionally (even purely so) in Clojure because it's been baked into the language. The key is that reduce knows to look for reduced values, and objects can carry that information around with them (either as a wrapped value of as metadata; not sure which actually...).

It's certainly a handy feature I've been glad to have when I've needed it.

First, in F#. What's the first triangle number greater than 100?

> [1..1000] |> Seq.scan (+) 0 |> Seq.find (fun x -> x > 100);;
val it : int = 105

Note that Seq.scan is lazy, so triangle numbers beyond the solution are never calculated.

To find the ordinal of the solution, we exchange find for findIndex

> [1..1000] |> Seq.scan (+) 0 |> Seq.findIndex (fun x -> x > 100);;
val it : int = 14

In Python, the analogue of F#'s List.scan is itertools.accumulate, introduced Python 3.2 (2011).

>>> from itertools import accumulate
>>> next(x for x in accumulate(range(0,1000)) if x > 100)
105
>>> next(i for (i,x) in enumerate(accumulate(range(0,1000))) if x > 100)
14

Here is a slight variation of Stephen's code, using foldl instead of foldr (I hope) and not requiring a sequence:

#!/usr/bin/env python

import operator
import functools

def limited_reduce(op, it, start, pred):
    if not pred(start):
        return 0, start
    for i, x in enumerate(it):
        y = op(start, x)
        if pred(y):
            start = y
        else:
            break
    return i, start

print limited_reduce(operator.add, xrange(1, 6), 0,
                     functools.partial(operator.gt, 8))

继续阅读：f#functional-programming python

Stopping a Reduce() operation mid way. Functional way of doing partial running sum

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？