开发者

How to merge sorted sequences?

Here’s a problem I’ve really been struggling with. I need to merge two sorted sequences into a single sorted sequence. Ideally, the algorithm should be lazy-evaluated, and not require caching more than one item from each sequence. This is not a terribly difficult problem to solve, and I’ve been able to engineer a number of solutions in F#. Unfortunately, every solution I’ve come up with has one of several problems.

  1. Recursive calls to subsequence generators using yield!. This produces elegant looking solutions, but the creation of a subsequence for every item is a performance killer.

  2. Really arcane and unmaintainable code with deeply-stacked match switches, multiple nearly identical blocks of code, etc.

  3. Code which forces F# into a purely procedural mode (lots of mutable values, etc.).

And all of the online examples I've been able to find founder on the same shoals.

Am I开发者_JS百科 missing something obvious: like it's either really simple or else obviously impossible? Does anyone know of a really elegant solution that is also efficient and mostly functional? (It doesn’t have to be purely functional.) If not, I may end up caching subsequences and using lists or arrays.


Ideally, the algorithm should be lazy-evaluate... the creation of a subsequence for every item is a performance killer

Lazy means slow but here is a solution using lazy lists:

let (++) = LazyList.consDelayed

let rec merge xs ys () =
  match xs, ys with
  | Cons(x, xs'), Cons(y, _) when x<y -> x ++ merge xs' ys
  | Cons(x, _), Cons(y, ys') -> y ++ merge xs ys'
  | Nil, xs | xs, Nil -> xs

I think by "lazy evaluated" you mean you want the merged result to be generated on demand so you can also use:

let rec merge xs ys = seq {
  match xs, ys with
  | x::xs, y::_ when x<y ->
      yield x
      yield! merge xs ys
  | x::_, y::ys ->
      yield y
      yield! merge xs ys
  | [], xs | xs, [] -> yield! xs
}

As you say, this is very inefficient. However, a seq-based solution doesn't have to be slow. Here, Seq.unfold is your friend and can make this over 4× faster by my measurements:

let merge xs ys =
  let rec gen = function
    | x::xs, (y::_ as ys) when x<y -> Some(x, (xs, ys))
    | xs, y::ys -> Some(y, (xs, ys))
    | [], x::xs | x::xs, [] -> Some(x, ([], xs))
    | [], [] | [], [] -> None
  Seq.unfold gen (xs, ys)


Use the LazyList type in the PowerPack. I think I maybe even have this exact code lying around, let me look...

EDIT

not exactly it, but close: http://cs.hubfs.net/forums/thread/8136.aspx


Sequences don't really pattern match well.

Fortunately one of the advantages of F# is being able to drop down to imperative code when you need to, and I think it still idiomatic to use mutable state internally so long as the function is still pure to clients consuming the function. I think this style is really common in the F# source code wherever sequences are involved.

Its not pretty, but this works:

open System.Collections.Generic
let merge (a : #seq<'a>) (b : #seq<'a>) =
    seq {
        use a = a.GetEnumerator()
        use b = b.GetEnumerator()

        let aNext = ref <| a.MoveNext()
        let bNext = ref <| b.MoveNext()

        let inc (enumerator : IEnumerator<'a>) flag =       // '
            let temp = enumerator.Current
            flag := enumerator.MoveNext()
            temp
        let incA() = inc a aNext
        let incB() = inc b bNext

        while !aNext || !bNext do
            match !aNext, !bNext with
            | true, true ->
                if a.Current > b.Current then yield incB()
                elif a.Current < b.Current then yield incA()
                else yield incA(); yield incB()
            | true, false -> yield incA()
            | false, true -> yield incB()
            | false, false -> ()
    }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜