The complexity of Concat()
In Haskell, when using : repeatedly to create a sequence, iterating through the result takes O(N) time.
1:2:3:4:[]
If I try to do something similar in LINQ, I can use Concat(), but iterating through the result takes O(N²) time, because when I call MoveNext() on the resulting enumerator for the n-th time, it has to go through n layers of Concat()s.
new[] { 1 }.Concat(new[] { 2 }).Concat(new[] { 3 }).Concat(new[] { 4 })
or
new[] { 1 }.Concat(new[] { 2 }.Concat(new[] { 3 }.Concat(new[] { 4 })))
How can I do this so that it's fast (i.e. linear) and functionally “clean” (i.e. without using List<T>)? Maybe by writing IConcatenable<T> with its own overload of Concat()? Or am I wrong that using Concat() this way is quadratic?
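To make the growth concrete, this is the kind of loop I have in mind (BuildByConcat is just an illustrative name, not code from a real library):

// Requires System.Linq and System.Collections.Generic.
// Each pass wraps the previous result in another Concat() layer, so an element
// appended early is re-yielded through every later layer, giving roughly O(N²)
// total work when the final sequence is enumerated.
IEnumerable<int> BuildByConcat(int count)
{
    IEnumerable<int> result = Enumerable.Empty<int>();
    for (int i = 1; i <= count; i++)
        result = result.Concat(new[] { i });
    return result;
}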
The LINQ Concat() operation is in no way equivalent to the functional cons operation; they are very different. Every cons operation creates a "new" list, and this is actually fast because the data structures in play are designed specifically for that use. Every Concat operation creates a new iterator, and iterators aren't really meant for this kind of use. To illustrate what is done each time, consider the following shorter example:
1:2:3:[]
The functional operation takes a few steps to evaluate: 3:[] results in the temporary list [3], then 2:[3] results in the list [2,3], and finally 1:[2,3] results in [1,2,3]. That's three simple steps just to create the list. Iterating through it requires some recursion and pattern matching (or whatever the Haskell equivalent is), which takes somewhere in the vicinity of three or four more steps (one for each list segment).
new[] { 1 }.Concat(new[] { 2 }).Concat(new[] { 3 });
The LINQ operation also takes a few steps to evaluate: new[] { 1 }.Concat(new[] { 2 }) yields a new iterator over the sequence [[1]+[2]], and the last call yields a new iterator over the sequence [[[1]+[2]]+[3]]. That took only two simple steps to create the iterator, but notice how complex the representation of the sequence actually is. Five iterators are used here: one for each array (3) and one for each concatenated pair (2). I won't go through all the steps of the iteration, but it involves many more function calls and property accesses (as required by each iterator).
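To see where those extra calls come from, here is roughly what an iterator-block version of a concat operation looks like; this is a simplified sketch, not the actual Enumerable.Concat source, but the layering effect is the same:

// Simplified sketch of a Concat-style iterator block (not the real Enumerable.Concat).
// Every element of the combined sequence is yielded through this method, so an
// element sitting under several nested concatenations is re-yielded once per layer.
static IEnumerable<T> NaiveConcat<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    foreach (T item in first)
        yield return item;
    foreach (T item in second)
        yield return item;
}

In the left-nested example above, the element 1 travels up through two such layers before the caller sees it; with N concatenations the earliest elements travel through O(N) layers, which is where the quadratic total comes from.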
new[] { 1 }.Concat(new[] { 2 }.Concat(new[] { 3 }))
The LINQ operation that looks more structurally equivalent again takes the same number of steps to create, yielding [[1]+[[2]+[3]]] with the same number of iterators, and iterating through it is just as complex.
You might be thinking the functional version should be more complex because it needs some recursive functions. It isn't, because that's how things are done in functional languages, and they are optimized for this use; they also have the advantage of immutable sequences that can be reused in other composed lists. Generating complex sequences with LINQ in this way isn't really what it was designed for. There are no optimizations by the language (some by the JIT, but that's it), and it's clearly not how you'd want to iterate through a sequence. You hit it right on the head, IMHO, on why it is complex.
I think the better approach, if you're after better performance, is to use linked lists to represent the concatenated sequence. You could use the LinkedList<T> class, create your own data structure reminiscent of a functional list, or, even better, use F#'s FSharpList<T>, and then extend it with extension methods and other supporting classes.
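As a rough sketch of the "roll your own" option (ConsList is my name for it, not an existing BCL type), an immutable cons list gives O(1) prepending and O(N) iteration, much like the Haskell list:

using System.Collections;
using System.Collections.Generic;

// Minimal immutable cons list, sketched for illustration only.
// Prepending reuses the existing list as the tail, so it is O(1);
// enumeration walks the cells once, so it is O(N) with no nested iterator layers.
public sealed class ConsList<T> : IEnumerable<T>
{
    public static readonly ConsList<T> Empty = new ConsList<T>();

    private readonly bool isEmpty;
    private readonly T head;
    private readonly ConsList<T> tail;

    private ConsList() { isEmpty = true; head = default!; tail = this; }
    private ConsList(T head, ConsList<T> tail) { this.head = head; this.tail = tail; }

    // The equivalent of Haskell's (:) — builds one new cell in front of this list.
    public ConsList<T> Cons(T item) => new ConsList<T>(item, this);

    public IEnumerator<T> GetEnumerator()
    {
        for (var node = this; !node.isEmpty; node = node.tail)
            yield return node.head;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

Building 1:2:3:4:[] then becomes ConsList<int>.Empty.Cons(4).Cons(3).Cons(2).Cons(1), and because it implements IEnumerable<T> it still composes with the rest of LINQ.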
You could try .AsParallel(), but I doubt the parallelization is worth it unless your sequences are very large; it also only works on indexable data.