Why do LINQ operations lose the static type of the collection?
Regardless of the collection type I use as input, LINQ always returns IEnumerable<MyType> instead of List<MyType> or HashSet<MyType>.
From the MSDN tutorial:
int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };
// numQuery is an IEnumerable<int> <====== Why IEnumerable<int> instead of int[]?
var numQuery =
from num in numbers
where (num % 2) == 0
select num;
I wonder what's the rationale behind the decision to not preserve the collection type (like the element type), not the suggested work-around.
I know that ToArray, ToDictionary, and ToList exist; that's not the question.
Edit: How does the implementation in C# differ from Scala, where it works without overriding the return types everywhere?
LINQ standard query operators are defined on IEnumerable<T>, not on any specific collection type. They apply to most collections because those collections all implement IEnumerable<T>.
The output of these standard query operators is a new IEnumerable<T>. To preserve the type of every available collection, you would need N implementations of each standard query operator, not just one.
Also, many aspects of LINQ, such as lazy evaluation, would not work well if forced to work on, e.g., a List<T> as opposed to an IEnumerable<T>.
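To make the "N implementations" point concrete, here is a rough sketch of what type-preserving filtering would look like. WhereList and WhereSet are hypothetical names made up for illustration; nothing like them exists in the framework. Each collection type needs its own copy of the same loop, and both copies are eager, whereas the single built-in Where serves every IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers (not part of the framework): one overload per collection type.
public static class TypePreservingWhere
{
    public static List<T> WhereList<T>(this List<T> source, Func<T, bool> predicate)
    {
        var result = new List<T>();
        foreach (var item in source)
            if (predicate(item)) result.Add(item);
        return result;
    }

    public static HashSet<T> WhereSet<T>(this HashSet<T> source, Func<T, bool> predicate)
    {
        // Reuse the source's comparer so the result has the same equality semantics.
        var result = new HashSet<T>(source.Comparer);
        foreach (var item in source)
            if (predicate(item)) result.Add(item);
        return result;
    }
    // ...and the same again for arrays, Queue<T>, Stack<T>, and for every other operator.
}

public static class DuplicationDemo
{
    public static void Main()
    {
        var list = new List<int> { 0, 1, 2, 3, 4, 5, 6 };
        var set = new HashSet<int> { 0, 1, 2, 3, 4, 5, 6 };

        List<int> evenList = list.WhereList(n => n % 2 == 0);    // eager, keeps List<int>
        HashSet<int> evenSet = set.WhereSet(n => n % 2 == 0);    // eager, keeps HashSet<int>
        IEnumerable<int> evenSeq = list.Where(n => n % 2 == 0);  // lazy, one implementation for all

        Console.WriteLine(string.Join(",", evenList)); // 0,2,4,6
        Console.WriteLine(evenSet.Count);              // 4
        Console.WriteLine(string.Join(",", evenSeq));  // 0,2,4,6
    }
}
```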
Short answer: C#'s type system is too primitive to support preservation of collection type in higher order functions (without code duplication, that is).
Long answer: Refer to my post here. (Actually it's about Java, but applies to C# as well.)
In the specific case of LINQ-to-Objects, the LINQ provider is a set of extension methods defined in the System.Linq.Enumerable class. To be as general-purpose as possible, they are defined to extend any type that implements the IEnumerable<T> interface. Therefore, when these methods execute, they aren't aware of the collection's concrete type; they only know that it is an IEnumerable<T>. Because of this, the operations you perform on a collection cannot be guaranteed to produce a collection of the same type. So instead, they produce the next best thing: an IEnumerable<T> object.
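As a minimal sketch of that idea, here is a simplified stand-in for a filtering operator. MyWhere is a hypothetical name, and this is not the actual System.Linq.Enumerable source, just an illustration of how an extension method on IEnumerable<T> sees only the interface and so has nothing but IEnumerable<T> to hand back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SketchExtensions
{
    // Hypothetical, simplified stand-in for Enumerable.Where (not the framework source).
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        // Inside this method only the IEnumerable<T> contract is visible;
        // whether the caller passed an array, a List<T>, or a HashSet<T> is unknown.
        foreach (var item in source)
            if (predicate(item))
                yield return item;
    }
}

public static class WhereSketchDemo
{
    public static void Main()
    {
        int[] array = { 0, 1, 2, 3 };
        var list = new List<int> { 0, 1, 2, 3 };
        var set = new HashSet<int> { 0, 1, 2, 3 };

        // The same method body serves all three; the static result type is IEnumerable<int>.
        IEnumerable<int> fromArray = array.MyWhere(n => n % 2 == 0);
        IEnumerable<int> fromList = list.MyWhere(n => n % 2 == 0);
        IEnumerable<int> fromSet = set.MyWhere(n => n % 2 == 0);

        Console.WriteLine(string.Join(",", fromArray)); // 0,2
        Console.WriteLine(string.Join(",", fromList));  // 0,2
        Console.WriteLine(fromSet.Count());             // 2
    }
}
```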
This was a very specific design choice. LINQ is intended to work at a very high level of abstraction. It exists so that you don't have to care about underlying sources of data, but can instead just say what you want and not care about such details.
LINQ takes an IEnumerable<> and returns one. Very natural. The reason LINQ takes IEnumerable is that pretty much all collections implement it.
Many System.Linq.Enumerable methods are lazily evaluated. This allows you to create a query without executing it.
//a big array - have to start somewhere
int[] source = Enumerable.Range(0, 1000000).ToArray();
var query1 = source.Where(i => i % 2 == 1); //500000 elements
var query2 = query1.Select(i => i.ToString()); //500000 elements
var query3 = query2.Where(i => i.StartsWith("2")); //about 55000 elements
var query4 = query3.Take(5); //5 elements
string[] result = query4.ToArray();
This code calls ToString about 15 times. It allocates a single string[5] array.
If Where and Select returned arrays, this code would call ToString 500000 times and have to allocate two massive arrays. Thank goodness that is not the case.
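If you want to check the call count yourself, here is a small sketch of the same pipeline with a counter added inside the Select lambda. The counter and the CountToStringCalls helper are my own additions for illustration; everything else is the standard Enumerable operators.

```csharp
using System;
using System.Linq;

public static class LazyDemo
{
    // Hypothetical helper: runs the pipeline and reports how often ToString was called.
    public static int CountToStringCalls(out string[] result)
    {
        int[] source = Enumerable.Range(0, 1000000).ToArray();
        int toStringCalls = 0;

        var query = source
            .Where(i => i % 2 == 1)
            .Select(i => { toStringCalls++; return i.ToString(); })
            .Where(s => s.StartsWith("2"))
            .Take(5);

        // Building the query runs nothing, so the counter is still 0 here.
        if (toStringCalls != 0) throw new InvalidOperationException("query ran early");

        // ToArray pulls items one at a time and stops once Take has seen 5 matches.
        result = query.ToArray();
        return toStringCalls;
    }

    public static void Main()
    {
        int calls = CountToStringCalls(out string[] result);
        Console.WriteLine(calls);                    // 15 (the odd numbers 1 through 29)
        Console.WriteLine(string.Join(",", result)); // 21,23,25,27,29
    }
}
```

The count comes out small because each element flows through the whole chain before the next one is read from the source, and Take stops pulling as soon as it has its five results.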