What's the role of IEnumerable<T> and why should I use it?
Why should I use IEnumerable<T> when I can make do with, say, List<T>? What's the advantage of the former over the latter?
IEnumerable<T> is an interface that tells us that we can enumerate over a sequence of T instances. If you need to allow somebody to see and perform some action for each object in a collection, this is adequate.
List<T>, on the other hand, is a specific implementation of IEnumerable<T> that stores the objects in a specific, known manner. Internally, this may be a very good way to store your values that you expose via IEnumerable<T>, but a List<T> is not always appropriate. For example, if you do not need to access items by index, but constantly insert items at the beginning of your collection and then remove items from the end, a Queue<T> would be far more appropriate to use.
By using IEnumerable<T> in your API, you provide yourself the flexibility to change the internal implementation at any time without changing any other code. This has huge benefits in terms of allowing your code to be flexible and maintainable.
On this point, Jeffrey Richter writes:
When declaring a method’s parameter types, you should specify the weakest type possible, preferring interfaces over base classes. For example, if you are writing a method that manipulates a collection of items, it would be best to declare the method’s parameter by using an interface such as IEnumerable<T> rather than using a strong data type such as List<T> or even a stronger interface type such as ICollection<T> or IList<T>:
// Desired: This method uses a weak parameter type
public void ManipulateItems<T>(IEnumerable<T> collection) { ... }
// Undesired: This method uses a strong parameter type
public void ManipulateItems<T>(List<T> collection) { ... }
The reason, of course, is that someone can call the first method passing in an array object, a List<T> object, a String object, and so on — any object whose type implements IEnumerable<T>. The second method allows only List<T> objects to be passed in; it will not accept an array or a String object. Obviously, the first method is better because it is much more flexible and can be used in a much wider range of scenarios.
Naturally, if you are writing a method that requires a list (not just any enumerable object), then you should declare the parameter type as an IList<T>. You should still avoid declaring the parameter type as List<T>. Using IList<T> allows the caller to pass arrays and any other objects whose type implements IList<T>.
On the flip side, it is usually best to declare a method’s return type by using the strongest type possible (trying not to commit yourself to a specific type).
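As a rough illustration of the quoted advice, here is a minimal sketch written as a C# top-level program. CountItems and LastItem are hypothetical helpers invented for this example, not part of any library: the first only enumerates, so it asks for IEnumerable<T>; the second needs indexed access, so it asks for IList<T> rather than List<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: it only enumerates, so IEnumerable<T> is enough.
int CountItems<T>(IEnumerable<T> items)
{
    int count = 0;
    foreach (T item in items)
        count++;
    return count;
}

// Hypothetical helper: it needs an indexer, so it asks for IList<T> (not List<T>).
T LastItem<T>(IList<T> items) => items[items.Count - 1];

int[] array = { 1, 2, 3 };
var list = new List<int> { 4, 5 };

int arrayCount = CountItems(array);     // arrays implement IEnumerable<T>
int listCount = CountItems(list);
int textCount = CountItems("hello");    // string implements IEnumerable<char>
int lastOfArray = LastItem(array);      // arrays also implement IList<T>
int lastOfList = LastItem(list);

Console.WriteLine($"{arrayCount} {listCount} {textCount}"); // 3 2 5
Console.WriteLine($"{lastOfArray} {lastOfList}");           // 3 5
```

Had CountItems been declared with a List<T> parameter, the array and string calls would not compile.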
If you plan to build a public API, it's better to use IEnumerable<T> than List<T>, because it is best to expose the most minimal interface or class that does the job. List<T> lets you access objects by index, if that's required.
Here is a pretty good guideline on when to use IEnumerable, ICollection, List<T> and so on.
Different implementations of collections can be enumerable; using IEnumerable makes it clear that what you're interested in is the enumerability, and not the structure of the underlying implementation of the collection.
As mentioned by Mr. Copsey, this has the benefit of decoupling callers from the implementation. I would go further: depending on the smallest possible subset of interface functionality (i.e., using IEnumerable instead of List where possible) provides exactly that decoupling while also encouraging sound design. That is to say, you can achieve decoupling without achieving minimal dependency, but you cannot achieve minimal dependency without achieving maximal decoupling.
Using iterators, you can achieve major improvements in an algorithm, both in speed and in memory usage.
Let's consider the following two code examples. Both parse a file; the first stores the parsed lines in a collection, the second works on the enumerable directly.
The first example takes O(N) time and O(N) memory:
IEnumerable<string> lines = SelectLines();
List<Item> items = lines.Select(l => ParseToItem(l)).ToList();
var itemOfInterest = items.FirstOrDefault(IsItemOfInterest);
The second example takes O(N) time and O(1) memory. Additionally, although the asymptotic time complexity is still O(N), on average it loads only half as many items as the first example:
var itemOfInterest = lines.Select(l => ParseToItem(l)).FirstOrDefault(IsItemOfInterest);
Here is the code of SelectLines():
IEnumerable<string> SelectLines()
{
    ...
    using (var reader = ...)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}
Here is why it loads half as many items as the first example on average. Assume the element of interest is equally likely to be at any position in the file. With IEnumerable, only the lines up to the element of interest are read from the file. With ToList called on the enumerable, the entire file is read before the search even starts.
Of course, the List in the first example holds all the items in memory, which is why its memory usage is O(N).
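The difference can be made visible by counting how many lines are actually pulled from the source. The following is only a sketch under stated assumptions: SelectLines here is a hypothetical stand-in that yields from an in-memory array instead of a file, items are plain integers, and ParseToItem and IsItemOfInterest are placeholder implementations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int linesRead = 0;

// Hypothetical stand-in for SelectLines(): yields six "lines" from memory
// and counts how many the consumer actually requested.
IEnumerable<string> SelectLines()
{
    string[] source = { "1", "2", "3", "4", "5", "6" };
    foreach (string line in source)
    {
        linesRead++;
        yield return line;
    }
}

// Placeholder parsing and matching logic (assumptions for this sketch).
int ParseToItem(string line) => int.Parse(line);
bool IsItemOfInterest(int item) => item == 3;

// Eager: ToList() drains the whole sequence before the search starts.
linesRead = 0;
List<int> items = SelectLines().Select(l => ParseToItem(l)).ToList();
int eagerResult = items.FirstOrDefault(i => IsItemOfInterest(i));
int eagerReads = linesRead;

// Lazy: enumeration stops as soon as the first match is found.
linesRead = 0;
int lazyResult = SelectLines().Select(l => ParseToItem(l)).FirstOrDefault(i => IsItemOfInterest(i));
int lazyReads = linesRead;

Console.WriteLine($"eager: found {eagerResult} after reading {eagerReads} lines"); // 6 lines
Console.WriteLine($"lazy: found {lazyResult} after reading {lazyReads} lines");    // 3 lines
```

Both versions find the same item; only the eager one reads past it.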
Usually you don't use IEnumerable directly. It is the base interface for a number of other collections that you are more likely to use. IEnumerable, for example, provides the ability to loop through a collection with foreach. That is used by many implementing classes such as List<T>. But IEnumerable doesn't offer a sort method (though you can use LINQ for that), while some other generic collections like List<T> do have that method.
Oh sure, you can use it to create your own custom collection types. But for everyday stuff, it probably isn't as useful as the collections that implement it.
IEnumerable gives you a way to implement your own logic for storing and iterating over a collection of objects.
Why implement IEnumerable in your class?
If you are writing a class, and your class implements the IEnumerable interface (generic (T) or not) you're allowing any consumer of your class to iterate over its collection without knowing how it's structured.
A LinkedList is implemented differently than a Queue, a Stack, a BinaryTree, a HashTable, Graph, etc. The collection represented by your class might be structured in different ways.
As a "consumer" (if you are writing a class, and your class makes use of an object that implements IEnumerable) you can use it without concerning yourself with how it's implemented. Sometimes the consumer class doesn't care about the implementation; it just wants to go over all the items (printing them? changing them? comparing them? etc.)
(So as a consumer, if your task is to iterate over all items in a BinaryTree class, and you skipped that lesson in Data-Structures-101 - if the BinaryTree coder implemented the IEnumerable - you're in luck! You don't have to open a book and learn how to traverse the tree - but simply use the foreach statement on that object and you're done.)
As a "producer" (writing a class that holds the data structure or collection) you might not want consumers of your class to deal with how it's structured (afraid they might break it, maybe). So you can make the collection private and expose only a public enumerator, by implementing IEnumerable.
It can also allow for some uniformity - a collection might have a few ways to iterate over its items (PreOrder, InOrder, PostOrder, Breadth First, Depth First etc.) - but there's only 1 implementation for IEnumerable. You can use this to set the "default" way to iterate over the collection.
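A minimal sketch of the producer side, assuming a hypothetical NumberTree class (a simple unbalanced binary search tree invented for this example; Add and PreOrder are likewise illustrative names). The nodes stay private, foreach gets the "default" in-order traversal through IEnumerable<int>, and an alternative order is exposed as a separate sequence.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var tree = new NumberTree();
foreach (int value in new[] { 5, 3, 8, 1, 4 })
    tree.Add(value);

// The consumer never touches nodes; it just enumerates.
string inOrder = string.Join(", ", tree);
string preOrder = string.Join(", ", tree.PreOrder());
Console.WriteLine(inOrder);   // 1, 3, 4, 5, 8
Console.WriteLine(preOrder);  // 5, 3, 1, 4, 8

// Hypothetical collection class for illustration only.
sealed class NumberTree : IEnumerable<int>
{
    private sealed class Node
    {
        public int Value;
        public Node Left, Right;
        public Node(int value) { Value = value; }
    }

    private Node root;   // internal structure is hidden from consumers

    public void Add(int value) => root = Insert(root, value);

    private static Node Insert(Node node, int value)
    {
        if (node == null) return new Node(value);
        if (value < node.Value) node.Left = Insert(node.Left, value);
        else node.Right = Insert(node.Right, value);
        return node;
    }

    // The "default" traversal, used by foreach and LINQ: in-order.
    public IEnumerator<int> GetEnumerator() => InOrderFrom(root).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // An alternative traversal, offered as its own sequence.
    public IEnumerable<int> PreOrder() => PreOrderFrom(root);

    private static IEnumerable<int> InOrderFrom(Node node)
    {
        if (node == null) yield break;
        foreach (int v in InOrderFrom(node.Left)) yield return v;
        yield return node.Value;
        foreach (int v in InOrderFrom(node.Right)) yield return v;
    }

    private static IEnumerable<int> PreOrderFrom(Node node)
    {
        if (node == null) yield break;
        yield return node.Value;
        foreach (int v in PreOrderFrom(node.Left)) yield return v;
        foreach (int v in PreOrderFrom(node.Right)) yield return v;
    }
}
```

The recursive iterators keep the sketch short; a production tree would more likely use an explicit stack to avoid nested enumerators.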
Why use IEnumerable in your method?
If I write a method that takes a collection, iterates over it, and takes action on the items (aggregates them? compares them? etc.), why should I restrict myself to one type of collection?
Writing this method as public void Sum(List<int> list) {...} to sum all items in the collection means I can only receive a list and sum over it. Writing it as public void Sum(IEnumerable<int> collection) {...} means I can take any object that implements IEnumerable (Lists, Queues, Stacks, etc.) and sum all of their items as well.
Other Considerations
There are also the issues of deferred execution and unmanaged resources. A method that returns IEnumerable can be written with the yield syntax, which means you go over the items one at a time and can perform all kinds of computations before and after each one. Because this happens one by one, you don't have to hold the whole collection in memory when you start. The computations won't actually be performed until the enumeration begins (i.e. until you run the foreach loop), which can be useful and more efficient in certain cases. For example, your class might not hold any collection in memory, but rather iterate over all files in a certain directory, items in a certain DB, or other unmanaged resources. IEnumerable can step in and do it for you (you could also do it without IEnumerable, but IEnumerable fits conceptually, plus it gives you the benefit of being able to use the resulting object in a foreach loop).
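Deferred execution is easy to observe with a small experiment. This is a sketch only: Squares is a hypothetical iterator written for this example, and the log list exists purely to record the order in which things happen.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Hypothetical iterator: records when its body actually runs.
IEnumerable<int> Squares(int count)
{
    log.Add("iterator started");
    for (int i = 1; i <= count; i++)
    {
        log.Add($"computing {i}");
        yield return i * i;
    }
}

IEnumerable<int> squares = Squares(3);   // no iterator code has run yet
log.Add("sequence created");

foreach (int s in squares)
{
    log.Add($"got {s}");
    if (s >= 4) break;                   // stop early; the third square is never computed
}

string trace = string.Join(" | ", log);
Console.WriteLine(trace);
// sequence created | iterator started | computing 1 | got 1 | computing 2 | got 4
```

Calling Squares(3) only creates the sequence object; the body starts on the first step of the foreach loop and stops as soon as the loop does.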
Implementation of IEnumerable<T> is generally the preferred way for a class to indicate that it should be usable with a "foreach" loop, and that multiple "foreach" loops on the same object should operate independently. Although there are uses of IEnumerable<T> other than "foreach", the normal indication that one should implement IEnumerable is that the class is one where it would make sense to say "foreach foo in classItem {foo.do_something();}".
IEnumerable takes advantage of deferred execution, as explained here: IEnumerable vs List - What to Use? How do they work?
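IEnumerable<T> also enables implicit reference conversions between constructed types, which is known as covariance: an IEnumerable<Car> can be passed where an IEnumerable<Vehicle> is expected, whereas a List<Car> cannot be passed where a List<Vehicle> is expected. Consider the following example: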
public abstract class Vehicle { }
public class Car : Vehicle { }
private void doSomething1(IEnumerable<Vehicle> vehicles) { }
private void doSomething2(List<Vehicle> vehicles) { }
var vec = new List<Car>();
doSomething1(vec); // this is ok
doSomething2(vec); // this will give a compilation error