
Rationale of C# iterator design (compared to C++)

I found a similar topic:

Iterators in C++ (STL) vs Java, is there a conceptual difference?

That question basically deals with the Java iterator (which is similar to C#'s) being unable to go backward.

So here I would like to focus on limits: in C++ an iterator does not know its limit; you yourself have to compare the given iterator against the limit. A C# iterator knows more: you can tell, without comparing against any external reference, whether the iterator is valid or not.

I prefer the C++ way, because once you have an iterator you can use any other iterator as a limit. In other words, if you would like to get only a few elements instead of the entire collection, you don't have to alter the iterator (in C++). To me this is "purer" (clearer).

But of course MS knew C++ (and all this) while designing C#. So what are the advantages of the C# way? Which approach is more powerful (i.e. leads to more elegant functions based on iterators)? What am I missing?

If you have thoughts on C# vs. C++ iterator design other than their limits (boundaries), please answer as well.

Note (just in case): please keep the discussion strictly technical. No C++/C# flame war.

EDITS

As Tzaman put it: "There's no benefit to having the limit be expressed separately, because there's no way to get there except walking one element at a time." However, it is not that difficult to build a C# iterator which takes several steps at a time; thus the question: is there a benefit to having an explicit limit iterator (as in C++)? If yes, what is it?
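To illustrate the claim about stepping several elements at a time, here is a minimal sketch; EveryNth is an illustrative name I made up, not an existing LINQ method:

using System.Collections.Generic;

static class StepExtensions
{
    // Yields elements 0, n, 2n, ... of the source sequence,
    // advancing the underlying iterator n steps per yielded element.
    public static IEnumerable<T> EveryNth<T>(this IEnumerable<T> source, int n)
    {
        int i = 0;
        foreach (T item in source)
        {
            if (i % n == 0)
                yield return item;
            i++;
        }
    }
}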

@Jon,

Edit1: Let's say you have a function foo which does something with iterators (the example is very naive!)

void foo(iter_from,iter_end) // C++ style
void foo(iter) // C# style 

And now you would like to call a function bar on all elements except the last 10.

bar(iter_from,iter_end-10); // C++ style

In C# (if I am not mistaken) you would have to provide an extra method on this iterator to change its limit, something like this:

bar(iter.ChangeTheLimit(-10));

Edit2: After rereading your post, I sense a crucial difference. In C++ you work on iterators over a collection; in C# you work on collections (the iterator is used "internally"). If so, I still feel a bit uncomfortable with C#: you iterate over the collection, and when you find an interesting element you would like to pass all elements from "here" to the end. In C++ this is very easy and there is no overhead. In C# you pass either an iterator or a collection (and if the latter, there is extra computation). I'll wait for your comment :-)
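For what it's worth, the "from here to the end" hand-off can be sketched in C# by wrapping the live enumerator; RemainingItems is a hypothetical helper, not a framework method:

using System.Collections.Generic;

static class EnumeratorExtensions
{
    // Yields everything the enumerator has not consumed yet,
    // without copying or re-scanning the underlying collection.
    // Usage: once the interesting element is found, call bar(e.RemainingItems()).
    public static IEnumerable<T> RemainingItems<T>(this IEnumerator<T> e)
    {
        while (e.MoveNext())
            yield return e.Current;
    }
}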

@Hans,

I am not comparing apples and oranges. Computing theory is common ground here: you have sorting algorithms, partitioning, etc. You have the notion of a collection (or sequence, as Jon prefers). Now the question is how you design access to the elements so that elegant algorithms can be written in C# or C++ (or any other language). I would like to understand the reasoning: "we did this, because...".

I know .NET iterators and collections are separate classes. I know the difference between access to an element and access to the entire collection. However, in C++ the most general way to work on a collection is to work with iterators -- this way you can work with a list and a vector even though those collections are completely different. In C#, on the other hand, you would rather write

sort(IEnumerable<T> coll) 

instead of

sort(IEnumerator<T> iter)

Is that correct? So in this sense I guess you cannot treat C# iterators as C++ iterators, because C# does not express the same algorithms the same way C++ does. Or, as Jon pointed out, in C# you transform the collection (Skip, Take) instead of changing iterators.
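To make the parallel concrete, here is a small sketch of the C# style: a generic algorithm written against IEnumerable<T> works for List<T>, LinkedList<T>, arrays, or yield-based sequences alike, much as a C++ template over iterators works for any container. MaxElement is an illustrative name:

using System;
using System.Collections.Generic;

static class Algorithms
{
    // A generic algorithm over any sequence; the caller never sees
    // the iterator or its limit.
    public static T MaxElement<T>(IEnumerable<T> items, Comparison<T> cmp)
    {
        using (var e = items.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("empty sequence");
            T best = e.Current;
            while (e.MoveNext())
                if (cmp(e.Current, best) > 0)
                    best = e.Current;
            return best;
        }
    }
}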


It feels to me like the C# design is more encapsulated: a sequence either runs out or it doesn't, independent of anything else. Where does it make sense to compare one limit with another?

If you only want to take a few elements, then LINQ provides any number of ways of building one sequence from another, e.g.

foo.Take(10)
foo.Skip(10)
foo.Where(x => x.StartsWith("y"))

etc

I think it's clearer - and more composable - to transform one sequence into another than to specify this with limits. If you want to pass an iterator to another function but want to restrict it to the first few elements, why should you also have to pass a limit? Why not just pass the transformed sequence, which limits itself?
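As a sketch of that point, with Bar standing in for the bar function from the question (assume void Bar(IEnumerable<int> items) and some IEnumerable<int> numbers): the caller restricts the sequence before passing it, instead of passing a (begin, end) pair:

Bar(numbers.Take(10));                  // first 10 elements only
Bar(numbers.Skip(5).Where(x => x > 0)); // composed restrictions, still one argument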

EDIT: To address your edited question: in C# (at least with LINQ) you wouldn't modify the existing collection. You would create a new sequence from the old one. This is done lazily; it doesn't create a new copy or anything like that. For LINQ to Objects, this is performed using extension methods on IEnumerable<T>, so any sequence gains the same abilities.

Note that this isn't restricted to traditional collections - it could be a sequence of lines read from a log file (again, lazily). You don't need any knowledge of the collection itself, just that it's a sequence from which one can draw items. There's also the difference between an IEnumerable<T> and an IEnumerator<T>: the first represents the sequence and the second represents an iterator over a sequence; IEnumerator<T> is rarely used explicitly or passed around in C#.

Now, your example of "all elements except the last 10" is a tricky one, because for a general sequence you can't tell that you're 10 elements from the end until you've reached the end. There's nothing in LINQ to Objects to do this explicitly. For anything implementing ICollection or ICollection<T> you could use

Bar(list.Take(list.Count - 10))

but that's not very general. A more general solution would need to maintain a circular buffer of 10 elements, effectively reading 10 ahead of where it's yielding. I've rarely found this to be a requirement, to be honest.
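A minimal sketch of that circular-buffer approach (SkipLastN is an illustrative name; newer framework versions ship a similar Enumerable.SkipLast, but this is not that method):

using System.Collections.Generic;

static class SequenceExtensions
{
    // Buffers n elements; once the buffer overflows, the oldest element
    // is known to be at least n positions from the end and can be yielded.
    public static IEnumerable<T> SkipLastN<T>(this IEnumerable<T> source, int n)
    {
        var buffer = new Queue<T>(n + 1);
        foreach (T item in source)
        {
            buffer.Enqueue(item);
            if (buffer.Count > n)
                yield return buffer.Dequeue();
        }
    }
}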


I think as long as you're talking only about forward iterators, I like the C# way - the sequence knows its own limit, you can transform it easily via LINQ, and so on. There's no benefit to having the limit be expressed separately, because there's no way to get there except walking one element at a time.

However, C++ iterators are a lot more powerful - you can have bidirectional iterators, or random-access iterators, that let you do things that aren't possible with a one-way sequential iterator. A sequential iterator is in a sense a generalization of a linked list, while a random-access iterator is a generalization of an array. That's why you can efficiently express algorithms like quicksort in terms of C++ iterators, while that would be impossible with an IEnumerable.


You are comparing apples and oranges. The STL collection classes had completely different design goals. They are a one-size-fits-all approach to collections: you only ever get the classes that are provided. There's no way to customize a collection; the STL classes were not designed to be inherited from.

Very different from the .NET ones, whose core is the ICollection, IDictionary and IList interfaces. The .NET framework contains hundreds of collection classes, each customized to get a specific job done.

That affected the designs of their iterators as well. STL iterators must provide a lot of flexibility, since the collections they iterate cannot be customized. There is an inherent cost associated with that. There are five different STL iterator categories, and not all collection classes support all five. Loss of generality is never a good thing in a library design.

While an argument can be made that C++ iterators are more flexible, I'll make the opposite assertion. The simplicity of IEnumerator was an enabler for very valuable .NET features. The C# yield keyword, for example, completely dissociates a collection from the behavior of an iterator. LINQ would not have happened if it weren't for a simple iterator type. Covariance support would not have been possible.
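A minimal sketch of what that dissociation means in practice: a yield-based iterator needs no backing collection at all (Fibonacci here is purely illustrative):

using System.Collections.Generic;

static class Generators
{
    // The compiler turns this method into an IEnumerator state machine;
    // there is no collection behind the sequence.
    public static IEnumerable<int> Fibonacci()
    {
        int a = 0, b = 1;
        while (true)
        {
            yield return a;
            (a, b) = (b, a + b);
        }
    }
}

// usage (with System.Linq): foreach (int f in Generators.Fibonacci().Take(10)) { ... }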

Regarding your Edit2, no, .NET iterators are separate classes, just like the C++ ones. Make sure to understand the difference between IEnumerable (implemented by the collection) and IEnumerator (implemented by the iterator).
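To make that separation explicit, here is a bare-bones sketch (the IntRange class is illustrative): the collection implements IEnumerable<T>, while a distinct object implements IEnumerator<T>, much as a C++ iterator is a type distinct from its container.

using System.Collections;
using System.Collections.Generic;

class IntRange : IEnumerable<int>
{
    private readonly int _start, _count;
    public IntRange(int start, int count) { _start = start; _count = count; }

    public IEnumerator<int> GetEnumerator() => new IntRangeEnumerator(_start, _count);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // The iterator is its own class, separate from the collection.
    private class IntRangeEnumerator : IEnumerator<int>
    {
        private readonly int _start, _count;
        private int _index = -1;
        public IntRangeEnumerator(int start, int count) { _start = start; _count = count; }
        public int Current => _start + _index;
        object IEnumerator.Current => Current;
        public bool MoveNext() => ++_index < _count;
        public void Reset() => _index = -1;
        public void Dispose() { }
    }
}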


So what are the advantages of the C# way?

Encapsulation, for one. It makes it more difficult to mess up your iteration limits, because you never pair a begin with an end manually.

C++ example:

std::vector<int> a;
std::vector<int> b;
std::vector<int>::iterator ba = a.begin();
std::vector<int>::iterator ea = a.end();
std::vector<int>::iterator bb = b.begin();
std::vector<int>::iterator eb = b.end();
// lots of code here
for (auto itr = ba; itr != eb; itr++) // iterator tries to "cross" vectors:
                                      // you get undefined behavior, depending
                                      // on what you do with itr

Which approach is more powerful (which leads to more elegant functions based on iterators)?

That depends on what you mean by "more powerful".

The C++ approach is more flexible.

The C# one is more difficult to misuse.
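The C# counterpart of the snippet above, just to show the contrast: the bounds are owned by the sequence itself, so the "crossed vectors" mistake cannot even be written down:

using System.Collections.Generic;

var a = new List<int>();
var b = new List<int>();
// lots of code here
foreach (int x in a) // no separate end iterator exists,
{                    // so there is nothing to pair with b by mistake
    // ...
}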


I found this topic because of the following problem. There is a sequence that needs to be iterated forward, one element at a time. However, at certain points a large number of elements needs to be skipped. When such skips occur, it is inefficient to do anything but jump and restart at the new index (i.e. it is not OK to run the loop over the skipped indices, even with an empty loop body).

Here's a C# snippet to give you the idea, except that it uses an int index rather than an enumerator.

for (int i = 0; i < 1000; ++i)
{
    ... // do something

    if (i == 100)
        i = 200; // skip from 100 to 200
}

The point is that going to the next element can be optimized for this particular sequence, but skipping elements (and full random access) is more costly. @Jon, using IList is therefore inefficient for this purpose, and yield doesn't directly allow this style either.

I tried various ways to do this in C#, including using a lambda-based ForEach() style for the inner block which returns the next index. However, this doesn't work very well.

The approach that will likely result in the least overhead is a simple iterator type - not based on IEnumerable or yield - that allows me to mimic the above code.
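Something along these lines, where ISkippableIterator<T> and all member names are hypothetical, not any .NET interface:

// A sketch of a simple iterator type supporting cheap sequential moves
// plus an explicit jump; all names here are made up for illustration.
interface ISkippableIterator<T>
{
    bool MoveNext();        // cheap: advance one element
    void JumpTo(int index); // pay the seek cost once, resume there
    T Current { get; }
    int Index { get; }
}

// usage mirroring the indexed loop above:
//
// while (it.MoveNext())
// {
//     ... // do something
//     if (it.Index == 100)
//         it.JumpTo(200); // skip from 100 to 200 without visiting 101..199
// }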

Thoughts?
