Higher order iterators in Ruby?
I was reading a Ruby q开发者_如何学Cuestion about the .each
iterator, and someone stated that using .each
can be a code smell if higher order iterators are better suited for the task. What are higher order iterators in Ruby?
edit: Jörg W Mittag, the author of the StackOverflow answer that I was referring to mentioned that he meant to write higher level iterators, but he also explained what they are very well below.
Oops. I meant higher-level iterators, not higher-order. Every iterator is of course by definition higher-order.
Basically, iteration is a very low-level concept. The purpose of programming is to communicate intent to the other stakeholders on the team. "Initializing an empty array, then iterating over another array and adding the current element of this array to the first array if it is divisible by two without a remainder" is not communicating intent. "Selecting all even numbers" is.
In general, you almost never iterate over a collection just for iteration's sake. You either want to
- transform each element in some way (that's usually called
map
, in Ruby and Smalltalk it'scollect
and in .NET and SQL it'sSelect
), - reduce the whole collection down to some single value, e.g. computing the sum or the average or the standard deviation of a list of football scores (in category theory, that's called a catamorphism, in functional programming it is
fold
orreduce
, in Smalltalk it'sinject:into:
, in Ruby it'sinject
and in .NETAggregate
), - filter out all elements that satisfy a certain condition (
filter
in most functional languages,select
in Smalltalk and Ruby, alsofind_all
in Ruby,Where
in .NET and SQL), - filter out all elements that do not satisfy a condition (
reject
in Smalltalk and Ruby) - find the first element that satisfies a condition (
find
in Ruby) - count the elements thats satisfy a condition (
count
in Ruby) - check if all elements (
all?
), at least one element (any?
) or no elements (none?
) satisfy a condition - group the elements into buckets based on some discriminator (
group_by
in Ruby, .NET and SQL) - partition the collection into two collections based on some predicate (
partition
) - sort the collection (
sort
,sort_by
) - combine multiple collections into one (
zip
) - and so on and so forth …
Almost never is your goal to just iterate over a collection.
In particular, reduce
aka. inject
aka. fold
aka. inject:into:
aka. Aggregate
aka. catamorphism is your friend. There's a reason why it has such a fancy-sounding mathematical name: it is extremely powerful. In fact, most of what I mentioned above, can be implemented in terms of reduce
.
Basically, what reduce
does, is it "reduces" the entire collection down to a single value, using some function. You have some sort of accumulator value, and then you take the accumulator value and the first element and feed it into the function. The result of that function then becomes the new accumulator, which you pair up with the second element and feed to the function and so on.
The most obvious example of this is summing a list of numbers:
[4, 8, 15, 16, 23, 42].reduce(0) {|acc, elem|
acc + elem
}
So, the accumulator starts out as 0
, and we pass the first element 4
into the +
function. The result is 4
, which becomes the new accumulator. Now we pass the next element 8
in and the result is 12
. And this continues till the last element and the result is that they were dead the whole time. No, wait, the result is 108
.
Ruby actually allows us to take a couple of shortcuts: If the element type is the same as the accumulator type, you can leave out the accumulator and Ruby will simply pass the first element as the first value for the accumulator:
[4, 8, 15, 16, 23, 42].reduce {|acc, elem|
acc + elem
}
Also, we can use Symbol#to_proc
here:
[4, 8, 15, 16, 23, 42].reduce(&:+)
And actually, if you pass reduce
a Symbol
argument it will treat as the name of the function to use for the reduction operation:
[4, 8, 15, 16, 23, 42].reduce(:+)
However, summing is not all that reduce
can do. In fact, I find this example a little dangerous. Everybody I showed this to, immediately understood, "Aah, so that's what a reduce
is", but unortunately some also thought that summing numbers is all reduce
is, and that's definitely not the case. In fact, reduce
is a general method of iteration, by which I mean that reduce
can do anything that each
can do. In particular, you can store arbitrary state in the accumulator.
For example, I wrote above that reduce
reduces the collection down to a single value. But of course that "single value" can be arbitrarily complex. It could, for example, be itself a collection. Or a string:
class Array
def mystery_method(foo)
drop(1).reduce("#{first}") {|s, el| s << foo.to_str << el.to_s }
end
end
This is an example how far you can go with playing tricks with the accumulator. If you try it out, you'll of course recognize it as Array#join
:
class Array
def join(sep=$,)
drop(1).reduce("#{first}") {|s, el| s << sep.to_str << el.to_s }
end
end
Note that nowhere in this "loop" do I have to keep track of whether I'm at the last or second-to-last element. Nor is there any conditional in the code. There is no potential for fencepost errors here. If you think about how to implement this with each
, you would have to somehow keep track of the index and check whether you are at the last element and then have an if
in there, to prevent emitting the separator at the end.
Since I wrote above that all iteration can be done with reduce
, I might just as well prove it. Here's Ruby's Enumerable
methods, implemented in terms of reduce
instead of each
as they normally are. (Note that I only just started and have only arrived at g yet.)
module Enumerable
def all?
reduce(true) {|res, el| res && yield(el) }
end
def any?
reduce(false) {|res, el| res || yield(el) }
end
alias_method :map, def collect
reduce([]) {|res, el| res << yield(el) }
end
def count
reduce(0) {|res, el| res + 1 if yield el }
end
alias_method :find, def detect
reduce(nil) {|res, el| if yield el then el end unless res }
end
def drop(n=1)
reduce([]) {|res, el| res.tap {|res| res << el unless n -= 1 >= 0 }}
end
def drop_while
reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
end
def each
reduce(nil) {|_, el| yield el }
end
def each_with_index
tap { reduce(-1) {|i, el| (i+1).tap {|i| yield el, i }}}
end
alias_method :select, def find_all
reduce([]) {|res, el| res.tap {|res| res << el if yield el }}
end
def grep(pattern)
reduce([]) {|res, el| res.tap {|res| res << yield(el) if pattern === el }}
end
def group_by
reduce(Hash.new {|hsh, key| hsh[key] = [] }) {|res, el| res.tap {|res|
res[yield el] = el
}}
end
def include?(obj)
reduce(false) {|res, el| break true if res || el == obj }
end
def reject
reduce([]) {|res, el| res.tap {|res| res << el unless yield el }}
end
end
[Note: I made some simplifications for the purpose of this post. For example, according to the standard Ruby Enumerable protocol, each
is supposed to return self
, so you'd have to slap an extra line in there; other methods behave slightly differently, depending on what kind and how many arguments you pass in and so on. I left those out because they distract from the point I am trying to make.]
They're talking about more specialized methods such as map
, filter
or inject
. For example, instead of this:
even_numbers = []
numbers.each {|num| even_numbers << num if num.even?}
You should do this:
even_numbers = numbers.select {|num| num.even?}
It says what you want to do but encapsulates all the irrelevant technical details in the select
method. (And incidentally, in Ruby 1.8.7 or later, you can just write even_numbers = numbers.select(&:even?)
, so even more concise if slightly Perl-like.)
These aren't normally called "higher-order iterators," but whoever wrote that probably just had a minor mental mixup. It's a good principle whatever terminology you use.
From the usual definition of "higer-order" I would say a higher-order iterator is an iterator which takes an iterator as an argument or returns an iterator. So something like enum_for
maybe. However I don't think this is what the person meant.
I think the person meant iterators like map
or select
which are higher-order functions, but did not realize that each
is of course also a higher-order function. So basically this is just a case of terminological confusion.
The point of the poster presumably was that you should not use each
in cases where map
, select
or inject
could naturally be used instead. And to make that point he used a term, that didn't really make sense in that context.
I get this question a fair bit, so I blogged about the most commonly used iterators: select and reject. In the post there are examples of where 'each' is used incorrectly and how to correct the code to use either 'select' or 'reject'. Anyway, I hope it helps.
http://www.natontesting.com/2011/01/01/rubys-each-select-and-reject-methods/
I just wrote a blog that is very relavent to this question - The reason you want to use higher order functions is that doing so elevates the programmer to a higher level of abstraction, to a point that a problem can be expressed declaratively, pushing the implementation down either to the Ruby standard library or lower level code.
http://www.railstutors.com/blog/declarative-thinking-with-higher-order-functions-and-blocks#.UG5x6fl26jJ
精彩评论