How does Array#sort work when a block is passed?
I am having trouble understanding exactly how array.sort { |x, y| block } works, and therefore how to use it.
An example from Ruby documentation:
a = [ "d", "a", "e", "c", "b" ]
a.sort #=> ["a", "b", "c", "d", "e"]
a.sort { |x,y| y <=> x } #=> ["e", "d", "c", "b", "a"]
In your example
a.sort
is equivalent to
a.sort { |x, y| x <=> y }
As you know, to sort an array, you need to be able to compare its elements (if you doubt that, just try to implement any sort algorithm without using any comparison: no <, >, <= or >=).
The block you provide is really a function which will be called by the sort algorithm to compare two items. That is, x and y will always be some elements of the input array chosen by the sort algorithm during its execution.
The sort algorithm will assume that this comparison function/block meets the requirements for method <=>:
- return -1 if x < y
- return 0 if x == y
- return 1 if x > y
Failure to provide an adequate comparison function/block will result in an array whose order is undefined.
You should now understand why
a.sort { |x, y| x <=> y }
and
a.sort { |x, y| y <=> x }
return the same array in opposite orders.
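As a quick check of the point above, the following sketch sorts the array from the question both ways and confirms that one result is the reverse of the other. The variable names ascending and descending are illustrative only.

```ruby
a = ["d", "a", "e", "c", "b"]

ascending  = a.sort { |x, y| x <=> y }   # same as a.sort
descending = a.sort { |x, y| y <=> x }   # operands swapped

p ascending                          # ["a", "b", "c", "d", "e"]
p descending                         # ["e", "d", "c", "b", "a"]
p(descending == ascending.reverse)   # true
```

The reversal is exact here because all elements are distinct; with equal elements the two results may differ in how ties are arranged.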
To elaborate on what Tate Johnson added, if you implement the comparison function <=> on any of your classes, you gain the following:
- You may include the module Comparable in your class, which will automatically define the following methods for you: between?, ==, >=, <, <= and >.
- Instances of your class can now be sorted using the default (i.e. without a block) invocation of sort.
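A minimal sketch of both benefits, assuming a made-up Version class (the class, its attributes and the sample values are hypothetical and only serve as illustration):

```ruby
# Hypothetical class used only for illustration.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # Compare by major number first, then by minor number.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end

  def to_s
    "#{major}.#{minor}"
  end
end

versions = [Version.new(1, 10), Version.new(1, 2), Version.new(0, 9)]

# sort without a block now works, because Version defines <=>.
puts versions.sort.map(&:to_s).join(", ")   # 0.9, 1.2, 1.10

# Methods provided by Comparable on top of <=>.
puts(Version.new(1, 2) < Version.new(1, 10))                              # true
puts(Version.new(1, 2).between?(Version.new(0, 9), Version.new(1, 10)))   # true
```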
Note that the <=> method is already provided wherever it makes sense in Ruby's standard library (Bignum, Array, File::Stat, Fixnum, String, Time, etc.). In Ruby 2.4 and later, Fixnum and Bignum are unified into Integer.
When you have an array of, let's say, integers to sort, it's pretty straightforward for the sort method to order the elements properly: smaller numbers first, bigger at the end. That's when you use ordinary sort, with no block.
But when you are sorting other objects, you may need to provide a way to compare any two of them. Let's say you have an array of objects of class Person. You probably can't tell if object bob is greater than object mike (i.e. class Person doesn't have method <=> implemented). In that case you'd need to give the sort method some code explaining in which order you want these objects sorted. That's where the block form kicks in.
people.sort{|p1,p2| p1.age <=> p2.age}
people.sort{|p1,p2| p1.children.count <=> p2.children.count}
etc. In all these cases, the sort method sorts them the same way: the same algorithm is used. What differs is the comparison logic.
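To make the two calls above runnable, here is a small sketch. The Person struct, the names and the ages are invented for illustration; only the two sort calls come from the answer.

```ruby
# Hypothetical Person type; names, ages and children are made-up data.
Person = Struct.new(:name, :age, :children)

people = [
  Person.new("bob",   42, ["ann", "joe"]),
  Person.new("mike",  35, []),
  Person.new("alice", 51, ["sam"])
]

# Same sort method, two different comparison rules.
by_age      = people.sort { |p1, p2| p1.age <=> p2.age }
by_children = people.sort { |p1, p2| p1.children.count <=> p2.children.count }

p by_age.map(&:name)        # ["mike", "bob", "alice"]
p by_children.map(&:name)   # ["mike", "alice", "bob"]
```

Calling people.sort without a block would raise an ArgumentError here, since this Person does not define a usable <=>.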
@OscarRyz's reply cleared up a lot for me about how the sort works, especially
{ |x, y| y <=> x }
Based on my understanding, here is what the state of the array would be after each comparison made by the block above.
Note: I got the idea of printing the values of the block parameters e1, e2 from ruby-forum
1.9.3dev :001 > a = %w(d e a w f k)
1.9.3dev :003 > a.sort { |e1, e2| p [e2, e1]; e2 <=> e1 }
["w", "d"]
["k", "w"]
["k", "d"]
["k", "e"]
["k", "f"]
["k", "a"]
["f", "a"]
["d", "f"]
["d", "a"]
["d", "e"]
["e", "f"]
=> ["w", "k", "f", "e", "d", "a"]
A guessed array state at runtime after each comparison:
[e2, e1]     Comparison Result   Array State
["w", "d"]    1                  ["w", "e", "a", "d", "f", "k"]
["k", "w"]   -1                  ["w", "e", "a", "d", "f", "k"]
["k", "d"]    1                  ["w", "e", "a", "k", "f", "d"]
["k", "e"]    1                  ["w", "k", "a", "e", "f", "d"]
["k", "f"]    1                  ["w", "k", "a", "e", "f", "d"]
["k", "a"]    1                  ["w", "k", "a", "e", "f", "d"]
["f", "a"]    1                  ["w", "k", "f", "e", "a", "d"]
["d", "f"]   -1                  ["w", "k", "f", "e", "a", "d"]
["d", "a"]    1                  ["w", "k", "f", "e", "d", "a"]
["d", "e"]   -1                  ["w", "k", "f", "e", "d", "a"]
["e", "f"]   -1                  ["w", "k", "f", "e", "d", "a"] (Result)
Thanks,
Jignesh
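The same experiment can be done without printing inside the block by collecting the pairs in an array. This is only a sketch: which pairs are compared, and how many, is an implementation detail of the Ruby version in use, so only the final result is shown as expected output.

```ruby
a = %w(d e a w f k)

comparisons = []   # every [e2, e1] pair the block is asked to compare
result = a.sort do |e1, e2|
  comparisons << [e2, e1]
  e2 <=> e1
end

p result             # ["w", "k", "f", "e", "d", "a"]
p comparisons.size   # number of comparisons; depends on the Ruby implementation
comparisons.each { |pair| p pair }
```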
<=> is a method in Ruby that returns, for self.<=>(argument):
- -1 if self < argument
- 0 if self == argument
- 1 if self > argument
x and y are items of the array. If no block is provided, the sort function uses x <=> y; otherwise the result of the block says whether x should come before y.
array.sort{|x, y| some_very_complicated_method(x, y) }
Here, if some_very_complicated_method(x, y) returns something that is < 0, x is considered less than y, and so on.
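A small sketch of that idea, with a hypothetical helper compare_by_length standing in for the complicated method. It returns arbitrary negative or positive integers rather than exactly -1 or 1, which sort accepts; a block that returns nil, on the other hand, raises an ArgumentError.

```ruby
# Hypothetical comparison helper: negative when x is shorter than y,
# zero when the lengths match, positive when x is longer.
def compare_by_length(x, y)
  x.length - y.length
end

words = %w(banana fig pear)
p words.sort { |x, y| compare_by_length(x, y) }   # ["fig", "pear", "banana"]

# A block that returns nil cannot be used as a comparison.
begin
  words.sort { |x, y| nil }
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```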
Some miscellaneous points:
- x and y are called block parameters. The sort method basically says "I'll give you x and y, you determine whether x or y should come first, and I'll look after the boring stuff with regards to sorting".
- <=> is called the spaceship operator.
In:
a.sort {|x,y| y <=> x } #=> ["e", "d", "c", "b", "a"]
what is x and y?
x and y are the elements being compared by the sorting algorithm.
This is useful for custom classes, to define which element should come before the other.
For basic data (numbers, strings, dates, etc.) the natural order is predefined, but for a custom element (e.g. Employee) you define who goes before whom in a comparison. This block gives you the chance to define that.
and what happens at y<=>x?
There, the elements are compared in descending order (those with a "higher" value go first) rather than in the natural order (x <=> y).
The <=> method works like "compareTo": it returns 0 if the elements are equivalent, a value < 0 if x goes before y, or a value > 0 if x goes after y.
I believe |x,y| y<=>x is comparing two elements at a time in descending order, as seen in: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-3C-3D-3E Say with [ "d", "a", "e", "c", "b" ], "d" and "a" appear to be compared first. Since the order is descending, both remain in the same order: the block evaluates "a" <=> "d", which is -1, so "d" is treated as coming before "a". Then "d" and "e" are evaluated, and "e" is moved to "d"'s position. Without knowing the internal workings of the C code it is not possible to know where "d" is moved to, but I figure this process continues until all elements are sorted. Note that the linked documentation and the C function below belong to Array#<=>, which compares two whole arrays with each other; they are not the sort implementation itself. The C function:
VALUE
rb_ary_cmp(VALUE ary1, VALUE ary2)
{
    long len;
    VALUE v;

    ary2 = rb_check_array_type(ary2);
    if (NIL_P(ary2)) return Qnil;
    if (ary1 == ary2) return INT2FIX(0);
    v = rb_exec_recursive_paired(recursive_cmp, ary1, ary2, ary2);
    if (v != Qundef) return v;
    len = RARRAY_LEN(ary1) - RARRAY_LEN(ary2);
    if (len == 0) return INT2FIX(0);
    if (len > 0) return INT2FIX(1);
    return INT2FIX(-1);
}
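Seen from Ruby, the behaviour of the C function above can be sketched as follows: the first pair of elements that differs decides the result, and if no pair differs the array lengths decide, which corresponds to the len lines. The sample arrays are arbitrary.

```ruby
p([1, 2, 3] <=> [1, 2, 4])        # -1  (3 < 4 decides)
p([1, 2]    <=> [1, 2, 3])        # -1  (equal so far, left array is shorter)
p([1, 2, 3] <=> [1, 2])           # 1   (equal so far, left array is longer)
p([1, 2]    <=> [1, 2])           # 0   (same elements, same length)
p([1, 2]    <=> "not an array")   # nil (argument is not an array)
```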