开发者

How does Ruby's sort_by {rand} work?

I think this is a great Ruby one-liner:

someArray.sort_by {rand}

It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:

  1. rand evaluates to a number between 0 and 1 (like 0.783468632804653)
  2. rand is being repeatedly evaluated in the code above, because assigning it to x first breaks the random sort
  3. sort_by {0.783468632804653}, or any other number I tried, has no effect on the array

ruby-doc.org wasn't much help to me in this case.

Can someone explain this step-by-step?

Update

I've been using Ruby longer now, 开发者_运维百科and I see that I was missing a concept or two here. The key thing is that:

  1. rand is a method (defined on Kernel); it generates a random number
  2. {rand} is a block, which sort_by keeps, calling it each time it wants to compare two items in the collection. If the collection is a bunch of objects representing countries, it needs to be able to grab two of them and determine which one comes first. Do you put the one with the longest name first? The one with the largest land mass? The block should answer that question by returning a value that says "you asked about Spain vs Cameroon, and I say Cameroon comes first." (You could do that with {|country| country.name.length}

The rest of how sort_by works is explained in the documentation. I'm still not quite sure why returning a random number works at all - presumably sort_by rounds it to -1, 0, or 1, whichever is closest? But in any case, getting a different random number every time you call the block is quite different from getting the same number every time. When sort_by says "which of these two countries comes first?", {rand} puts on a blindfold, turns around 10 times, points and says "that one!" :)


In Ruby 1.8/1.9 both sort and sort_by are implemented in C, this is a rough equivalent of how this works:

Say you start with [1,2,3,4] and call sort_by{rand}:

  1. (I invented some random numbers):

    An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]

    In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}

  2. Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

    [[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]
    

    In rough Ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}

  3. Pointers are copied from the new sorted array, to the correct position in the original array.

    [1,3,2,4]
    

    In rough Ruby this step is: ary.map{|x,y| y}

This technique is sometimes referred to as a "Schwartzian Transform". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

Note: array.shuffle! will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates:

static VALUE
rb_ary_shuffle_bang(VALUE ary)
{
    long i = RARRAY_LEN(ary);

    rb_ary_modify(ary);
    while (i) {
  long j = rb_genrand_real()*i;
  VALUE tmp = RARRAY_PTR(ary)[--i];
  RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];
  RARRAY_PTR(ary)[j] = tmp;
    }
    return ary;
}


The block rand yields a key that is used in sorting. It's different each time it's evaluated, so you get a random order.

When you put a single number in there, it's the same each time, so the order doesn't change. That means the sorting algorithm is 'stable' - it doesn't move in-order entries.

And here's some even shorter, even clearer code:

someArray.shuffle


sort_by is a refinement of sort, which is used like so:

people.sort do |person1, person2|
  person1 <=> person2
end

The sort function yields to the block when it needs to know the order of two things, in this case, people. The block returns -1 if the left thing is less than the right thing, 0 if they are equal, and 1 if the right thing is larger than the left thing. The spaceship operator <=> has the wonderful property that it returns -1, 0 or +1, exactly what sort needs.

I haven't looked, but odds are good that Ruby is using the quicksort algorithm.

Some smart person noticed that we kept doing the same thing on the left side of the spaceship operator as we do on the right side, and came up with sort_by, used like so:

people.sort_by do |person|
  person.name
end

Instead of the sort algorithm giving two objects to the block and letting the block compare them, the algorithm gives a single object to the block. The block then returns whatever attribute or value should be used to do the sort. Ruby remembers the value the block returned for each element, and comparing those values, knows what order to put things in. It's neat that you don't have to repeat yourself anymore.

Your shuffle code works by just "making stuff up" when the sort algorithm yields to the block. Instead of returning something sensible, the block yields a random value. This causes the sort algorithm to sort things, well, randomly.


What sort_by does can be split up into two simple steps:

  1. It calls the map/collect method on the provided array and with the provided block. In your case the result of it would be just an array of random numbers - let's call this intermediate array A1. Note, that it has the length of the initial array.

  2. A1 gets sorted normally, but what is returned is not the sorted A1, but rather the original array, where the items are moved around the same way as the corresponding ones from A1, while it's being sorted!

That's how the following example works:

["Paulo", "Sergito", "Nick"].sort_by {|word| word.length} 

It sorts the words by their length, because first the array of words is mapped into an array of lengths, and those lengths are then sorted, while the words in the original array are moved around correspondingly.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜