
Perl: Can someone explain this code? It involves map, sort, tr and references. (Modified Schwartzian Transform)

I've read tutorials and perldoc on map, tr and references, but this code is a little too advanced for a beginner Perl user like myself.

print map $_->[1], 
sort {
$a->[0] cmp $b->[0] ##first element of the array
or $a->[1] cmp $b->[1] } 
map [ tr/"MATCH"/"MATCH"/, $_ ], @allmatches; 

So what I particularly need to know is: what is $_ referring to (is it undefined?)

What does the last line, the one with map, do?

I don't really understand the $a and $b concepts yet. What are they referring to? The first and next element of @allmatches?

Also, what do all the commas (after map) do? And if this is like a Schwartzian transform, good, because I don't understand that yet despite my reading.

Here is my idea:

It maps an undefined scalar as a reference to an array (which one?), simultaneously accessing the second element, [1]. It sorts my @allmatches array first by the number of occurrences of "MATCH" and then alphabetically. The second map, which makes a reference, is hard for me to follow (map does a lot in one step); the tr returns the number of times. The second "MATCH" is useless, but why?

Bonus: What could I replace tr/// with to sort by something more specific, for example if something like tr/MATCH #\d+// were possible?


Reading it right-to-left (i.e., in the order it gets executed)...

map [ tr/"MATCH"/"MATCH"/, $_ ], @allmatches;

For each element e of @allmatches, this creates a reference to a two-element array whose first element is a number and whose second element is e. The result of the map is a list of these references.

tr/"MATCH"/"MATCH"/ is counting the number of times the letters M, A, T, C, or H occur in e. (It is technically replacing M by M, A by A, T by T, etc., and counting how many such replacements it made.)

Actually, it is also counting quote characters, since tr/// is going to process those the same as anything else. This appears to be a bug.
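A quick way to see what is being counted is to try the variants side by side. This is a sketch with a made-up sample string: the quoted tr/// from the question, tr/MATCH// (an empty replacement list tells tr/// to count without changing anything), and a global regex match, which counts whole occurrences of the word MATCH instead of individual letters. Whether whole-word counting was the author's intent is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sample element of @allmatches.
my $e = q{"MATCH" at CAT};

# Original form: counts M, A, T, C, H *and* the double-quote characters.
my $with_quotes = ($e =~ tr/"MATCH"/"MATCH"/);

# Letters only: an empty replacement list means "count, don't change".
# tr/// is case-sensitive, so the lowercase "a" and "t" are not counted.
my $letters = ($e =~ tr/MATCH//);

# Whole-word occurrences: a global match in list context, assigned to an
# empty list and then to a scalar, yields the number of matches.
my $words = () = $e =~ /MATCH/g;

print "with quotes: $with_quotes\n";   # with quotes: 10
print "letters:     $letters\n";       # letters:     8
print "words:       $words\n";         # words:       1
```

The regex form accepts any pattern, so something like `() = /MATCH #\d+/g` could serve as a sort key for the bonus question; treat that as a suggestion rather than something from the original code.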

Anyway, let's say each of these references refers to an array [n,e], where n is the weird count and e is the original element of @allmatches.

The "sort" then sorts the array of references, primarily by n (interpreted as a string, not a number; this appears to be another bug) and secondarily by the string e.
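To illustrate why comparing the counts as strings is a problem, here is a sketch with hypothetical counts: `cmp` compares character by character, so "10" sorts before "9", while `<=>` compares numerically.

```perl
use strict;
use warnings;

# Hypothetical counts as they might come out of tr///.
my @counts = (10, 9, 2);

# cmp compares as strings: "10" sorts before "2" and "9".
my @as_strings = sort { $a cmp $b } @counts;

# <=> compares as numbers: 2 < 9 < 10.
my @as_numbers = sort { $a <=> $b } @counts;

print "cmp: @as_strings\n";   # cmp: 10 2 9
print "<=>: @as_numbers\n";   # <=>: 2 9 10
```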

Finally, the outermost "map" extracts the second element (e) from each of the two-element arrays after the sorting is done. So the final result is just to do a bizarre (and I believe, buggy) sort on the elements of @allmatches.

[Edit: As cjm points out in a comment, this map sort map idiom is called a Schwartzian transform.]
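The same three stages can be written out long-hand with named intermediate arrays, which makes the decorate / sort / undecorate structure of the Schwartzian transform explicit. This sketch assumes made-up input, counts with tr/MATCH// instead of the quoted form, and compares the counts numerically, so it fixes the two suspected bugs rather than reproducing them. The intermediate variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input.
my @allmatches = ('HAT', 'dog', 'MATCH', 'CAT');

# 1. Decorate: pair each element with its sort key as [key, element].
my @decorated = map { [ tr/MATCH//, $_ ] } @allmatches;

# 2. Sort the pairs: by key first (numerically), then by the element itself.
my @sorted = sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } @decorated;

# 3. Undecorate: keep only the original element from each pair.
my @result = map { $_->[1] } @sorted;

print "@result\n";   # dog CAT HAT MATCH
```

The key is computed once per element in step 1 instead of once per comparison, which is the point of the transform when the key is expensive to compute.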


Don’t read right to left; format it better (the original was atrocious), and then read bottom to top:

print map  { $_->[1] }
      sort {
              $b->[0] <=> $a->[0]
                      ||
              $a->[1] cmp $b->[1]
           }
      map  { [ tr/MATCH// => $_ ] }
      @allmatches;

Or using more flexible hashes instead:

print map  { $_->{DATA} }
      sort {
              $b->{COUNT} <=> $a->{COUNT}
                          ||
              $a->{DATA}  cmp $b->{DATA}
           }
      map  {
             +{
                COUNT  => tr/MATCH//,
                DATA   => $_,
              }
      } @allmatches;

Which is of course the same as this:

print map  {         $$_{DATA}      }
      sort {
              $$b{COUNT} <=> $$a{COUNT}
                          ||
              $$a{DATA}  cmp $$b{DATA}
           }
      map  {
             +{
                  COUNT  => tr/MATCH//,
                  DATA   => $_,
              }
      } @allmatches;

See how very much better that is? Plus, when you read it bottom-to-top, it corresponds to a shell-style data flow that’s perfectly straightforward:

  map @allmatches | sort | map | print

Which is a lot easier to understand than

  print(map(sort(map @allmatches)))

and is the reason why everybody prefers the shell's dataflow model.
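The last two versions differ only in dereference syntax: `$_->{DATA}` and `$$_{DATA}` both fetch the DATA entry of the hash that `$_` refers to, and `->[1]` versus `$$ref[1]` works the same way for arrays. A small check with hypothetical records:

```perl
use strict;
use warnings;

# Hypothetical hash record, like the ones built by the inner map.
my $rec = { COUNT => 3, DATA => 'CAT' };

# Arrow form and the older $$ref{key} form read the same entry.
my $data_arrow = $rec->{DATA};
my $data_sigil = $$rec{DATA};

# The same holds for array references: ->[1] versus $$ref[1].
my $pair       = [ 3, 'CAT' ];
my $word_arrow = $pair->[1];
my $word_sigil = $$pair[1];

print "$data_arrow $data_sigil $word_arrow $word_sigil\n";   # CAT CAT CAT CAT
```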


Ouch, and likewise yuck...

print map $_->[1], 
            sort {
          $a->[0] cmp $b->[0] ##first element of the array
          or $a->[1] cmp $b->[1] } 
      map [ tr/"MATCH"/"MATCH"/, $_ ], @allmatches;

The sort part is relatively straightforward.

sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } ...an array...

Each element of the array is itself an array ref, and the comparison does a string compare (cmp) of the first elements of the array refs, and if those are equal (cmp returns 0), the second elements.
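A small sketch of how the `or` chain breaks ties: `cmp` (and `<=>`) return -1, 0, or 1, and since 0 is false, `or` only evaluates the second comparison when the first one reports equality. The pairs are hypothetical, and the first key is compared numerically here rather than with `cmp`.

```perl
use strict;
use warnings;

# cmp returns -1, 0 or 1.
my @results = ('a' cmp 'b', 'b' cmp 'b', 'c' cmp 'b');

# Hypothetical [count, word] pairs; two of them share the count 3.
my @pairs = ([3, 'HAT'], [0, 'dog'], [3, 'CAT']);

my @sorted = sort {
    $a->[0] <=> $b->[0]     # primary key: the count
        or                  # reached only when the counts are equal (0)
    $a->[1] cmp $b->[1]     # secondary key: the word itself
} @pairs;

my $shown = join ' ', map { $_->[0] . ':' . $_->[1] } @sorted;
print "@results\n";   # -1 0 1
print "$shown\n";     # 0:dog 3:CAT 3:HAT
```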

The output is a sorted array, therefore. That leaves two chunks of code to dissect. The first line and the last line. The last line runs map:

map [ tr/"MATCH"/"MATCH"/, $_ ], @allmatches

This is apparently doing a no-op transform since the left and right strings in the tr/// operator are the same; that's kinda puzzling. [Update: the tr/// counts the number of times each of the letters MATCH appears in the string; within the 'block' or 'expr' of map, $_ is a special variable - the value being mapped.] But it takes each element of @allmatches and maps it, and the output of that is passed to the sort. The square brackets form an array ref, so the output is an array of array refs; each array reference contains a count of the number of letters from MATCH in the word, followed by the word.
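Inside map, `$_` is an alias to the current element rather than a copy, which is why the choice of tr/// matters: a tr/// that really changes characters would rewrite @allmatches itself. The sketch below uses hypothetical data to show that a count-only tr/MATCH// leaves the array alone, while tr/a-z/A-Z/ modifies it in place.

```perl
use strict;
use warnings;

my @allmatches = ('cat', 'HAT');

# Count-only: the empty replacement list means nothing is changed.
my @counts = map { tr/MATCH// } @allmatches;
print "@allmatches\n";   # cat HAT

# A real transliteration through the alias modifies the array itself;
# the return value is still the number of characters translated.
my @changed = map { tr/a-z/A-Z/ } @allmatches;
print "@allmatches\n";   # CAT HAT
```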

The first line is then:

print map $_->[1], ...output from sort...;

This extracts the name $_->[1] from the sorted output.

  • Overall, the effect is to list the words in @allmatches in an order such that the ones with the fewest (possibly zero) letters from MATCH appear first, in alphabetic order, followed by the ones with the next fewest letters from MATCH (in alphabetic order again), and so on.

It is a tour de force in compression. If someone provided it to me to review, they'd be walking back to the drawing board. (Update: Since this is a known idiom (a Schwartzian Transform), the only reasons for sending it back are 'not laid out carefully enough' and 'not annotated as Schwartzian Transform'.)

# Schwartzian Transform: sort by number of letters from MATCH and alphabetically
print map  { $_->[1] } 
      sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } 
      map  { [ tr/MATCH//, $_ ] }
      @allmatches;

(This correctly uses a numeric comparison for the first term.)

You mention being confused about $a and $b. They are basically magic variables - the parameters to the comparison function in the sort. The comparison must return a negative value if $a compares less than $b, or positive if $a compares greater than $b, or zero if they compare equal. They ($a and $b) are the names used when two names are required; $_ is used with map (and grep and other list transform functions) where there's just one name needed.
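To make $a and $b less magical, the comparison can be moved into a named sub: sort sets the package variables $a and $b to the two elements being compared each time it calls the sub. The sub name `by_count_then_word` and the data are hypothetical. Note the use of `||` rather than `or` after `return`, because `return X or Y` returns X before `or` is ever evaluated.

```perl
use strict;
use warnings;

# sort sets the package variables $a and $b before each call,
# so the sub reads them instead of taking arguments.
sub by_count_then_word {
    return $a->[0] <=> $b->[0]      # negative, zero or positive
        || $a->[1] cmp $b->[1];     # tie-break on the word
}

my @pairs  = ([5, 'MATCH'], [3, 'HAT'], [3, 'CAT']);
my @sorted = sort by_count_then_word @pairs;
my @words  = map { $_->[1] } @sorted;

print "@words\n";   # CAT HAT MATCH
```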

