Ruby regex question wrt the sub method on String

I'm running through the Koans tutorial (which is a great way to learn) and I've encountered this statement:

assert_equal __, "one two-three".sub(/(t\w*)/) { $1[0, 1] }

In this statement the __ is where I'm supposed to put my expected result to make the test execute correctly. I have stared at this for a while and have pulled most of it apart, but I cannot figure out what the last bit means:

{ $1[0, 1] }

The expected answer is:

"one t-three"

and I was expecting:

"t-t"


{ $1[0, 1] } is a block containing the expression $1[0,1]. $1[0,1] evaluates to the first character of the string $1, which contains the contents of the first capturing group of the last matched regex.

When sub is invoked with a regex and a block, it will find the first match of the regex, invoke the block, and then replace the matched substring with the result of the block.

So "one two-three".sub(/(t\w*)/) { $1[0, 1] } searches for the pattern t\w*. This finds the substring "two". Since the whole thing is in a capturing group, this substring is stored in $1. Now the block is called and returns "two"[0,1], which is "t". So "two" is replaced by "t" and you get "one t-three".
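To watch those steps happen, here is a small sketch that records what the block sees. The helper name first_letter_sub is made up for illustration and is not part of the Koans or of Ruby itself:

```ruby
# Hypothetical helper: runs the same sub call and records each value of $1
# that the block sees, so we can check how often the block is invoked.
def first_letter_sub(text)
  seen = []
  result = text.sub(/(t\w*)/) do
    seen << $1   # $1 holds the text captured by (t\w*)
    $1[0, 1]     # start at index 0, take 1 character
  end
  [result, seen]
end

result, seen = first_letter_sub("one two-three")
puts result   # prints: one t-three
p seen        # prints: ["two"]
```

The block runs exactly once, with $1 set to "two"; "three" is never looked at.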

An important thing to note is that sub, unlike gsub, only replaces the first occurrence, not every occurrence of the pattern.
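A quick side-by-side comparison of the two methods, as a sketch (first_only and every_match are hypothetical names used only for this example):

```ruby
# Hypothetical helpers that wrap the same regex and block
# in sub and gsub respectively.
def first_only(text)
  text.sub(/(t\w*)/) { $1[0, 1] }    # replaces the first match only
end

def every_match(text)
  text.gsub(/(t\w*)/) { $1[0, 1] }   # replaces every match
end

puts first_only("one two-three")    # prints: one t-three
puts every_match("one two-three")   # prints: one t-t
```

Note that even gsub leaves "one " alone, because that part of the string never matches the pattern.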


@sepp2k already gave a really good answer; I just wanted to add how you could have used IRB to maybe get there yourself:

>> "one two-three".sub(/(t\w*)/) { $1 } #=> "one two-three"
>> "one two-three".sub(/(t\w*)/) { $1[0] } #=> "one t-three"
>> "one two-three".sub(/(t\w*)/) { $1[1] } #=> "one w-three"
>> "one two-three".sub(/(t\w*)/) { $1[2] } #=> "one o-three"
>> "one two-three".sub(/(t\w*)/) { $1[3] } #=> "one -three"
>> "one two-three".sub(/(t\w*)/) { $1[0,3] } #=> "one two-three"
>> "one two-three".sub(/(t\w*)/) { $1[0,2] } #=> "one tw-three"
>> "one two-three".sub(/(t\w*)/) { $1[0,1] } #=> "one t-three"


Cribbing from the documentation (http://ruby-doc.org/core/classes/String.html#M001185), here are answers to your two questions "why is the return value 'one t-three'" and "what does { $1[0, 1] } mean?"

What does { $1[0, 1] } mean? The method String#sub can take either two arguments, or one argument and a block. The latter is the form being used here, and it works just like the method Integer#times, which also takes a block:

5.times { puts "hello!" }

So that explains the enclosing curly braces.
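For comparison, here are the two calling forms of String#sub next to each other, as a rough sketch. The method names are invented for the example, and the capture group is dropped because the block form also receives the matched text as its block parameter:

```ruby
# Two-argument form: pattern plus a replacement string.
def sub_with_string(text)
  text.sub(/t\w*/, "t")
end

# Block form: pattern plus a block; the block receives the matched text
# and its return value becomes the replacement.
def sub_with_block(text)
  text.sub(/t\w*/) { |match| match[0, 1] }
end

puts sub_with_string("one two-three")   # prints: one t-three
puts sub_with_block("one two-three")    # prints: one t-three
```

The block form is the one to reach for when the replacement has to be computed from whatever was matched.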

$1 is the substring matching the first capture group of the regex, as described here. [0, 1] is a call to the string method "[]" with two arguments, a start index and a length. It returns the substring that begins at index 0 and is 1 character long - here, the first character.

Put together, { $1[0, 1] } is a block which returns the first character in $1, where $1 is the substring to have been matched by a capture group when a regex was last used to match a string.
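You can try both pieces on their own, outside of sub. This is only a sketch, and first_capture and slice_demo are hypothetical helper names:

```ruby
# Returns what $1 holds after matching the Koan's regex against text,
# or nil when there is no match.
def first_capture(text)
  text =~ /(t\w*)/ ? $1 : nil
end

# A few String#[] calls in the [start, length] form.
def slice_demo(word)
  [word[0, 1], word[0, 2], word[1, 2]]
end

p first_capture("one two-three")   # prints: "two"
p slice_demo("two")                # prints: ["t", "tw", "wo"]
```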

Why is the return value 'one t-three'? The method String#sub ('substitute'), unlike its brother String#gsub ('globally substitute'), replaces only the first portion of the string matching the regex with its replacement. Hence the method is going to replace the first substring matching (t\w*) with the value of the block described above - i.e. with its first character. Since 'two' is the first substring matching (t\w*) (a 't' followed by any number of word characters, meaning letters, digits, or underscores), it is replaced by its first character, 't'. The match stops at the hyphen because '-' is not a word character.
