Backslash + captured group within Ruby regular expression
How do I escape a backslash before a captured group?
Example:
"foo+bar".gsub(/(\+)/, '\\\1')
What I expect (and want):
foo\+bar
What I unfortunately get instead (irb shows it as "foo\\1bar"):
foo\1bar
How do I escape here correctly?
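For reference, here is a minimal reproduction of the replacement from the question. It shows that the doubled backslash in irb's output is only how String#inspect displays a single literal backslash.

```ruby
# The replacement from the question.
result = "foo+bar".gsub(/(\+)/, '\\\1')

# The single-quoted literal '\\\1' is the 3-character string \\1.
# gsub reads \\ as one literal backslash, so the "1" is never
# treated as a group reference.
puts result   # prints: foo\1bar
p result      # prints: "foo\\1bar"  (inspect doubles the backslash)
```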
As others have said, you need to escape everything in that string twice. So in your case the solution is to use '\\\\\1' or '\\\\\\1'. But since you asked why, I'll try to explain that part.
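A quick check that the two single-quoted forms really are the same string, and that either one gives the desired output:

```ruby
# Both literals produce the same 4-character string: \\\1
five = '\\\\\1'   # two \\ pairs each collapse to one backslash; \1 is kept as-is
six  = '\\\\\\1'  # three \\ pairs, each collapsing to one backslash
puts five == six  # prints: true
puts five.length  # prints: 4

# gsub then turns \\ into a literal backslash and \1 into capture group 1.
puts "foo+bar".gsub(/(\+)/, five)  # prints: foo\+bar
```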
The reason is that the replacement sequence is parsed twice: once by Ruby's string-literal parser and once by the underlying regular expression engine (more precisely, gsub's replacement-string processing), for which \1 is its own escape sequence. (It's probably easier to understand with double-quoted strings, since single quotes introduce an ambiguity: '\\1' and '\1' produce the same string, but '\\' is a single backslash while '\' on its own does not even terminate the literal.)
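The single-quote ambiguity can be seen directly. In single quotes only \\ and \' are escape sequences; any other backslash is kept as-is:

```ruby
a = '\\1'  # \\ collapses to one backslash, giving \1
b = '\1'   # \1 is not an escape in single quotes, so it stays \1
puts a == b      # prints: true
puts a.length    # prints: 2

# A lone backslash has to be written as '\\'; it has length 1.
puts '\\'.length # prints: 1
```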
So, for example, a simple replacement with a captured group and a double-quoted string would be:
"foo+bar".gsub(/(\+)/, "\\1") #=> "foo+bar"
This passes the string \1 to the regexp engine, which understands it as a reference to a capture group. In a Ruby double-quoted string literal, "\1" means something else entirely (an octal escape for the character with code 1).
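The difference between "\\1" and "\1" in double quotes, checked directly:

```ruby
# "\\1" is a 2-character string (backslash, 1) that gsub reads as group 1.
puts "\\1".length                                          # prints: 2
puts "foo+bar".gsub(/(\+)/, "\\1")                         # prints: foo+bar

# "\1" is an octal escape: the single character with code 1.
puts "\1" == "\u0001"                                      # prints: true
puts "foo+bar".gsub(/(\+)/, "\1") == "foo\u0001bar"        # prints: true
```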
What we actually want in this case is for the regexp engine to receive \\\1. It also treats \ as an escape character, so \\1 is not sufficient: the \\ becomes a literal backslash and the 1 is output as-is, giving \1. So we need \\\1 in the regexp engine, but to get there we also have to make it past Ruby's string-literal parser.
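To separate the two layers, one can build the engine-level string without any literal escaping at all. This sketch uses 92.chr for the backslash character (an illustrative choice, not something from the answer):

```ruby
# Build the string the replacement parser must receive:
# three backslashes followed by "1".
backslash   = 92.chr             # the backslash character
replacement = backslash * 3 + "1"
puts replacement                 # prints: \\\1
puts replacement.length          # prints: 4

# Only two backslashes: \\ becomes a literal backslash, then a plain "1".
puts "foo+bar".gsub(/(\+)/, backslash * 2 + "1")  # prints: foo\1bar
# Three backslashes: \\ gives the backslash, \1 gives the captured "+".
puts "foo+bar".gsub(/(\+)/, replacement)          # prints: foo\+bar
```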
To do that, we take our desired regexp input and double every backslash again so that it survives Ruby's string-literal parser: \\\1 therefore requires "\\\\\\1". With single quotes, one backslash can be omitted, because \1 is not a valid escape sequence in single quotes and is kept literally.
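The "double every backslash" step is exactly what String#inspect does when it renders a double-quoted literal, so it can serve as a sanity check on the source text to type:

```ruby
# The string the replacement parser needs: \\\1 (three backslashes, then 1).
wanted = "\\" * 3 + "1"

# String#inspect renders a double-quoted literal, doubling each backslash.
puts wanted.inspect                     # prints: "\\\\\\1"

# Typing that literal (or the single-quoted form) gives back the same string.
puts "\\\\\\1" == wanted                # prints: true
puts '\\\\\1' == wanted                 # prints: true
puts "foo+bar".gsub(/(\+)/, "\\\\\\1")  # prints: foo\+bar
```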
Addendum
One of the reasons this problem usually stays hidden is the use of /.../ regexp literals, which Ruby treats specially so that you don't need to double-escape everything. (Of course, this doesn't apply to gsub replacement strings.) But you can still see it in action if you pass a string literal instead of a regexp literal to Regexp.new:
Regexp.new("\.").match("a") #=> #<MatchData "a">
Regexp.new("\\.").match("a") #=> nil
As you can see, we had to double-escape the . for the regexp engine to treat it as a literal dot, since "." and "\." both evaluate to . in double-quoted strings, but we need the engine itself to receive \. (backslash, dot).
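The same comparison as a runnable snippet, with a regexp literal and Regexp.escape added for contrast (both are standard Ruby, but their use here is an addition to the answer):

```ruby
# "\." is not a recognized escape in a double-quoted string, so it is just ".".
puts "\." == "."                   # prints: true
p Regexp.new("\.").match("a")      # prints: #<MatchData "a">

# "\\." is the 2-character string \. which the engine reads as a literal dot.
p Regexp.new("\\.").match("a")     # prints: nil

# A regexp literal needs only one backslash; Regexp.escape builds the string form.
p(/\./.match("a"))                 # prints: nil
puts Regexp.escape(".") == "\\."   # prints: true
```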
This happens because of double escaping. You need five backslashes in this case:
"foo+bar".gsub(/([+])/, '\\\\\1')
Adding two more backslashes escapes this properly.
irb(main):011:0> puts "foo+bar".gsub(/(\+)/, '\\\\\1')
foo\+bar
=> nil
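A different route, not used in either answer, is the block form of gsub. The block's return value is inserted verbatim, without \1 or \\ processing, so only Ruby's own string-literal escaping applies. A sketch, assuming block form is acceptable for your use case:

```ruby
# With a block, gsub inserts the block's return value verbatim (no \1
# or \\ processing), so only one level of escaping is needed.
# $1 holds the text of capture group 1 for the current match.
result = "foo+bar".gsub(/(\+)/) { "\\" + $1 }
puts result   # prints: foo\+bar
```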