Why does this regular expression not match adjacent occurrences of newline?
I was trying to write a regexp to replace all occurrences of \n with \r\n unless the \n is already immediately preceded by a \r. I'm doing this in Ruby 1.8.6, which doesn't support lookbehind in regexps, so I tried:
# try to replace \n preceded by anything other than \r with \r\n
str.gsub(/([^\r])\n/, "\\1\r\n") # \\1 is the captured character to be kept
Given a string One\n\nTwo\r\nThree the intention was for \n\n to be replaced with \r\n\r\n and the existing \r\n between Two and Three to be left unmodified. However, only the first of the two \n characters matches, i.e. the result is:
"One\r\n\nTwo\r\nThree"
I tried this in a couple of other regexp engines with the same result.
In the end I was able to solve this by using Oniguruma (which does support positive and negative lookbehind) instead of Ruby's built-in regexps, but I am still interested in why my alternative approach didn't work as I expected.
Thanks for any answers.
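Just writing to explain (rsp's comment says the same thing) why your original regex didn't work. The regex engine first matches ([^\r])\n at e\n, which the replacement turns into the text marked with ^ characters:
One\r\n\nTwo\r\nThree
  ^^^^^
After the first replacement, the regex engine is at the ^:
One\r\n\nTwo\r\nThree
       ^
It's now trying to match ([^\r])\n from the caret position onward, but the first \n, the only non-\r character immediately preceding the second \n, has already been consumed by the previous match, and matches cannot overlap. The only \n left after that, between Two and Three, is preceded by \r, so there are no further matches.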
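To see the non-overlapping behaviour described above, here is a small sketch that lists where the pattern from the question actually matches. The helper names match_spans and naive_crlf are made up for illustration; only the regex and the sample string come from the question.

```ruby
# Returns the [start, end) character offsets of every match of +pattern+.
# String#scan walks the string left to right and never revisits
# characters that an earlier match has already consumed.
def match_spans(str, pattern)
  spans = []
  str.scan(pattern) { spans << [$~.begin(0), $~.end(0)] }
  spans
end

# The substitution from the question, unchanged.
def naive_crlf(str)
  str.gsub(/([^\r])\n/, "\\1\r\n")
end

sample = "One\n\nTwo\r\nThree"

# Only one match: "e\n" at offsets 2...4. The search resumes at offset 4
# (the second \n), where no "[^\r] followed by \n" can start.
p match_spans(sample, /([^\r])\n/)  # prints [[2, 4]]
p naive_crlf(sample)                # prints "One\r\n\nTwo\r\nThree"
```

The single span ends exactly where the second \n begins, which is why that \n has nothing left in front of it to pair with.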
You could simply replace \r?\n with \r\n:
s = s.gsub(/\r?\n/, "\r\n")
That way, all \r\n's and \n's are replaced by \r\n.
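As a quick check of this approach, the sketch below wraps the substitution in a helper and compares it with a lookbehind version. The names to_crlf and to_crlf_lookbehind are hypothetical, and the lookbehind variant assumes Ruby 1.9 or later, so it would not run on the 1.8.6 interpreter from the question.

```ruby
# Consumes an optional \r together with the \n, so existing \r\n pairs
# are rewritten to themselves and bare \n gains a \r.
def to_crlf(text)
  text.gsub(/\r?\n/, "\r\n")
end

# Assumption: Ruby 1.9+ (lookbehind support). The lookbehind checks the
# previous character without consuming it, so adjacent newlines all match.
def to_crlf_lookbehind(text)
  text.gsub(/(?<!\r)\n/, "\r\n")
end

sample = "One\n\nTwo\r\nThree"

p to_crlf(sample)                                # prints "One\r\n\r\nTwo\r\nThree"
p to_crlf(sample) == to_crlf_lookbehind(sample)  # prints true
p to_crlf(to_crlf(sample)) == to_crlf(sample)    # prints true
```

Running the helper twice gives the same string as running it once, so it should be safe to apply to input that is already partly or fully converted.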