
Why does this regular expression not match adjacent occurrences of newline?

I was trying to write a regexp to replace all occurrences of \n with \r\n unless the \n is already immediately preceded by a \r. I'm doing this in Ruby 1.8.6, which doesn't support lookbehind in regexps, so I tried:

# try to replace \n preceded by anything other than \r with \r\n
str.gsub(/([^\r])\n/, "\\1\r\n") # \\1 is the captured character to be kept

Given a string One\n\nTwo\r\nThree, the intention was for \n\n to be replaced with \r\n\r\n and the existing \r\n between Two and Three to be left unmodified. However, only the first of the two \n matches, i.e. the result is:

 "One\r\n\nTwo\r\nThree"

I tried this in a couple of other regexp engines with the same result.

In the end I was able to solve this by using Oniguruma (which does support positive and negative lookbehind) instead of Ruby's built-in regexps, but I am still interested in why my alternative approach didn't work as I expected.

Thanks for any answers.


Just writing to explain (rsp's comment says the same thing) why your original regex didn't work. The regex engine first matches ([^\r])\n at the ^ characters:

One\n\nTwo\r\nThree
  ^^^

After the first replacement, the regex engine is at the ^:

One\r\n\nTwo\r\nThree
       ^

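It's now trying to match ([^\r])\n starting at the caret, but the first \n, which would have to serve as the [^\r] character in front of the second \n, has already been consumed by the previous match, and matches never overlap. From the caret onwards the second \n is followed by T rather than by another \n, so nothing matches there. The \r\n between Two and Three is rejected too, as intended, so the rest of the string is left untouched.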
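If you want to see the consumption for yourself, here is a small sketch that records where each match starts and what text it swallowed. The variable names (str, pattern, matches) are mine, not from the question, and it only uses String#scan, Regexp.last_match and String#gsub.

```ruby
str = "One\n\nTwo\r\nThree"
pattern = /([^\r])\n/

# Record the start offset and the full text of every match.
matches = []
str.scan(pattern) do
  m = Regexp.last_match
  matches << [m.begin(0), m[0]]
end

# Only one match: "e\n" starting at index 2. The first \n is part of it,
# so it is no longer available as the [^\r] in front of the second \n.
p matches                        # [[2, "e\n"]]

result = str.gsub(pattern, "\\1\r\n")
p result                         # "One\r\n\nTwo\r\nThree"
```

The single entry in matches is the whole story: the scan resumes after the consumed \n, and nothing from that point on fits the pattern.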


You could simply replace \r?\n with \r\n:

s = s.gsub(/\r?\n/, "\r\n")

That way, every \r\n and every bare \n is replaced by \r\n, so existing \r\n sequences come out unchanged.
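A quick check against the string from the question, as a sketch rather than anything definitive. The variable names are my own. The lookbehind line is only there for comparison and assumes Ruby 1.9 or later, where negative lookbehind is available in the built-in regexps; it will not run on 1.8.6.

```ruby
s = "One\n\nTwo\r\nThree"

# The optional \r is consumed together with its \n, so no match ever
# depends on a character that an earlier match has already used up.
fixed = s.gsub(/\r?\n/, "\r\n")
p fixed                          # "One\r\n\r\nTwo\r\nThree"

# Running it again changes nothing, since every \r\n maps to itself.
p fixed.gsub(/\r?\n/, "\r\n") == fixed   # true

# For comparison only (assumes Ruby 1.9+): a negative lookbehind checks
# the preceding character without consuming it.
with_lookbehind = s.gsub(/(?<!\r)\n/, "\r\n")
p with_lookbehind == fixed       # true
```

Both versions give the same output here. The \r?\n form has the advantage of working on engines without lookbehind.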
