How do I fix this multiline regular expression in Ruby?
I have a regular expression in Ruby that isn't working properly in multiline mode.
I'm trying to convert Markdown text into the Textile-esque markup used in Redmine. The problem is in my regular expression for converting code blocks. It should find any lines leading with 4 spaces or a tab, then wrap them in pre tags.
markdownText = '# header
some text that precedes code
    var foo = 9;
    var fn = function() {}
    fn();
some post text'
puts markdownText.gsub!(/(^(?:\s{4}|\t).*?$)+/m,"<pre>\n\\1\n</pre>")
Intended result:
# header
some text that precedes code
<pre>
var foo = 9;
var fn = function() {}
fn();
</pre>
some post text
The problem is that the closing pre tag is printed at the end of the document instead of after "fn();". I tried some variations of the following expression but it doesn't match:
gsub!(/(^(?:\s{4}|\t).*?$)+^(\S)/m, "<pre>\n\\1\n</pre>\\2")
How do I get the regular expression to match just the indented code block? You can test this regular expression on Rubular here.
First, note that 'm' multi-line mode in Ruby is equivalent to 's' single-line mode of other languages. In other words, 'm' mode in Ruby means "dot matches all".
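A small sketch of that difference, using a made-up two-line string (the variable names are only for illustration). It also shows that ^ and $ already anchor at line boundaries in Ruby without any flag:

```ruby
text = "first\nsecond"

# Without /m, the dot stops at the newline.
without_m = text[/first.*/]

# With /m, the dot also matches "\n", so the match runs on to the end.
with_m = text[/first.*/m]

# ^ and $ match at every line boundary in Ruby, with or without /m.
lines = text.scan(/^\w+$/)

p without_m  # "first"
p with_m     # "first\nsecond"
p lines      # ["first", "second"]
```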
This regex will do a pretty good job of matching a markdown-like code section:
re = / # Match a MARKDOWN CODE section.
(\r?\n) # $1: Newline ending the line before the CODE
( # $2: CODE contents
(?: # Group for multiple lines of code.
(?:\r?\n)+ # Each line preceded by one or more newlines,
(?:[ ]{4}|\t).* # and begins with four spaces or tab.
)+ # One or more CODE lines
\r?\n # Newline ending the last CODE line.
) # End $2: CODE contents
(?=\r?\n) # CODE followed by blank line.
/x
result = subject.gsub(re, '\1<pre>\2</pre>')
This requires a blank line before and after the code section and allows blank lines within the code section itself. It allows for either \r\n or \n line terminations. Note that this does not strip the leading 4 spaces (or tab) before each line. Doing that will require more code complexity. (I am not a Ruby guy so can't help out with that.)
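For the stripping step, one possible approach is the block form of gsub, which lets each matched code section be post-processed before it is wrapped. This is only a sketch under assumptions: it uses a simpler pattern than the one above (no blank-line requirement, and blank lines end the block), and the names CODE_BLOCK and wrap_code_blocks are hypothetical.

```ruby
# Hypothetical pattern: one or more consecutive lines that start
# with four spaces or a tab. Not the regex from the answer above.
CODE_BLOCK = /(?:^(?: {4}|\t).*\n?)+/

# Hypothetical helper: wraps each indented block in pre tags and
# removes one level of indentation from every line inside it.
def wrap_code_blocks(text)
  text.gsub(CODE_BLOCK) do |block|
    # Remember whether the block ended with a newline so it can be restored.
    trailing_newline = block.end_with?("\n") ? "\n" : ""
    code = block.chomp.gsub(/^(?: {4}|\t)/, "")
    "<pre>\n#{code}\n</pre>#{trailing_newline}"
  end
end

sample = "# header\nsome text that precedes code\n" \
         "    var foo = 9;\n    var fn = function() {}\n    fn();\n" \
         "some post text"

puts wrap_code_blocks(sample)
```

With the sample from the question, this puts the closing pre tag directly after fn(); and leaves the surrounding text untouched.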
I would recommend looking at the Markdown source itself to see how it's really being done.
/^(\s{4}|\t)+.+\;\n$/m
works a little better, but still picks up a newline that we don't want. Here it is on Rubular.
This is working for me with your sample input.
markdownText.gsub(/\n?((\s{4}.+)+)/, "\n<pre>\\1\n</pre>")
Here's another one that captures all the indented lines in a single block:
((?:^(?: {4}|\t)[^\n]*$\n?)+)
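As a quick check, here is that pattern applied to the sample from the question. This is a sketch that assumes the code block is followed by a newline, so the captured text already ends with one; the indentation is kept as-is.

```ruby
# The pattern above, written as a Ruby regex literal.
re = /((?:^(?: {4}|\t)[^\n]*$\n?)+)/

markdown_text = "# header\nsome text that precedes code\n" \
                "    var foo = 9;\n    var fn = function() {}\n    fn();\n" \
                "some post text"

# \\1 is the whole indented block, including its trailing newline.
converted = markdown_text.gsub(re, "<pre>\n\\1</pre>\n")

puts converted
```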