
Code formatting - approach to mitigate unreadably large regex?

I'm using a long regular expression which is pretty hard to grok if you didn't write it in the previous 5 minutes:

/([^\s]+)\s*[^\[]+\[([^\]]+)\]\s*"([^\s]+)\s*([^\s]+)\s*([^"]+)"\s*([^\s]+)\s*([^\s]+)\s*"([^"]+)"\s*"([^"]+)"/

Is there a commonly adopted way of formatting long regular expressions in code that makes for better readability?

I thought of putting each capture group on its own line, e.g.

          /([^\s]+)
\s*[^\[]+\[([^\]]+)
     \]\s*"([^\s]+)
        \s*([^\s]+)
         \s*([^"]+)
       "\s*([^\s]+)
        \s*([^\s]+)
        \s*"([^"]+)
       "\s*"([^"]+)"/

This would be excellent if I could put a comment on each line of the regex, but Ruby won't let me.

I'm more interested in the general question of what to do with big regexes than in better ways to parse text... this particular case was just part of an exercise I set myself while learning a bit of Ruby.


Just use the x flag (which means ignore whitespace).

Then you can also add comments. For example:

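          /([^\s]+) # 1+ non-whitespace chars, group 1
\s*[^\[]+\[([^\]]+) # optional whitespace, text up to an open bracket, then the bracketed text, group 2
     \]\s*"([^\s]+) # a closing bracket, optional whitespace, an opening double quote, then 1+ non-whitespace, group 3
        \s*([^\s]+) # optional whitespace, then 1+ non-whitespace, group 4
         \s*([^"]+) # optional whitespace, then everything up to the next double quote, group 5
       "\s*([^\s]+) # a closing double quote, optional whitespace, then 1+ non-whitespace, group 6
        \s*([^\s]+) # optional whitespace, then 1+ non-whitespace, group 7
        \s*"([^"]+) # optional whitespace, an opening double quote, then the quoted text, group 8
       "\s*"([^"]+)"/x =~ 'ss[s] "ss" " " dd dd "sdf" "  df"sdfasdf'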

print Regexp.last_match  #=> ss[s] "ss" " " dd dd "sdf" "  df"

See: http://codepad.org/PDSxQUQf
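Going a step further, here is a minimal sketch of the same /x idea applied to a realistic input, with named groups, (?<name>...), so the code that reads the match documents itself too. It assumes the pattern is meant for Apache "combined" access-log lines, which its shape suggests but the question never says. The group names and the sample line are hypothetical.

```ruby
# Assumption: input lines follow the Apache "combined" log layout.
# The group names are illustrative, not part of the original question.
LOG_LINE = /
  (?<host>[^\s]+)            # client address
  \s*[^\[]+                  # identity and user fields, skipped
  \[(?<time>[^\]]+)\]        # timestamp inside square brackets
  \s*"(?<method>[^\s]+)      # opening quote, then the HTTP method
  \s*(?<path>[^\s]+)         # requested path
  \s*(?<protocol>[^"]+)"     # protocol, up to the closing quote
  \s*(?<status>[^\s]+)       # status code
  \s*(?<bytes>[^\s]+)        # response size
  \s*"(?<referer>[^"]+)"     # quoted referer
  \s*"(?<agent>[^"]+)"       # quoted user agent
/x

# Hypothetical sample line in the assumed format.
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

if (m = LOG_LINE.match(line))
  puts m[:path]    # → /apache_pb.gif
  puts m[:status]  # → 200
end
```

Named groups cost a few characters per group, but they remove the need to count parentheses when reading m[3] or m[7] later.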


You could use this:

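class Regexp
  # Concatenate two regexes by joining their source text.
  # Note: the result is built without options, so flags such as /i or /x on either operand are dropped.
  def +(re)
    Regexp.new self.source + re.source
  end
end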

This enables the '+' operator to concatenate regexes, so you can write:

           /([^\s]+)/ + # Comment
/\s*[^\[]+\[([^\]]+)/ + # Comment
     /\]\s*"([^\s]+)/ + # Comment
        /\s*([^\s]+)/ + # Comment
         /\s*([^"]+)/ + # Comment
       /"\s*([^\s]+)/ + # Comment
        /\s*([^\s]+)/ + # Comment
        /\s*"([^"]+)/ + # Comment
      /"\s*"([^"]+)"/   # Comment
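If you'd rather not reopen a core class, one alternative (not part of the answer above) is to keep each piece in a local variable and interpolate the pieces into a single literal. Interpolating a Regexp into a regex literal embeds it as (?-mix:...), so each piece keeps its own flags and the capture groups are still numbered left to right. A minimal sketch; the piece names are hypothetical, and the sample string is the one from the earlier answer:

```ruby
# Hypothetical piece names; each piece is an ordinary regex literal.
first_field = /([^\s]+)/                               # 1+ non-whitespace chars
bracketed   = /\s*[^\[]+\[([^\]]+)\]/                  # skip ahead to a bracketed field
request     = /\s*"([^\s]+)\s*([^\s]+)\s*([^"]+)"/     # three parts inside quotes
two_fields  = /\s*([^\s]+)\s*([^\s]+)/                 # two whitespace-separated fields
quoted      = /\s*"([^"]+)"/                           # one double-quoted field

# Interpolating a Regexp embeds it as a non-capturing (?-mix:...) group,
# so the capture groups are still numbered 1..9 from left to right.
combined = /#{first_field}#{bracketed}#{request}#{two_fields}#{quoted}#{quoted}/

# The original one-line pattern from the question, for comparison.
original = /([^\s]+)\s*[^\[]+\[([^\]]+)\]\s*"([^\s]+)\s*([^\s]+)\s*([^"]+)"\s*([^\s]+)\s*([^\s]+)\s*"([^"]+)"\s*"([^"]+)"/

sample = 'ss[s] "ss" " " dd dd "sdf" "  df"sdfasdf'
p combined.match(sample).captures == original.match(sample).captures  # → true
```

The trade-off versus the '+' operator: nothing global changes, and a piece compiled with its own flag (say /i) keeps that flag inside the combined pattern.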


As others posted, I generally set the flag to ignore pattern whitespace. In addition to allowing multi-line regexes and comments, it lets you separate your regex by logical grouping or function.

Example:

/([^\s]+)
\s*  [^\[]+   \[ ( [^\]]+) \]
\s*           "  ( [^\s]+)    \s*  ([^\s]+)  \s* ([^"]+) "
\s*              ( [^\s]+)    \s*  ([^\s]+)
\s*           "  ( [^"]+ ) "
\s*           "  ( [^"]+ ) "/x

Structure can make all the difference in the world, sometimes even more than comments. When writing for readability, IMHO the layout of your expression should reflect its purpose as much as the expression itself does. Otherwise, it'll be a pain to read no matter what your comments say.

It can also help to do this for expressions you've inherited, because the structure really jumps out at you. (For example, I had no idea you were pairing quotes or brackets until I laid it out as above.)
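When re-laying-out an inherited expression, one cheap safeguard is to keep the old one-liner around and check that both versions agree on some sample inputs. A minimal sketch; the sample strings are made up, and agreement on a handful of inputs is evidence of equivalence, not proof:

```ruby
# The compact pattern from the question.
compact = /([^\s]+)\s*[^\[]+\[([^\]]+)\]\s*"([^\s]+)\s*([^\s]+)\s*([^"]+)"\s*([^\s]+)\s*([^\s]+)\s*"([^"]+)"\s*"([^"]+)"/

# The same pattern laid out by logical grouping; /x makes the layout whitespace insignificant.
laid_out = /([^\s]+)
\s*  [^\[]+   \[ ( [^\]]+) \]
\s*           "  ( [^\s]+)    \s*  ([^\s]+)  \s* ([^"]+) "
\s*              ( [^\s]+)    \s*  ([^\s]+)
\s*           "  ( [^"]+ ) "
\s*           "  ( [^"]+ ) "/x

# Made-up inputs: one that matches and one that does not.
samples = [
  'ss[s] "ss" " " dd dd "sdf" "  df"sdfasdf',
  'no brackets or quotes here'
]

samples.each do |s|
  # MatchData#to_a holds the full match plus every capture; nil means no match.
  same = compact.match(s)&.to_a == laid_out.match(s)&.to_a
  puts "#{same ? 'same' : 'DIFFERENT'}: #{s}"
end
```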
