开发者

How bad is my regex?

Ok so I managed to solve a problem at work with regex, but the solution is a bit of a monster.

The string to be validated must be:

zero or more: A-Z a-z 0-9, spaces, or these symbols: . - = + ' , : ( ) /

But, the first and/or last characters must not be a forward slash /

This was my solution (used preg_match php function):

"/^[a-z\d\s\.\-=\+\',:\(\)][a-z\d\s\.\-=\+\',\/:\(\)]*[a-z\d\s\.\-=\+\',:\(\)]$|^[a-z\d\s\.\-=\+\',:\(\)]$/i"

A col开发者_开发知识库league thinks this is too big and complicated. Well it works, so is it really that bad? Anyone in the mood for some regex-golf?


You can simplify your expression to this:

/^(?:[a-z\d\s.\-=+',:()]+(?:/+[a-z\d\s.\-=+',:()]+)*)?$/i

The outer (?:…)? is to allow an empty string. The [a-z\d\s.\-=+',:()]+ allows to start with one or more of the specified characters except the /. If a / follows, it also must be followed by one or more of the other specified characters ((?:/[a-z\d\s.\-=+',:()]+)*).

Furthermore, inside a character set, you only need to escape the characters \, ], and depending on the position also ^ and -.


Try something like this instead

function validate($string) {
   return (preg_match("/[a-zA-Z0-9.\-=+',:()/]*/", $string) && substr($string, 0,1) != '/' && substr($string, -1) != '/'))
}

It's a lot simpler to check the first and last character specifically. Otherwise you're left with dealing with a lot of overhead when it comes to empty strings and such. Your regex, for example, requires the string to be at least one character long, otherwise it doesn't validate. Despite "" fitting your criteria.


'#^(?!/)[a-z\d .=+\',:()/-]*$(?<!/)#i'

As others have observed, most of those characters don't need to be escaped inside a character class. Additionally, the hyphen doesn't need to be escaped if it's the last thing listed, and the slash doesn't need to be escaped if you use a different character as the regex delimiter (# in this case, but ~ is a popular choice, too).

I also ditched the double-quotes in favor of single-quotes, which meant I had to escape the single-quote in the regex. That's worth it because single-quoted strings are so much simpler to work with: no $variable interpolation, no embedded executable {code}, and the only characters you have to escape for them are the single-quote and the backslash.

But the main innovation here is the use of lookahead and lookbehind to exclude the slash as the first or last character. That's not just a code-golf tactic, either; I would write the regex this way anyway, because it expresses my intent so much better. Why force the next guy to parse those almost-identical character classes, when you can just say what you mean? "...but the first and last character can't be slashes."

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜