What is a regex for Twitter-like names?
I have been coding for a while but never needed regular expressions until recently. I need to write a regular expression that accepts usernames the way Twitter does. Underscores are allowed, and a name may contain more than one, but they must not be consecutive. Alphanumeric characters are also allowed, but a name cannot start with a digit.
Names such as
- _myname67
- myname67
- my_name
- _my_67_name_
are valid but
- 94myname
- __myname
- my__name
- my name
are not valid.
I have played with Rubular and come up with a couple regexes:
/^[^0-9\s+](_?[a-z0-9]+_?)+$/i
/^([a-z_?])+$/i
The problem I keep running into is that these also match consecutive underscores.
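A quick way to reproduce the overmatching is sketched below. The variable names are only illustrative; the patterns are the two attempts above, unchanged.

```ruby
# The two attempts from the question, copied as written.
first_attempt  = /^[^0-9\s+](_?[a-z0-9]+_?)+$/i
second_attempt = /^([a-z_?])+$/i

# In the first pattern, the trailing _? of one repetition and the leading _?
# of the next can each consume an underscore, so "__" slips through.
# In the second pattern, ? inside [...] is a literal question mark, so the
# class accepts any run of letters, underscores and question marks.
%w[my__name __myname].each do |name|
  puts "#{name}: first=#{first_attempt.match?(name)} second=#{second_attempt.match?(name)}"
end
```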
Edited
a = %w[
_myname67
myname67
my_name
_my_67_name_
94myname
__myname
my__name
my\ name
m_yname
]
p a.select{|name| name =~ /\A_?[a-z]_?(?:[a-z0-9]_?)*\z/i}
# => ["_myname67", "myname67", "my_name", "_my_67_name_", "m_yname"]
You should use ( ) only for substrings that you want to capture. (?: ) is used for groupings that you do not want to capture. It is good practice to use it whenever you do not need to refer to that substring later. It can also make the regex run faster, since the engine does not have to store the captured text.
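The difference between the two kinds of group can be seen in a small sketch. The first pattern is a made-up example for illustration; the second is the pattern from the answer above.

```ruby
# Capturing groups: each ( ) stores its matched substring.
capturing = /\A(_?)([a-z]+)\z/i
m = capturing.match("_name")
p m[1]  # the optional leading underscore
p m[2]  # the letters that follow

# Non-capturing group: (?: ) only groups for the * quantifier,
# so nothing is stored and the capture list stays empty.
non_capturing = /\A_?[a-z]_?(?:[a-z0-9]_?)*\z/i
m2 = non_capturing.match("my_name")
p m2.captures
```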
Try the following:
^([a-zA-Z](_?[a-zA-Z0-9]+)*_?|_([a-zA-Z0-9]+_?)*)$
I've separated two cases: the name starts with a letter, or it starts with an underscore. If you don't want to allow names consisting of a single symbol, replace each * with +.
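To check this pattern against the names from the question, something like the following should do. It is a sketch with two assumptions of mine: the constant name NAME_RE is made up, and the pattern is anchored with \A and \z (whole string) rather than ^ and $ (which match at line boundaries in Ruby), with the groups made non-capturing.

```ruby
# Same alternation as above: letter-first names | underscore-first names.
NAME_RE = /\A(?:[a-zA-Z](?:_?[a-zA-Z0-9]+)*_?|_(?:[a-zA-Z0-9]+_?)*)\z/

candidates = %w[_myname67 myname67 my_name _my_67_name_ m_yname
                94myname __myname my__name] + ["my name"]

# partition splits the list into names that match and names that do not.
valid, invalid = candidates.partition { |name| NAME_RE.match?(name) }

p valid
p invalid
```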
maerics's solution has one problem: it doesn't match names that have _ in the second position, such as m_yname.
Some things are really hard to express using only regular expressions, and the results are generally write-only (that is, there's no way to read and understand them later). You can use a simpler regexp (like the two you wrote) and check for double underscores in your Ruby code. It doesn't hurt:
if username =~ /^[^0-9](_?[a-z0-9]+_?)+$/i and !username.include?('__') then ...
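Wrapped in a method, the same idea might look like the sketch below. The method name valid_username? is hypothetical, and I have made two changes to the pattern that are my own assumptions: it is anchored with \A and \z, and the first character class is tightened from [^0-9] to [a-z_], because [^0-9] would also accept a leading space or punctuation mark.

```ruby
def valid_username?(username)
  # Step 1: loose shape check (letter or underscore first, then
  # alphanumeric runs with optional underscores around them).
  shape_ok = username.match?(/\A[a-z_](_?[a-z0-9]+_?)+\z/i)

  # Step 2: reject doubled underscores in plain Ruby.
  # String#include? looks for the substring "__"; String#count would not
  # work here because it treats its argument as a set of characters.
  shape_ok && !username.include?("__")
end

%w[_myname67 my_name _my_67_name_ 94myname __myname my__name].each do |name|
  puts "#{name}: #{valid_username?(name)}"
end
```

Note that this pattern requires at least two characters, so a single-letter name is rejected.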
This seems to work:
/^(_|([a-z]_)|[a-z])([a-z0-9]+_?)*$/i
Updates: corrected for numeral constraints and case.
/^[A-Za-z_]([A-Za-z0-9]+_?)+$/
Some problems can't be solved with just one regular expression... especially when you want to check for the absence of a pattern as well as the presence of another pattern.
Sometimes it is better (and definitely more readable) to break your conditions down into multiple regular expressions and match against each of them in turn.
In addition to your regular expressions to check for valid characters, you should also use a regular expression to check for the presence of two underscores, and then INVERT that result (that is, throw out the name if it MATCHES the pattern).
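As a rough sketch of that approach, with constant and method names that are purely illustrative: one pattern checks the allowed characters and the leading-digit rule, a second looks for a doubled underscore, and the second result is negated.

```ruby
# Pattern 1: only letters, digits and underscores; must not start with a digit.
ALLOWED_CHARS = /\A[a-z_][a-z0-9_]*\z/i

# Pattern 2: the thing we do NOT want to see anywhere in the name.
DOUBLE_UNDERSCORE = /__/

def acceptable_name?(name)
  # Accept only if pattern 1 matches and pattern 2 does not.
  ALLOWED_CHARS.match?(name) && !DOUBLE_UNDERSCORE.match?(name)
end

names = %w[_myname67 myname67 my_name _my_67_name_ 94myname __myname my__name] + ["my name"]
p names.select { |name| acceptable_name?(name) }
```

Each pattern stays short enough to read at a glance, at the cost of two passes over the string.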