Regex for [a-zA-Z0-9\-] with dashes allowed in between but not at the start or end

Update:

This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):

Solution:

r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'

Original Question (albeit, after 3 edits)

I'm using Python and I'm not trying to extract the value, but rather test to make sure it fits the pattern.

Allowed values:

spam123-spam-eggs-eggs1
spam123-eggs123
spam
1234
eggs123

Not allowed values:

eggs1-
-spam123
spam--spam

I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction by getting the string value after the fact, but I simply need to test the value so that I can disallow it. Also, the value must be between 4 and 25 characters long, and no two dashes may be adjacent.

Here's what I've come up with after some experimentation with lookbehind, etc:

# Nothing here


Try this regular expression:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

This regular expression allows hyphens only as separators between sequences of one or more characters from [a-zA-Z0-9].
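As a quick check, the pattern can be run against the sample values from the question. This is only a sketch: the names PATTERN, allowed, and not_allowed are illustrative, and the 4 to 25 character limit is not tested here.

```python
import re

# Hyphens may only appear between runs of alphanumerics.
PATTERN = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

allowed = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123"]
not_allowed = ["eggs1-", "-spam123", "spam--spam"]

for value in allowed + not_allowed:
    # match() returns a match object or None; bool() turns that into True/False.
    print(value, bool(PATTERN.match(value)))
```

Every entry in allowed should print True and every entry in not_allowed should print False.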


Edit: Following up on your comment: the expression (…)* allows the part inside the group to be repeated zero or more times. That means

a(bc)*

is the same as

a|abc|abcbc|abcbcbc|abcbcbcbc|…
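The equivalence above can be checked with a small Python sketch. The candidate strings are made up for illustration; fullmatch() is used so that only whole-string matches count.

```python
import re

# (bc)* means "the group bc, repeated zero or more times".
pattern = re.compile(r"a(bc)*")

candidates = ["a", "abc", "abcbc", "ab", "abcb"]
results = {c: bool(pattern.fullmatch(c)) for c in candidates}

for candidate, matched in results.items():
    # "ab" and "abcb" fail because the group bc must appear whole each time.
    print(candidate, matched)
```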

Edit: Now that you have changed the requirements: since you probably don't want to restrict the length of each hyphen-separated part individually, you will need a look-ahead assertion to check the overall length:

(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
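To see what the look-ahead adds, here is a minimal sketch comparing the pattern with and without it. The names BASE, WITH_LENGTH, and samples are illustrative, not from the original answer.

```python
import re

BASE = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")
# The look-ahead checks the total length (4 to 25) without consuming any characters.
WITH_LENGTH = re.compile(r"(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

samples = ["abc", "spam", "a-b-c", "a" * 25, "a" * 26]
for value in samples:
    # Columns: length, result without look-ahead, result with look-ahead.
    print(len(value), bool(BASE.match(value)), bool(WITH_LENGTH.match(value)))
```

Both patterns agree on the dash rules; they differ only for the 3-character and 26-character samples, which the look-ahead version rejects.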


The current regex is simple and fairly readable. Rather than making it long and complicated, have you considered applying the other constraints with normal Python string processing tools?

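import re

def fits_pattern(string):
    # Length and dash-placement rules are checked with plain string methods.
    if (4 <= len(string) <= 25 and
        "--" not in string and
        not string.startswith("-") and
        not string.endswith("-")):

        # fullmatch() with + checks every character, not just the first one.
        return re.fullmatch(r"[a-zA-Z0-9\-]+", string)
    else:
        return None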


It should be something like this:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

Your pattern tells it to look for only one character, either a-z, A-Z, 0-9 or -; that is what [] does.

So if you use [abc], you will match only "a", "b" or "c", not "abc".
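A short Python sketch of that point, using fullmatch() so the whole string has to fit. The variable names are illustrative.

```python
import re

# [abc] matches exactly one character: "a", "b", or "c".
single_char = bool(re.fullmatch(r"[abc]", "a"))
whole_word = bool(re.fullmatch(r"[abc]", "abc"))
# Adding + allows one or more characters from the class.
with_plus = bool(re.fullmatch(r"[abc]+", "abc"))

print(single_char, whole_word, with_plus)
```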

Have fun.


If you simply don't want a dash at the end and beginning, try ^[^-].*?[^-]$

Edit: Bah, you keep changing it.
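Note that this pattern only constrains the first and last characters. As a rough sketch (the name loose and the sample strings are illustrative), it rejects a leading or trailing dash but still accepts doubled dashes and non-alphanumeric characters, and it needs at least two characters to match:

```python
import re

loose = re.compile(r"^[^-].*?[^-]$")

samples = ["-spam123", "eggs1-", "spam--spam", "sp am!", "a"]
results = {s: bool(loose.match(s)) for s in samples}

for value, matched in results.items():
    print(repr(value), matched)
```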
