Regular expression for hyphen-separated strings
I created a regular expression for the format XX-XX-XX-XX-XX, where each X is an alphanumeric character.
The regular expression is:
^[a-z0-9A-Z]{2}-[a-z0-9A-Z]{2}-[a-z0-9A-Z]{2}-[a-z0-9A-Z]{2}$
But what I really want is to match the patterns below. The string should have one hyphen (-) after every 2 characters.
Example 1: XX- OK
Example 2: XX-X OK
Example 3: XX-XX- OK
Example 4: XX-XX-XX OK
Example 5: XX-XX-XX-X OK
Example 6: XX-XX-X OK
Example 7: XX-XX-- NOT OK
Example 8: XX-XX-X- NOT OK
This will do the trick. You basically want any number (zero or more) of XX- followed by zero, one, or two X:
^([0-9A-Za-z]{2}-)*[0-9A-Za-z]{0,2}$
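To check the pattern against the eight examples from the question, a small script along these lines should do. This is only a sketch: the names partialPattern and cases are made up for illustration, and "X" is used as a literal stand-in for any alphanumeric character.

```javascript
// Any number of "XX-" segments, then zero to two trailing alphanumerics.
const partialPattern = /^([0-9A-Za-z]{2}-)*[0-9A-Za-z]{0,2}$/;

// The question's examples, paired with the result the asker expects.
const cases = [
  ["XX-", true],
  ["XX-X", true],
  ["XX-XX-", true],
  ["XX-XX-XX", true],
  ["XX-XX-XX-X", true],
  ["XX-XX-X", true],
  ["XX-XX--", false],
  ["XX-XX-X-", false],
];

for (const [input, expected] of cases) {
  const actual = partialPattern.test(input);
  const flag = actual === expected ? "" : "  <-- unexpected";
  console.log(`${input.padEnd(12)} ${actual ? "OK" : "NOT OK"}${flag}`);
}
```

Note that the empty string also matches, since both parts of the pattern are allowed to match nothing.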
The match needs to start with any number of XX- strings:
^([A-Za-z0-9]{2}-)*
Depending on the regexp engine you're using, you may be able to use the somewhat more concise [[:alnum:]] here. Note that [\w\d] as originally posted is inappropriate for a couple of reasons; see Alan Moore's comment for details.
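The comment itself is not reproduced here, but two likely problems are easy to demonstrate in JavaScript: \w already includes the digits, so \d adds nothing, and \w also accepts the underscore. A quick sketch (the names loose and strict are just for illustration):

```javascript
// \w is equivalent to [A-Za-z0-9_] in JavaScript, so \d is redundant here.
const loose = /^[\w\d]{2}$/;
const strict = /^[A-Za-z0-9]{2}$/;

console.log(loose.test("a_"));  // true: the underscore slips through
console.log(strict.test("a_")); // false
console.log(loose.test("a1"));  // true
console.log(strict.test("a1")); // true
```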
Getting the last bit is surprisingly difficult, because you have to nest conditional elements. That is, the final hyphen only matches if the preceding X matches, and that X only matches if the first one does.
Note that this approach assumes that you're not limiting the number of XX- segments. In particular, note that it will match XX-XX-XX-XX-XX-. You can limit the number of XX- segments pretty easily, but getting it to not match a hyphen after the fifth XX is a little more complicated.
Anyway, back to the explanation. A following X is okay:
^([A-Za-z0-9]{2}-)*([A-Za-z0-9])?
It's also okay if it is followed by another X:
^([A-Za-z0-9]{2}-)*([A-Za-z0-9]([A-Za-z0-9])?)?
And a final - is also okay (assuming that it's preceded by XX):
^([A-Za-z0-9]{2}-)*([A-Za-z0-9]([A-Za-z0-9]-?)?)?
Finally, append $ to specify that it should take up the whole line:
^([A-Za-z0-9]{2}-)*([A-Za-z0-9]([A-Za-z0-9]-?)?)?$
I've forked SeanA's jsfiddle. Thanks, Sean!
Update
Thanks to Alan Moore's great job "watching the watchmen" (see the comments), I realized that you can do this quite a bit more simply with:
/^([A-Za-z0-9]{2}-)*[A-Za-z0-9]{0,2}$/
An updated fiddle for that RE.
Here you are saying that there can be up to two Xs at the end of a series of XX- segments. This works because if there is a hyphen at the end, it will just become part of an additional XX- segment.
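As a sanity check, the step-by-step pattern and the simplified one can be compared on a handful of inputs. This is only a spot check on the sample strings chosen here, not a proof of equivalence; the names stepwise, simplified, and samples are illustrative.

```javascript
const stepwise = /^([A-Za-z0-9]{2}-)*([A-Za-z0-9]([A-Za-z0-9]-?)?)?$/;
const simplified = /^([A-Za-z0-9]{2}-)*[A-Za-z0-9]{0,2}$/;

// A mix of strings that should match and strings that should not.
const samples = ["", "a", "ab", "ab-", "ab-c", "ab-cd", "ab-cd-", "ab--", "ab-c-", "-", "abc"];

// Collect any input on which the two patterns give different answers.
const disagreements = samples.filter((s) => stepwise.test(s) !== simplified.test(s));

console.log(disagreements.length === 0 ? "patterns agree on all samples" : disagreements);
```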
I've left the above info in because it solves a more general problem. For example, if each of the segments consisted of a letter and a number, you would have to take such an approach.
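To make that concrete, here is a sketch for the hypothetical case where every segment is one letter followed by one digit (L1-L2-...). The two positions differ, so {0,2} no longer works and the nested optional groups are needed. The pattern and the name letterDigit are my own illustration, not something from the question.

```javascript
// Segments are letter+digit; a partial tail may be a lone letter,
// or a letter and a digit with an optional trailing hyphen.
const letterDigit = /^([A-Za-z][0-9]-)*([A-Za-z]([0-9]-?)?)?$/;

console.log(letterDigit.test("a1-b"));   // true: partial tail is a letter
console.log(letterDigit.test("a1-b2-")); // true: two complete segments
console.log(letterDigit.test("a1-2"));   // false: a tail must start with a letter
console.log(letterDigit.test("a1-bb"));  // false: second position must be a digit
console.log(letterDigit.test("a1--"));   // false: doubled hyphen
```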
If you want it to match XX-XX-XX-XX-XX but not XX-XX-XX-XX-XX-, you can use:
/^([A-Za-z0-9]{2}-){0,4}[A-Za-z0-9]{0,2}$/
A forked fiddle for that use case.
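A quick way to see the effect of the {0,4} limit is to try strings around the five-pair boundary. This is a minimal sketch; the name limited is just for illustration.

```javascript
// At most four "XX-" segments, then zero to two trailing alphanumerics.
const limited = /^([A-Za-z0-9]{2}-){0,4}[A-Za-z0-9]{0,2}$/;

console.log(limited.test("XX-XX-XX-XX-XX"));   // true: four segments plus a final pair
console.log(limited.test("XX-XX-XX-XX-X"));    // true: still being typed
console.log(limited.test("XX-XX-XX-XX-XX-"));  // false: would need a fifth "XX-" segment
console.log(limited.test("XX-XX-XX-XX-XX-X")); // false
```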
Looks like this does the trick:
/^([\w\d]{2}-)*([\w\d]|([\w\d]{2}-?)?)$/
See it in action here: http://jsfiddle.net/sadkinson/FaQe6/6/
Explanation:
/^([\w\d]{2}-)*     -- any number of XX-
([\w\d]             -- then either a single X
|([\w\d]{2}-?)?     -- or, optionally, two Xs and maybe a trailing hyphen
)$/                 -- and nothing else before the end of the string
UPDATE: I fixed the above based on a very astute observation (+1) by a commenter :)