Validating email addresses with single-character domain names with a regex
I have a regex that I use to validate email addresses. I like this regex because it is fairly relaxed and has proven to work quite well.
Here is the regex:
(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@[^\.][\w\.\-]+\.[A-Za-z]{2,}>?
Ok great, basically all reasonably valid email addresses that you can throw at it will validate. I know that maybe even some invalid ones will fall through but that is ok for my specific use-case.
Now it happens to be the case that joe@x.com does not validate. And guess what: x.com is actually a domain name that exists (owned by PayPal).
Looking at the regex part that validates the domain name:
@[^\.][\w\.\-]+
It looks like this should be able to match the x.com domain name, but it doesn't. The culprit is the part that checks that a domain name cannot begin with a dot (such as test@.test.com):
@[^\.]
If I remove the [^.] part of my regex, the domain x.com validates, but now the regex allows domain names beginning with a dot, such as .test.com; this is a little bit too relaxed for me ;-)
So my question is: how can the negated character class affect my single-character check? The way I am reading the regex is "make sure this string does not start with a dot", but apparently it does more.
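The behaviour described above can be reproduced with a short script. This is only a sketch: it assumes Python's re module and whole-string matching with fullmatch, neither of which the question specifies, and the names ORIGINAL, WITHOUT_DOT_CHECK and is_valid are made up for illustration.

```python
import re

# The regex from the question, unchanged.
ORIGINAL = re.compile(
    r"""(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@[^\.][\w\.\-]+\.[A-Za-z]{2,}>?"""
)

# The same regex with the [^\.] check removed (the experiment from the question).
WITHOUT_DOT_CHECK = re.compile(
    r"""(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@[\w\.\-]+\.[A-Za-z]{2,}>?"""
)


def is_valid(pattern, address):
    """Return True if the whole address matches the pattern (assumed usage)."""
    return pattern.fullmatch(address) is not None


for address in ["joe@example.com", "joe@x.com", "test@.test.com"]:
    print(address, is_valid(ORIGINAL, address), is_valid(WITHOUT_DOT_CHECK, address))
```

Under these assumptions the original pattern accepts joe@example.com but rejects both joe@x.com and test@.test.com, while the variant without [^\.] accepts all three.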
Any help would be appreciated.
Regards,
Waseem
As Luis suggested, you can use [^\.][\w\.\-]* to match the domain name; however, it will now also match addresses like john@x.....com and john@@.com. You might want to make sure that there is only one period at a time, and that the first character after the @ is more restricted than just not being a period.
Match the domain name and the period (and subdomains and their periods) using:
([\w\-]+\.)+
So your pattern would be:
(['\"]{1,}.+['\"]{1,}\s+)?<?[\w\.\-]+@([\w\-]+\.)+[A-Za-z]{2,}>?
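To compare the two domain patterns discussed in this answer, here is a small sketch. It assumes Python's re module with fullmatch, and it tests only the part from the @ onwards; STAR_DOMAIN, LABEL_DOMAIN and accepts are illustrative names, not anything from the question.

```python
import re

# Domain part only; the optional display name and the local part are left out.
STAR_DOMAIN = re.compile(r"@[^\.][\w\.\-]*\.[A-Za-z]{2,}")   # the "+ to *" fix
LABEL_DOMAIN = re.compile(r"@([\w\-]+\.)+[A-Za-z]{2,}")      # one label per dot


def accepts(pattern, domain_part):
    """Return True if the pattern matches the whole string."""
    return pattern.fullmatch(domain_part) is not None


samples = ["@x.com", "@mail.example.com", "@x.....com", "@@.com", "@.test.com"]
for sample in samples:
    print(sample, accepts(STAR_DOMAIN, sample), accepts(LABEL_DOMAIN, sample))
```

Both patterns accept @x.com and @mail.example.com and reject @.test.com. Only the star version accepts @x.....com and @@.com, which is the looseness this answer is trying to avoid.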
If you change [^\.][\w\.\-]+ to [^\.][\w\.\-]*, it will work as you expect!
The reason is: [^\.] will match a single character which is not a dot (in your case, the "x" in "x.com"). The pattern then tries to match one or more characters from [\w\.\-], followed by a dot. The only way to match one or more characters is to consume the dot after the x, and then there are no more dots left for the \. that follows. The * will match zero or more characters after the first one, which is what you want.
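One way to see which part of the pattern consumes which characters is to wrap each piece in a capture group. The following is a sketch assuming Python's re module; the groups and the names PLUS, STAR and pieces are added here purely for illustration.

```python
import re

# Same domain pattern as in the question, with capture groups added for inspection.
PLUS = re.compile(r"@([^\.])([\w\.\-]+)\.([A-Za-z]{2,})")
STAR = re.compile(r"@([^\.])([\w\.\-]*)\.([A-Za-z]{2,})")


def pieces(pattern, text):
    """Return the captured groups for a whole-string match, or None."""
    match = pattern.fullmatch(text)
    return match.groups() if match else None


print(pieces(PLUS, "@xy.com"))  # ('x', 'y', 'com')
print(pieces(PLUS, "@x.com"))   # None: the + group has nothing but the dot to take
print(pieces(STAR, "@x.com"))   # ('x', '', 'com')
```

With the star, the middle group is allowed to be empty, so the dot stays available for the literal \. that follows.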
Change the quantifier +, meaning one or more, to *, meaning zero or more.
Change @[^\.][\w\.\-]+ to @[^\.][\w\.\-]*
The reason you need this is that [^\.] says match a single character that is not a dot, and that consumes the x. The [\w\.\-]+ that follows must then match at least one more character before the literal dot, but the next character is already that dot, so once it is consumed there is nothing left for \. to match. Changing the plus to a star fixes this.
Look at the broader context in your pattern:
@[^\.][\w\.\-]+\.[A-Za-z]{2,}
So for joe@x.com:
[^.] matches x
[\w.-]+ matches .
\. needs a dot but finds c
Change this part to @[^.][\w-]*\.[A-Za-z]{2,}
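A quick check of this last pattern, as a sketch only: it assumes Python's re module and whole-string matching with fullmatch, and SINGLE_LABEL and accepts are illustrative names.

```python
import re

# The domain pattern suggested in this answer: no dots allowed before the final dot.
SINGLE_LABEL = re.compile(r"@[^.][\w-]*\.[A-Za-z]{2,}")


def accepts(domain_part):
    """Return True if the pattern matches the whole string."""
    return SINGLE_LABEL.fullmatch(domain_part) is not None


for sample in ["@x.com", "@x.....com", "@.test.com", "@@.com", "@mail.example.com"]:
    print(sample, accepts(sample))
```

Under these assumptions the pattern accepts @x.com and rejects @x.....com and @.test.com. It still accepts @@.com, because [^.] allows any non-dot character, and it rejects @mail.example.com, because only one dot can appear in the domain when the whole string has to match.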