
Is something wrong with my regex?

I made an XML Schema and I have this in it.

<xs:element name="Email">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Some of my emails in one of my XML documents fail and I get this error

Email' element is invalid - The value 'Some_Name@hotmail.com' is invalid according to its datatype 'String' - The Pattern constraint failed. LineNumber: 15404 LinePostion: 32

Comparing the emails that passed with the ones that failed, I noticed that every one that failed contains an underscore ("_"). I am unsure whether that is the reason.

Edit

So I changed my regex to this

 <xs:pattern value="[\w_]+([-+.'][\w_]+)*@[\w_]+([-.][\w_]+)*\.[\w_]+([-.][\w_]+)*"/>

It now works, but I don't understand why \w is not capturing it.


The W3C Recommendation on datatypes defines \w as:

[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)

The underscore character definition in Unicode is 'LOW LINE' (U+005F), category: punctuation, connector [Pc]

So in XML Schema \w does not match the underscore: its character classes follow the Unicode category definitions, unlike most other regex flavors, where \w includes "_".
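To see the difference concretely, here is a small Python sketch. The helper xsd_word_char is a hypothetical name of mine, and it only approximates the W3C definition quoted above by checking the first letter of the Unicode general category; it is not a full XSD regex engine.

```python
import re
import unicodedata


def xsd_word_char(ch: str) -> bool:
    """Approximate XSD's \\w: any character NOT in categories P*, Z* or C*."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


# The underscore is 'Pc' (punctuation, connector), so XSD's \w rejects it.
print(unicodedata.category("_"))        # Pc
print(xsd_word_char("_"))               # False
print(xsd_word_char("a"))               # True

# Python's own \w (like Perl, Ruby, .NET) does include the underscore.
print(bool(re.fullmatch(r"\w", "_")))   # True
```

That mismatch between flavors would explain why the same pattern accepts Some_Name@hotmail.com in a regex tester but fails in the schema validator.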

But for an e-mail regexp you should use strict ASCII, like [0-9A-Za-z_-] instead of \w (I bet an e-mail address with non-Latin characters is invalid :) ). Better still, find a proven regexp, or look in the RFC for the proper e-mail format.
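As a rough sketch of the ASCII-only idea, the following Python snippet rebuilds the pattern from the question with an explicit character class in place of \w. The names ATOM, EMAIL and looks_like_email are my own, and I am assuming the hyphen should stay a separator rather than part of the word class. It is an illustration only, not an RFC-compliant validator.

```python
import re

# Hypothetical ASCII-only stand-in for \w (letters, digits, underscore).
ATOM = r"[0-9A-Za-z_]+"

# Same shape as the pattern in the question, with \w+ swapped for ATOM.
EMAIL = re.compile(
    rf"{ATOM}(?:[-+.']{ATOM})*@{ATOM}(?:[-.]{ATOM})*\.{ATOM}(?:[-.]{ATOM})*"
)


def looks_like_email(value: str) -> bool:
    # fullmatch mirrors XSD patterns, which are implicitly anchored at both ends.
    return EMAIL.fullmatch(value) is not None


print(looks_like_email("Some_Name@hotmail.com"))  # True
print(looks_like_email("no-at-sign"))             # False
```

The same character class can be written directly in the xs:pattern value, since [0-9A-Za-z_] means the same thing in XML Schema regular expressions.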


Something is weird because \w typically accepts underscores. Try to add _ to the \w that you would be expecting the _ in, by changing them to [\w_].


Could very well be, because your regex won't recognize an email with an underscore.

Check out this topic: Using a regular expression to validate an email address

It's one I have bookmarked for how useful it is.


Yes. You do not match the underscore character. Just try to add it...

\w+([-+.'_]\w+)*...


Something is in fact strange; since the \w character class includes underscores, as we can see with Rubular, the email you have should validate. Is it possible there's another problem—a stray space, for instance? However, the other problem with this is that there is no regular expression which correctly accepts all email addresses and nothing else; this Stack Overflow question has a good answer. There may be a better way to deal with validating email addresses than this schema/regex.
