How are the new Unicode domains going to be handled by email regexes?
Since
In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) in the Internet that use the IDNA standard for native language scripts.
I'm pretty sure that the standard regexes most sites currently use won't mark these as valid, or am I wrong? Has anyone actually thought about how this would play out or has anyone done anything about it?
Hope I'm not jumping the gun here.
When a user types an internationalized domain into a browser, it's translated to an ASCII form; e-mail, surely, must work the same way (however, I've never received mail from an IDNA domain and I have reason to believe browsers are the only implementors of it).
Mailing agents would have to know that when they see Unicode in an address, it must be translated to IDNA form, and then the MX records looked up. I don't think in all of my system administration I've ever accounted for this. Being able to accept something the browser will translate as IDNA in a form element is not something I know how to do. If it is indeed translated to IDNA and a regex attempts to validate it, it should work.
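To make the "translate to IDNA, then validate" idea concrete, here is a minimal Python sketch. It assumes the server receives the address as Unicode text and converts only the domain part itself. The pattern `ASCII_EMAIL_RE` and the helpers `to_ascii_address` and `is_valid` are made up for illustration and stand in for whatever regex a site already uses. Python's built-in `idna` codec implements the older IDNA 2003 rules, so treat this as an approximation rather than a complete solution; non-ASCII local parts are not handled at all.

```python
import re

# A deliberately simple ASCII-only pattern, standing in for a typical site regex.
# (Hypothetical; real-world patterns vary a lot.)
ASCII_EMAIL_RE = re.compile(
    r"^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]{2,}$"
)


def to_ascii_address(address: str) -> str:
    """Convert only the domain part of an address to its ASCII (xn--) form."""
    local, _, domain = address.rpartition("@")
    # The built-in "idna" codec encodes each label with Punycode as needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def is_valid(address: str) -> bool:
    """Translate the domain first, then apply the unchanged ASCII regex."""
    try:
        candidate = to_ascii_address(address)
    except UnicodeError:
        # Raised for labels the codec cannot encode (empty, too long, ...).
        return False
    return ASCII_EMAIL_RE.match(candidate) is not None


if __name__ == "__main__":
    raw = "user@bücher.example"
    print(ASCII_EMAIL_RE.match(raw) is not None)  # the raw Unicode form
    print(to_ascii_address(raw))                  # the translated form
    print(is_valid(raw))                          # regex applied after translation
```

The regex itself does not change; only the step before it does, which is why the translation has to happen somewhere between the form field and the validator.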
I wouldn't be surprised if an internationalized domain fails most e-mail regular expressions, but I think such a failure matters in less than 1% of cases. IDNA is really an "address bar" system, and an awful hack; I would really be surprised if e-mail worked on top of it.
Everyone is freaking out like something is changing. It isn't. IDNA is just moving from the domain to the TLD, and business will be as usual like it was before. Don't overthink it, OP.
Old regexes will mark IDNA names valid, provided they are correctly translated into ASCII DNS names.
So yes, we have a problem here. One cannot expect a user to simply type Unicode into a textarea and have the server receive an ASCII version of the domain name.
IDNA encoding is neither nice nor easy: non-ASCII characters are removed from the label they are in and appended after it, together with a position marker.
Reimplementing it (e.g.) in JavaScript is slow, sad and boring. A URL-encode-like approach would have made porting it to every language easier.
Also, people whose systems do not support IDNA have a hard time figuring out by hand what a given domain looks like in ASCII.
I feel IDNA came out pretty ugly, and that will hinder its adoption.
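The reordering described above (non-ASCII characters pulled out of the label and appended with position information) can be seen with Python's built-in `punycode` codec. This is only a small sketch; the helper name `to_punycode` is made up for illustration, and a real IDNA conversion also normalizes the label and adds the `xn--` prefix.

```python
def to_punycode(label: str) -> str:
    """Encode a single domain label with raw Punycode (no xn-- prefix)."""
    return label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for label in ("bücher", "münchen"):
        # The ASCII letters stay in front; the suffix after the last hyphen
        # encodes which non-ASCII characters were removed and where they go.
        print(label, "->", "xn--" + to_punycode(label))
    # bücher -> xn--bcher-kva
    # münchen -> xn--mnchen-3ya
```

Note how the "ü" disappears from the readable part in both cases, which is why the ASCII form is hard to work out by hand.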