How are the new unicode domains going to be handled by email regexes?

2022-12-10 17:57

Since

In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of country code top-level domains (ccTLDs) in the Internet that use the IDNA standard for native language scripts.

I'm pretty sure that the standard regexes most sites currently use won't mark these as valid, or am I wrong? Has anyone actually thought about how this would play out or has anyone done anything about it?

Hope I'm not jumping the gun here.

When a user types an internationalized domain into a browser, it's translated to an ASCII form; e-mail, surely, must work the same way (however, I've never received mail from an IDNA domain and I have reason to believe browsers are the only implementors of it).

Mailing agents would have to know that when they see Unicode in an address, it must be translated to IDNA form, and then the MX records looked up. I don't think in all of my system administration I've ever accounted for this. Being able to accept something the browser will translate as IDNA in a form element is not something I know how to do. If it is indeed translated to IDNA and a regex attempts to validate it, it should work.
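As a rough sketch of the translation step described above, the snippet below converts only the domain part of an address to its ASCII form using Python's built-in "idna" codec, which implements the older IDNA 2003 rules. The function name to_ascii_address and the sample address are made up for illustration, and the MX lookup itself is left out since it would be done on the resulting ASCII name.

```python
def to_ascii_address(address: str) -> str:
    """Return the address with its domain converted to IDNA (ASCII) form.

    Hypothetical helper for illustration; the local part is left untouched.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


if __name__ == "__main__":
    print(to_ascii_address("info@bücher.example"))  # info@xn--bcher-kva.example
```

An address whose domain is already plain ASCII passes through unchanged, so the conversion can be applied to every address before the MX lookup.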

I wouldn't be surprised if an international domain fails most e-mail regular expressions, and I think the relevance of such a failure is less than 1%. IDNA is really an "address bar" system, and an awful hack; I would really be surprised if e-mail worked on top of it.

Everyone is freaking out like something is changing. It isn't. IDNA is just moving from the domain to the TLD, and business will be as usual like it was before. Don't overthink it, OP.

Old regexes will mark IDNA names valid, provided they are correctly translated into ASCII DNS names.
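To make this concrete, here is a small sketch that runs two made-up patterns (not taken from any particular site) against addresses after the domain has been translated with Python's built-in "idna" codec. It suggests one caveat: a pattern that only allows letters in the TLD accepts an IDNA name under an ASCII TLD, but rejects an internationalized TLD, because the translated TLD contains hyphens and digits.

```python
import re

# Hypothetical "old-style" pattern: the TLD must consist of letters only.
LETTERS_ONLY_TLD = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
# Hypothetical looser pattern: every label may contain letters, digits, hyphens.
ANY_LABEL_TLD = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def idna_address(address: str) -> str:
    """Translate the domain part to ASCII (IDNA 2003, via the stdlib codec)."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("idna").decode("ascii")


def accepted(pattern: re.Pattern, address: str) -> bool:
    return pattern.match(address) is not None


if __name__ == "__main__":
    for raw in ("user@bücher.example", "user@example.рф"):
        ascii_form = idna_address(raw)
        print(ascii_form,
              accepted(LETTERS_ONLY_TLD, ascii_form),
              accepted(ANY_LABEL_TLD, ascii_form))
```

The untranslated Unicode forms fail both patterns, which is why the translation has to happen before validation.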

So yes, we have a problem here. One cannot expect that a user simply types Unicode into a textarea and the server receives an ASCII version of the domain name.

IDNA encoding is not nice, nor easy: Unicode chars are removed from the word they are in and placed after it, encoded together with a position marker.
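For illustration, the encoding described here is Punycode, and Python ships a "punycode" codec that shows the effect on a single word. The helper name punycode_label is made up for this sketch; the "xn--" prefix is what IDNA puts in front of the encoded label.

```python
def punycode_label(label: str) -> str:
    """Encode one label with the stdlib "punycode" codec (no "xn--" prefix)."""
    return label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    encoded = punycode_label("bücher")
    # The ASCII letters stay in order; the "ü" and its position are
    # packed into the suffix after the last hyphen.
    print(encoded)            # bcher-kva
    print("xn--" + encoded)   # xn--bcher-kva
    # Decoding restores the original label.
    print(encoded.encode("ascii").decode("punycode"))  # bücher
```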

Reimplementing it (e.g.) in JavaScript is slow, sad and boring. A URL-encode-like approach would have made porting it to every language easier.

Also, people with systems not supporting IDNA have a hard time figuring out by hand what a given domain looks like in ASCII.

I feel IDNA came out pretty ugly, and that will hinder its adoption.
