Best practices for sanitizing Unicode input

I'm working on a web application at the moment (using Ruby) that I would ultimately like to be usable by people from anywhere in the world. With that in mind, support for non-ASCII characters is essential. However, I don't want the database to be full of "noise" characters in fields such as the username.

Are there any accepted best practices for dealing with Unicode input under these circumstances without alienating users? Any thoughts on dealing with homographs in usernames to make impersonation harder?

Some of my thoughts so far -

  • normalizing text before storing or using it in queries
  • filtering non-printable characters
  • limiting the number of sequential combining diacritics allowed in input
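
To make those three ideas concrete, here is a minimal Ruby sketch of what I have in mind. The method name `sanitize_username`, the choice of NFKC, and the limit of two combining marks are all my own assumptions for illustration, not an accepted standard. Note that stripping format characters also removes zero-width joiners, which some scripts rely on, so that step may be too aggressive for fields other than usernames.

```ruby
# Hypothetical limit: how many combining marks may follow each other.
MAX_COMBINING_MARKS = 2

# Hypothetical helper sketching the three steps listed above.
# Expects a UTF-8 string.
def sanitize_username(raw)
  # 1. Normalize, so equivalent sequences are stored in one canonical form.
  #    NFKC also folds compatibility characters (e.g. fullwidth letters).
  text = raw.unicode_normalize(:nfkc)

  # 2. Filter non-printable characters: control (Cc) and format (Cf)
  #    code points such as NUL or ZERO WIDTH SPACE.
  text = text.gsub(/[\p{Cc}\p{Cf}]/, "")

  # 3. Limit runs of combining marks (category M) to MAX_COMBINING_MARKS.
  text.gsub(/(\p{M}{#{MAX_COMBINING_MARKS}})\p{M}+/, '\1')
end

# Fullwidth "full" (U+FF46 U+FF55 U+FF4C U+FF4C) folds to plain ASCII.
puts sanitize_username("\uFF46\uFF55\uFF4C\uFF4C")  # prints "full"
```

The same method would need to be applied both before storing a value and before using it in a query, otherwise lookups could miss rows that were normalized on the way in.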

Any further thoughts, or am I making unnecessary work for myself?

Thanks.


RFC 3454 (stringprep), at http://www.ietf.org/rfc/rfc3454.txt, will tell you what you should be doing, which is to say worrying about normalization and security issues.
