Appropriate character encoding / collation to store URLs?
My web application stores URL segments in a database. These URL segments are based on user-submitted content.
What collation should I use for character strings that will appear in URLs?
My assumption is ASCII General CI (?) based on this question: Which characters make a URL invalid?
It doesn't really matter as far as I can see. The characters valid in a URL are represented in every character set I know of. I wouldn't use different collations between tables and columns, though: you'll get "illegal mix of collations" errors on any attempt to join them or perform any other kind of cross-column or cross-table operation (see my recent problem here).
Correct me if I'm wrong of course.
I would argue case sensitivity matters, since you don't want duplicate content from the URLs /home and /Home.
These are 2 separate pages, but a MySQL query against a column with a _ci collation (select * from page where url='/Home') would return the page regardless of case.
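To make the case-sensitivity point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, so SQLite's NOCASE and BINARY collations are only stand-ins for a MySQL _ci and _bin collation (that substitution is my assumption, and NOCASE folds ASCII letters only). The lookup helper is hypothetical; the page table and url column are taken from the query above.

```python
import sqlite3


def lookup(collation, url):
    """Return the stored URLs that compare equal to `url` under `collation`."""
    # A collation name cannot be passed as a bound parameter, so only the
    # two fixed built-in SQLite names are accepted before interpolating.
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation: " + collation)

    conn = sqlite3.connect(":memory:")
    try:
        # The column's declared collation decides how `url = ?` compares.
        conn.execute(f"CREATE TABLE page (url TEXT COLLATE {collation})")
        conn.executemany(
            "INSERT INTO page (url) VALUES (?)",
            [("/home",), ("/Home",)],
        )
        rows = conn.execute(
            "SELECT url FROM page WHERE url = ? ORDER BY rowid",
            (url,),
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # Case-insensitive column: both rows match the lookup for '/Home'.
    print(lookup("NOCASE", "/Home"))
    # Binary (case-sensitive) column: only the exact match comes back.
    print(lookup("BINARY", "/Home"))
```

With the case-insensitive column both /home and /Home match, while the binary column returns only the exact match. If /home and /Home really are meant to be distinct pages, a case-sensitive (binary) collation on the URL column is the safer choice.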