How do you remove invalid characters when creating a friendly URL (i.e. how do you create a slug)?
Say I have this webpage:
http://ww.xyz.com/Product.aspx?CategoryId=1
If the name of CategoryId=1 is "Dogs", I would like to convert the URL into something like this:
http://ww.xyz.com/Products/Dogs
The problem arises when the category name contains foreign (or otherwise URL-invalid) characters. If the name of CategoryId=2 is "Göra äldre", what should the new URL be?
Logically it should be:
http://ww.xyz.com/Products/Göra äldre
but that will not work. Firstly because of the space (which I can easily replace with a dash, for example), but what about the foreign characters? In ASP.NET I could use the URLEncode function, which would give something like this:
http://ww.xyz.com/Products/G%c3%b6ra+%c3%a4ldre
but I can't really say it's better than the original URL (http://ww.xyz.com/Product.aspx?CategoryId=2).
Ideally I would like to generate the URL below, but how can I do this automatically (i.e. convert foreign characters to 'safe' URL characters)?
http://ww.xyz.com/Products/Gora-aldre
I've come up with the following two extension methods (ASP.NET / C#):
public static string RemoveAccent(this string txt)
{
    byte[] bytes = System.Text.Encoding.GetEncoding("Cyrillic").GetBytes(txt);
    return System.Text.Encoding.ASCII.GetString(bytes);
}

public static string Slugify(this string phrase)
{
    string str = phrase.RemoveAccent().ToLower();
    str = System.Text.RegularExpressions.Regex.Replace(str, @"[^a-z0-9\s-]", ""); // Remove all invalid characters
    str = System.Text.RegularExpressions.Regex.Replace(str, @"\s+", " ").Trim(); // Collapse multiple spaces into one
    str = System.Text.RegularExpressions.Regex.Replace(str, @"\s", "-"); // Replace spaces with dashes
    return str;
}
Transliterate non-ASCII characters to ASCII, using something like this:
var str = "éåäöíØ";
var transliterated = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str));
// transliterated => "eaaoiO"
(Source)
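The "Cyrillic" trick relies on that code page's best-fit mapping, and as far as I know the code page is not available by default on .NET Core and later unless the code-pages encoding provider is registered. A possible alternative is Unicode normalization: decompose each character and drop the combining marks. The following is only a sketch of that swapped-in technique, not the method from the posts above; the class name SlugSketch and the method RemoveDiacritics are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class SlugSketch
{
    // Decompose characters (FormD), skip the combining marks, then recompose.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Accents such as the two dots of "ö" become separate NonSpacingMark chars.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Same steps as the Slugify in the question, with the accent removal swapped out.
    public static string Slugify(string phrase)
    {
        string str = RemoveDiacritics(phrase).ToLowerInvariant();
        str = Regex.Replace(str, @"[^a-z0-9\s-]", "");   // Remove all invalid characters
        str = Regex.Replace(str, @"\s+", " ").Trim();    // Collapse multiple spaces into one
        return Regex.Replace(str, @"\s", "-");           // Replace spaces with dashes
    }
}
```

With this sketch, SlugSketch.Slugify("Göra äldre") gives "gora-aldre". Letters that have no canonical decomposition, such as Ø, are not converted by normalization and are simply removed by the regex, so they would need an explicit replacement table if they matter for your data.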
One other thing worth considering: if a user provides a string such as 好听的音乐 which you want to convert to a URL-friendly title, then you should consider using IdnMapping. For example:
string urlFriendlyTitle = Slugify(url);

public static string Slugify(string text)
{
    IdnMapping idnMapping = new IdnMapping();
    text = idnMapping.GetAscii(text);
    text = RemoveAccent(text).ToLower();
    // Remove all invalid characters.
    text = Regex.Replace(text, @"[^a-z0-9\s-]", "");
    // Convert multiple spaces into one space.
    text = Regex.Replace(text, @"\s+", " ").Trim();
    // Replace spaces by underscores.
    text = Regex.Replace(text, @"\s", "_");
    return text;
}

public static string RemoveAccent(string text)
{
    byte[] bytes = Encoding.GetEncoding("Cyrillic").GetBytes(text);
    return Encoding.ASCII.GetString(bytes);
}
Without this, 好听的音乐 will be converted to string.Empty. With this, it becomes xn--fjqr6lw2ek78az68a, which is Punycode.
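A variation you might consider is to use Punycode only as a fallback, so that ordinary titles keep a readable slug and IdnMapping is applied only when nothing survives the cleanup. This is a rough sketch under that assumption; the names SlugFallbackSketch, AsciiSlug and SlugifyWithFallback are hypothetical, and accent removal is left out to keep it short.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class SlugFallbackSketch
{
    // Plain ASCII slug: lower-case, drop anything outside [a-z0-9 -], join with dashes.
    private static string AsciiSlug(string text)
    {
        string str = text.ToLowerInvariant();
        str = Regex.Replace(str, @"[^a-z0-9\s-]", "");
        str = Regex.Replace(str, @"\s+", " ").Trim();
        return Regex.Replace(str, @"\s", "-");
    }

    public static string SlugifyWithFallback(string text)
    {
        string slug = AsciiSlug(text);
        if (slug.Length > 0)
            return slug;

        // Nothing survived the cleanup (e.g. a purely Chinese title),
        // so fall back to the Punycode form produced by IdnMapping.
        // GetAscii can throw ArgumentException for input it rejects.
        return new IdnMapping().GetAscii(text);
    }
}
```

Here "Hello World" stays "hello-world", while a title such as 好听的音乐 goes through IdnMapping and comes back as an "xn--" string. Input consisting only of ASCII punctuation is not handled by this sketch and would need its own check.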
I use the function described at http://www.blackbeltcoder.com/Articles/strings/converting-text-to-a-url-friendly-slug. It doesn't directly support non-English characters, but could be easily updated to support additional characters.
I like it because it produces a very clean-looking slug.