Manipulating a String: Removing special characters - Change all accented letters to non accented
I'm using ASP.NET 4 and C#.
I have a string that can contain:
- Special Characters, like: !"£$%&/()/#
- Accented letters, like: àòèù
- Empty spaces, like: " " (one or more consecutive).
Example string:
#Hi this is rèally/ special strìng!!!
I would like to:
a) Remove all Special Characters, like:
Hi this is rèally special strìng
b) Convert all Accented letters to NON Accented letters, like:
Hi this is really special string
c) Replace all empty spaces with a dash (-), like:
Hi-this-is-really-special-string
My aim is to create a string suitable for a URL path, for better SEO.
Any idea how to do it with regular expressions or another technique?
Thanks for your help on this!
Similar to mathieu's answer, but tailored more closely to your requirements. This solution first strips special characters and diacritics from the input string, and then replaces whitespace with dashes:
    // Requires: using System.Globalization; using System.Text; using System.Text.RegularExpressions;
    string s = "#Hi this is rèally/ special strìng!!!";
    // FormD splits each accented letter into a base letter plus a combining mark.
    string normalized = s.Normalize(NormalizationForm.FormD);
    StringBuilder resultBuilder = new StringBuilder();
    foreach (var character in normalized)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
        // Keep only letters and spaces; punctuation, digits and combining marks are dropped.
        if (category == UnicodeCategory.LowercaseLetter
            || category == UnicodeCategory.UppercaseLetter
            || category == UnicodeCategory.SpaceSeparator)
        {
            resultBuilder.Append(character);
        }
    }
    // Collapse each run of whitespace into a single dash.
    string result = Regex.Replace(resultBuilder.ToString(), @"\s+", "-");
See it in action at ideone.com.
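If the same steps are needed in several places, they could be wrapped in a method. The following is only a sketch: the names SlugHelper and ToSlug are hypothetical, and it makes two assumptions that go beyond the snippet above, namely that digits should be kept and that leading and trailing whitespace should not turn into dashes.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class SlugHelper
{
    public static string ToSlug(string input)
    {
        // Decompose accented letters into base letter + combining mark.
        string normalized = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(normalized.Length);

        foreach (char c in normalized)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);

            // Keep letters, spaces and (assumption) digits. Everything else,
            // including the combining marks produced by FormD, is dropped.
            if (category == UnicodeCategory.LowercaseLetter
                || category == UnicodeCategory.UppercaseLetter
                || category == UnicodeCategory.DecimalDigitNumber
                || category == UnicodeCategory.SpaceSeparator)
            {
                builder.Append(c);
            }
        }

        // Trim the ends (assumption), then collapse whitespace runs into one dash.
        return Regex.Replace(builder.ToString().Trim(), @"\s+", "-");
    }
}
```

Calling SlugHelper.ToSlug("#Hi this is rèally/ special strìng!!!") should give Hi-this-is-really-special-string. Note that the letter categories are not limited to Latin letters, so characters from other alphabets would pass through unchanged.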
You should have a look at this answer: Ignoring accented letters in string comparison
Code here:
    static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char ch in sFormD)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return (sb.ToString().Normalize(NormalizationForm.FormC));
    }
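RemoveDiacritics only covers step b) of the question. A rough sketch of how it might be combined with two regular expressions for steps a) and c) follows. UrlPathHelper and ToUrlPath are hypothetical names, and the ASCII-only whitelist (letters, digits, whitespace) is an assumption about what counts as a "special character".

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical wrapper; only RemoveDiacritics comes from the answer above.
public static class UrlPathHelper
{
    static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char ch in sFormD)
        {
            // Skip the combining marks that FormD split off the base letters.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static string ToUrlPath(string input)
    {
        // b) accented letters -> plain letters
        string noAccents = RemoveDiacritics(input);

        // a) drop everything except ASCII letters, digits and whitespace
        //    (assumption: anything else is a "special character")
        string cleaned = Regex.Replace(noAccents, @"[^A-Za-z0-9\s]", "");

        // c) trim the ends, then turn each run of whitespace into one dash
        return Regex.Replace(cleaned.Trim(), @"\s+", "-");
    }
}
```

With the example string from the question, UrlPathHelper.ToUrlPath should return Hi-this-is-really-special-string. Letters that have no decomposition, such as the dotless ı, are not converted by RemoveDiacritics and would simply be removed by the whitelist.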
I am not an expert when it comes to regular expressions, but I doubt they would be useful for this sort of computation.
To me, a simple iteration over the characters of the input is enough:
    // Requires: using System.Collections.Generic;
    List<char> specialChars =
        new List<char>() { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };
    string specialString = "#Hi this is rèally/ special strìng!!!";
    System.Text.StringBuilder builder =
        new System.Text.StringBuilder(specialString.Length);
    bool encounteredWhiteSpace = false;
    foreach (char ch in specialString)
    {
        char val = ch;
        // Skip special characters entirely.
        if (specialChars.Contains(val))
            continue;
        // Map the accented letters we know about to their plain equivalents.
        switch (val)
        {
            case 'è':
                val = 'e'; break;
            case 'à':
                val = 'a'; break;
            case 'ò':
                val = 'o'; break;
            case 'ù':
            case 'ü':
                val = 'u'; break;
            case 'ı':
            case 'ì':
                val = 'i'; break;
        }
        // Remember whitespace instead of appending it, so runs collapse into one dash.
        if (val == ' ' || val == '\t')
        {
            encounteredWhiteSpace = true;
            continue;
        }
        if (encounteredWhiteSpace)
        {
            // Avoid a leading dash when the input starts with whitespace.
            if (builder.Length > 0)
                builder.Append('-');
            encounteredWhiteSpace = false;
        }
        builder.Append(val);
    }
    string result = builder.ToString();