Manipulating a String: Removing special characters - Change all accented letters to non accented

I'm using ASP.NET 4 and C#.

I have a string that can contain:

  • Special Characters, like: !"£$%&/()/#
  • Accented letters, like: àòèù
  • Empty spaces, like: " " (one or more in a row)

Example string:

#Hi this          is  rèally/ special strìng!!!

I would like to:

a) Remove all Special Characters, like:

Hi this          is  rèally special strìng

b) Convert all Accented letters to NON Accented letters, like:

Hi this          is  really special string

c) Replace each run of empty spaces with a single dash (-), like:

Hi-this-is-really-special-string

My aim is to create a string suitable for a URL path, for better SEO.

Any ideas on how to do this with regular expressions or another technique?

Thanks for your help on this!


Similar to mathieu's answer, but tailored to your requirements. This solution first strips special characters and diacritics from the input string, and then replaces each run of whitespace with a dash:

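// Requires: using System.Globalization; using System.Text;
//           using System.Text.RegularExpressions;
string s = "#Hi this          is  rèally/ special strìng!!!";
// FormD splits each accented letter into a base letter plus a combining mark.
string normalized = s.Normalize(NormalizationForm.FormD);

StringBuilder resultBuilder = new StringBuilder();
foreach (var character in normalized)
{
    // Keep only letters and spaces. Punctuation and combining marks are
    // dropped here (and so are digits).
    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
    if (category == UnicodeCategory.LowercaseLetter
        || category == UnicodeCategory.UppercaseLetter
        || category == UnicodeCategory.SpaceSeparator)
        resultBuilder.Append(character);
}
// Collapse each run of whitespace into a single dash.
string result = Regex.Replace(resultBuilder.ToString(), @"\s+", "-");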

See it in action at ideone.com.
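If you want this as a reusable method, here is a minimal sketch of the same idea. The names `SlugDemo` and `ToSlug` are made up for illustration, and two things are my own assumptions rather than part of the question: digits are kept, and leading/trailing spaces are trimmed so they do not turn into dashes.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class SlugDemo
{
    // Hypothetical helper: normalize, filter by Unicode category, then dash the spaces.
    public static string ToSlug(string input)
    {
        // Decompose accented letters into base letter + combining mark.
        string normalized = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            // Assumption: digits are worth keeping in a URL segment.
            if (category == UnicodeCategory.LowercaseLetter
                || category == UnicodeCategory.UppercaseLetter
                || category == UnicodeCategory.DecimalDigitNumber
                || category == UnicodeCategory.SpaceSeparator)
            {
                sb.Append(c);
            }
        }

        // Trim first so leading/trailing spaces do not become dashes.
        return Regex.Replace(sb.ToString().Trim(), @"\s+", "-");
    }

    static void Main()
    {
        // prints "Hi-this-is-really-special-string"
        Console.WriteLine(ToSlug("#Hi this          is  rèally/ special strìng!!!"));
        // prints "Top-10-cafes"
        Console.WriteLine(ToSlug("  Top 10   cafés  "));
    }
}
```

Note that letters from other scripts (Greek, Cyrillic, and so on) still pass the category check, so the result is not guaranteed to be pure ASCII.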


You should have a look at this answer: Ignoring accented letters in string comparison

Code here:

static string RemoveDiacritics(string sIn)
{
  string sFormD = sIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in sFormD)
  {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  return (sb.ToString().Normalize(NormalizationForm.FormC));
}
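RemoveDiacritics only covers step b) of the question. A rough sketch of how steps a) and c) could be added with two regular expressions is shown below. The `Slugify` name and the choice to keep only ASCII letters, digits, and whitespace are assumptions for illustration, not something the linked answer prescribes.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class DiacriticsDemo
{
    // Same approach as RemoveDiacritics above, repeated so the sketch compiles on its own.
    static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        foreach (char ch in sFormD)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                sb.Append(ch);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical wrapper that chains the three steps from the question.
    public static string Slugify(string input)
    {
        // Step b): strip the accents.
        string noAccents = RemoveDiacritics(input);
        // Step a): drop everything except ASCII letters, digits, and whitespace (assumption).
        string cleaned = Regex.Replace(noAccents, @"[^A-Za-z0-9\s]", "");
        // Step c): collapse each whitespace run into one dash.
        return Regex.Replace(cleaned.Trim(), @"\s+", "-");
    }

    static void Main()
    {
        // prints "Hi-this-is-really-special-string"
        Console.WriteLine(Slugify("#Hi this          is  rèally/ special strìng!!!"));
    }
}
```

The order matters: the accents have to be removed before the ASCII filter runs, otherwise "è" and "ì" would be deleted outright instead of becoming "e" and "i".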


I am not an expert when it comes to regular expressions, but I doubt they would be useful for this sort of computation.

To me, a simple iteration over the characters of the input is enough:

// Requires: using System.Collections.Generic;
List<char> specialChars =
    new List<char>() { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };

string specialString = "#Hi this          is  rèally/ special strìng!!!";

System.Text.StringBuilder builder =
    new System.Text.StringBuilder(specialString.Length);

bool encounteredWhiteSpace = false;


foreach (char ch in specialString)
{
    char val = ch;

    if (specialChars.Contains(val))
        continue;

    switch (val)
    {
        case 'è':
            val = 'e'; break;
        case 'à':
            val = 'a'; break;
        case 'ò':
            val = 'o'; break;
        case 'ù':
        case 'ü':
            val = 'u'; break;
        case 'ı':
        case 'ì':
            val = 'i'; break;
    }

    if (val == ' ' || val == '\t')
    {
        encounteredWhiteSpace = true;
        continue;
    }

    if (encounteredWhiteSpace)
    {
        // Skip the dash when the whitespace was at the very start of the input.
        if (builder.Length > 0)
            builder.Append('-');
        encounteredWhiteSpace = false;
    }

    builder.Append(val);
}

string result = builder.ToString();
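One point in favour of a hand-written mapping like the switch above: letters such as 'ı', 'ø' and 'ß' have no canonical decomposition, so normalizing to FormD leaves them unchanged. A possible middle ground, sketched below, is to let normalization handle the ordinary accents and keep a small table for the rest. The `FoldToAscii` name and the contents of the table are illustrative assumptions, and the table is deliberately incomplete.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class FoldDemo
{
    // Hypothetical, incomplete table of letters that FormD does not decompose.
    static readonly Dictionary<char, string> Overrides = new Dictionary<char, string>
    {
        { 'ı', "i" }, { 'ø', "o" }, { 'Ø', "O" }, { 'ß', "ss" },
        { 'æ', "ae" }, { 'đ', "d" }, { 'ł', "l" }
    };

    public static string FoldToAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Combining accents produced by FormD are simply skipped.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Letters without a decomposition fall back to the manual table.
            string replacement;
            if (Overrides.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // prints "string isik Strasse"
        Console.WriteLine(FoldToAscii("strìng ışık Straße"));
    }
}
```

This only covers the accent-folding step; removing the special characters and replacing the spaces would still be done as in the loop above.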