
Replace a list of invalid characters with their valid versions (like tr)

I need something like this imaginary .trReplace method:

  str = str.trReplace("áéíüñ","aeiu&");

It should change this string:

  a stríng with inválid charactérs

to:

  a string with invalid characters
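For reference, here is a minimal sketch of what the imagined trReplace could look like as a C# extension method. The names TrExtensions and TrReplace are hypothetical, and the sketch assumes that from and to have the same length, with from[i] mapping to to[i]. It makes a single pass over the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TrExtensions
{
    // Hypothetical extension method modeled on the imagined str.trReplace(from, to):
    // from[i] is replaced by to[i]; every other character is copied unchanged.
    public static string TrReplace(this string source, string from, string to)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (from == null) throw new ArgumentNullException(nameof(from));
        if (to == null) throw new ArgumentNullException(nameof(to));
        if (from.Length != to.Length)
            throw new ArgumentException("from and to must have the same length.");

        // Build the mapping once per call; if a character appears twice in
        // 'from', the last occurrence wins.
        var map = new Dictionary<char, char>(from.Length);
        for (int i = 0; i < from.Length; i++)
            map[from[i]] = to[i];

        // Single pass over the input with one dictionary lookup per character.
        var result = new StringBuilder(source.Length);
        foreach (char c in source)
            result.Append(map.TryGetValue(c, out char replacement) ? replacement : c);
        return result.ToString();
    }
}
```

Usage: str = str.TrReplace("áéíüñ", "aeiu&"); If the same mapping is applied many times, building the dictionary once and reusing it avoids rebuilding it on every call.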

My current ideas are:

 str = str.Replace("á","a").Replace("é","e").Replace("í","i")...

and:

 sb = new StringBuilder(str);
 sb.Replace("á","a");
 sb.Replace("é","e");
 sb.Replace("í","i")...

But I don't think they are efficient for long strings.


Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers on StackOverflow already, listed below:

Fastest way to remove chars from string

C# Stripping / converting one or more characters

There is also a good article on CodeProject covering the different options.

http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx

The function in Richard's answer gets slower with longer strings because the replacements happen one character at a time: if the input contains long runs of unmapped characters, cycles are wasted appending those characters to the result one by one. Taking a few points from the CodeProject article gives a slightly modified version of Richard's answer that looks like this:

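using System;
using System.Collections.Generic;
using System.Text;

// Characters to search for with IndexOfAny; must contain exactly the keys of ReplacementMappings.
private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };
private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
{
  { 'á', 'a' },
  { 'é', 'e' },
  { 'í', 'i' },
  { 'ü', 'u' },
  { 'ñ', '&' }
};

private static string Translate(String source)
{
  var startIndex = 0;
  var currentIndex = 0;
  var result = new StringBuilder(source.Length);

  // Jump directly to the next mapped character instead of visiting every character.
  while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
  {
    // Append the whole run of unmapped characters in one call, then the replacement.
    result.Append(source.Substring(startIndex, currentIndex - startIndex));
    result.Append(ReplacementMappings[source[currentIndex]]);

    startIndex = currentIndex + 1;
  }

  // No mapped character was found: return the original string unchanged.
  if (startIndex == 0)
    return source;

  // Append the tail that follows the last replacement.
  result.Append(source.Substring(startIndex));

  return result.ToString();
}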

NOTE: Not all edge cases have been tested.

NOTE: ReplacementChars could be replaced with ReplacementMappings.Keys.ToArray() (a System.Linq extension method) at a slight cost.
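One way to keep the two tables in sync while paying the ToArray() cost only once is to derive the array from the dictionary in a static initializer. The sketch below assumes the same five mappings; the class name ReplacementTables is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReplacementTables
{
    // Single source of truth: invalid character -> valid character.
    public static readonly Dictionary<char, char> ReplacementMappings = new Dictionary<char, char>
    {
        { 'á', 'a' },
        { 'é', 'e' },
        { 'í', 'i' },
        { 'ü', 'u' },
        { 'ñ', '&' }
    };

    // Static fields are initialized in textual order, so the dictionary above already
    // exists here. The array is built once, so IndexOfAny never sees a stale key set.
    public static readonly char[] ReplacementChars = ReplacementMappings.Keys.ToArray();
}
```

Key order in the derived array is not guaranteed, which does not matter for IndexOfAny.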

Assuming that not every character is a replacement character, the Translate function above actually runs slightly faster than straight string replacements (again by about 20%).

That said, when weighing performance cost, remember what is actually at stake: in this case, the difference between the optimized solution and the original one is about 1 second over 100,000 iterations on a 1,000-character string.
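To reproduce this kind of measurement on your own hardware, a simple Stopwatch harness is enough. This is a rough sketch, not the harness used for the numbers above: the class and helper names (ReplaceBenchmark, MakeInput, Time) are hypothetical, and the chained-Replace baseline assumes the question's five mappings.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ReplaceBenchmark
{
    // Baseline from the question: one chained Replace call per mapped character.
    public static string ChainedReplace(string s) =>
        s.Replace("á", "a").Replace("é", "e").Replace("í", "i").Replace("ü", "u").Replace("ñ", "&");

    // Runs 'translate' on 'input' the given number of times and returns elapsed milliseconds.
    public static long Time(Func<string, string> translate, string input, int iterations)
    {
        translate(input); // warm-up call so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            translate(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Builds a string of 'length' characters where every 'every'-th character is mapped ('é').
    public static string MakeInput(int length, int every)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append(i % every == 0 ? 'é' : 'x');
        return sb.ToString();
    }
}
```

For example, compare ReplaceBenchmark.Time(ReplaceBenchmark.ChainedReplace, input, 100000) with ReplaceBenchmark.Time(Translate, input, 100000) for input = ReplaceBenchmark.MakeInput(1000, 50). Varying the second argument of MakeInput shows how the density of mapped characters affects each approach. A dedicated benchmarking library will give more reliable numbers than a single Stopwatch run.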

Either way, I just wanted to add some information to the answers to this question.


I did something similar for ICAO passports, where the names had to be 'transliterated'. Basically, I had a Dictionary of char-to-char mappings.

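using System.Collections.Generic;
using System.Text;

// Char-to-char transliteration table; must be static because Translate is static.
// It is populated elsewhere before Translate is called.
static Dictionary<char, char> mappings;

public static string Translate(string s)
{
    var t = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        char to;
        if (mappings.TryGetValue(c, out to))
            t.Append(to);
        else
            t.Append(c);
    }
    return t.ToString();
}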
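A char-to-char dictionary cannot turn one character into several, which transliteration sometimes needs (German 'ß' is commonly written as "ss", for example). A hypothetical variant could map to strings instead. The class name and the three entries below are illustrative assumptions, not an actual ICAO table.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Transliterator
{
    // Illustrative mapping only: a value may be several characters long
    // (or empty, to drop a character entirely).
    private static readonly Dictionary<char, string> Mappings = new Dictionary<char, string>
    {
        { 'á', "a" },
        { 'ñ', "n" },
        { 'ß', "ss" },
    };

    public static string Transliterate(string s)
    {
        // The capacity is only a hint; the StringBuilder grows if expansions make the result longer.
        var t = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (Mappings.TryGetValue(c, out string to))
                t.Append(to);
            else
                t.Append(c);
        }
        return t.ToString();
    }
}
```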


What you want is a way to go through the string once and do all the replacements. I am not sure that regex is the best way to do it if you want efficiency. It could very well be that a switch statement (with a case for every character you want to replace) inside a for loop that tests every character is faster. I would profile the two approaches.
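A minimal sketch of the switch-in-a-loop approach, assuming the question's five mappings are fixed at compile time; the class name SwitchTranslator is hypothetical. Whether it actually beats a regex or a dictionary lookup is exactly what profiling would need to show.

```csharp
public static class SwitchTranslator
{
    // Single pass over the string; the replacement set is hard-coded in a switch,
    // so there is no dictionary lookup per character.
    public static string Translate(string s)
    {
        var chars = s.ToCharArray(); // a copy we can modify in place
        bool changed = false;
        for (int i = 0; i < chars.Length; i++)
        {
            char replacement;
            switch (chars[i])
            {
                case 'á': replacement = 'a'; break;
                case 'é': replacement = 'e'; break;
                case 'í': replacement = 'i'; break;
                case 'ü': replacement = 'u'; break;
                case 'ñ': replacement = '&'; break;
                default: continue; // not a mapped character: move on to the next one
            }
            chars[i] = replacement;
            changed = true;
        }
        // Return the original instance when nothing was replaced to avoid a second allocation.
        return changed ? new string(chars) : s;
    }
}
```

The trade-off is flexibility: adding a mapping means editing and recompiling the switch, whereas a dictionary can be filled at run time.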


It would be better to use a char array instead of a StringBuilder. Writing through the array indexer is faster than calling the Append method, because a method call has to:

  • push all local variables onto the stack
  • jump to the Append address
  • return to the calling address
  • pop all local variables from the stack

The example below is about 20 percent faster than the StringBuilder version (depending on your hardware and input string):

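using System.Collections.Generic;

// Char-to-char mappings; populated elsewhere before TranslateV2 is called.
static Dictionary<char, char> mappings;

public static string TranslateV2(string s)
{
    var len = s.Length;
    var array = new char[len];

    for (var index = 0; index < len; index++)
    {
        var c = s[index];
        // TryGetValue does one hash lookup instead of two (ContainsKey followed by the indexer).
        array[index] = mappings.TryGetValue(c, out var to) ? to : c;
    }

    return new string(array);
}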
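Going one step further (a swapped-in technique, not taken from any of the answers), the Dictionary can be replaced with a flat char array indexed by character code, which avoids hashing entirely. This sketch assumes the question's mappings; the array only needs max key + 1 entries, and every key here is below U+0100, so it stays small. The class name TableTranslator is hypothetical.

```csharp
using System.Collections.Generic;

public static class TableTranslator
{
    // Lookup table indexed directly by character code, built once from the mapping.
    private static readonly char[] Table = BuildTable(new Dictionary<char, char>
    {
        { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
    });

    private static char[] BuildTable(Dictionary<char, char> mappings)
    {
        int max = 0;
        foreach (char key in mappings.Keys)
            if (key > max) max = key;

        // Start with the identity mapping, then overwrite the mapped entries.
        var table = new char[max + 1];
        for (int i = 0; i < table.Length; i++)
            table[i] = (char)i;
        foreach (var pair in mappings)
            table[pair.Key] = pair.Value;
        return table;
    }

    public static string Translate(string s)
    {
        var array = new char[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            // Characters beyond the end of the table can never be mapped, so copy them unchanged.
            array[i] = c < Table.Length ? Table[c] : c;
        }
        return new string(array);
    }
}
```

As with the other variants, any speed-up depends on the input and runtime, so it is worth profiling before adopting it.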