Replace a list of invalid characters with their valid versions (like tr)
I need to do something like this imagined .trReplace:
str = str.trReplace("áéíüñ","aeiu&");
It should change this string:
a stríng with inválid charactérs
to:
a string with invalid characters
My current ideas are:
str = str.Replace("á","a").Replace("é","e").Replace("í","i"...
and:
sb = new StringBuilder(str);
sb.Replace("á","a");
sb.Replace("é","e");
sb.Replace("í","i"...
But I don't think they are efficient for long strings.
Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers already on Stack Overflow, listed below:
Fastest way to remove chars from string
C# Stripping / converting one or more characters
There is also a good article on CodeProject covering the different options:
http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx
The function in Richard's answer gets slower with longer strings because the replacements happen one character at a time; if the string has long runs of non-mapped characters, you waste cycles appending them to the result one by one. Taking a few points from the CodeProject article, you end up with a slightly modified version of Richard's answer that looks like:
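// Needs: using System; using System.Collections.Generic; using System.Text;

// Characters to search for; must match the keys of ReplacementMappings.
private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };
private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
{
    { 'á', 'a' },
    { 'é', 'e' },
    { 'í', 'i' },
    { 'ü', 'u' },
    { 'ñ', '&' }
};

private static string Translate(String source)
{
    var startIndex = 0;
    var currentIndex = 0;
    var result = new StringBuilder(source.Length);

    // Jump straight to the next mapped character instead of testing every character.
    while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
    {
        // Copy the unmapped run in one call, then append the replacement.
        result.Append(source, startIndex, currentIndex - startIndex);
        result.Append(ReplacementMappings[source[currentIndex]]);
        startIndex = currentIndex + 1;
    }

    // Nothing was replaced, so return the original string without copying it.
    if (startIndex == 0)
        return source;

    // Copy whatever follows the last replaced character.
    result.Append(source, startIndex, source.Length - startIndex);
    return result.ToString();
}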
NOTE Not all edge cases have been tested.
NOTE Could replace ReplacementChars with ReplacementMappings.Keys.ToArray() for a slight cost (ToArray() needs using System.Linq).
Assuming that NOT every character is a replacement char, this will actually run slightly faster than straight string replacements (again about 20%).
That being said, when considering performance cost, remember what we are actually talking about: in this case, the difference between the optimized solution and the original solution is about 1 second over 100,000 iterations on a 1,000-character string.
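To check figures like these on your own machine, a rough timing harness along the following lines may help. This is only a sketch: `TimingSketch`, `Measure` and `ChainedReplace` are hypothetical names introduced here, and a `Stopwatch` loop like this gives ballpark numbers rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs translate(input) the given number of times
    // and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<string, string> translate, string input, int iterations)
    {
        translate(input); // warm-up call so JIT compilation is not counted

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            translate(input);
        watch.Stop();

        return watch.Elapsed;
    }

    // The chained-Replace approach from the question, usable as a baseline.
    public static string ChainedReplace(string s)
    {
        return s.Replace("á", "a").Replace("é", "e").Replace("í", "i")
                .Replace("ü", "u").Replace("ñ", "&");
    }
}
```

For example, `TimingSketch.Measure(TimingSketch.ChainedReplace, new string('x', 995) + "áéíüñ", 100000)` times the baseline on a 1,000-character string; passing another translate function with the same input gives the comparison.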
Either way, just wanted to add some information to the answers for this question.
I did something similar for ICAO Passports. The names had to be 'transliterated'. Basically I had a Dictionary of char to char mappings.
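// Must be static so the static Translate method can use it.
static Dictionary<char, char> mappings;

public static string Translate(string s)
{
    var t = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        // Append the mapped character if there is one, otherwise the original.
        char to;
        if (mappings.TryGetValue(c, out to))
            t.Append(to);
        else
            t.Append(c);
    }
    return t.ToString();
}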
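The answer above does not show how the dictionary is populated. One possible way, sketched here under the assumption that the two character sets are given as equal-length strings as in the question's trReplace("áéíüñ","aeiu&") call, is to pair the characters up by position. `TrSketch` and `BuildMappings` are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TrSketch
{
    // Hypothetical helper: pairs from[i] with to[i], in the spirit of the tr utility.
    public static Dictionary<char, char> BuildMappings(string from, string to)
    {
        if (from.Length != to.Length)
            throw new ArgumentException("Both character sets must have the same length.");

        var mappings = new Dictionary<char, char>(from.Length);
        for (var i = 0; i < from.Length; i++)
            mappings[from[i]] = to[i];
        return mappings;
    }

    // Single pass over the string, with the dictionary passed in as a parameter.
    public static string Translate(string s, Dictionary<char, char> mappings)
    {
        var t = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            char to;
            t.Append(mappings.TryGetValue(c, out to) ? to : c);
        }
        return t.ToString();
    }
}
```

With this, `TrSketch.Translate("a stríng with inválid charactérs", TrSketch.BuildMappings("áéíüñ", "aeiu&"))` returns "a string with invalid characters". Building the dictionary once and reusing it keeps the per-call cost down to the single pass.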
What you want is a way to go through the string once and do all the replacements. I am not sure that regex is the best way to do it if you want efficiency. It could very well be that a switch statement (covering all the characters you want to replace) inside a for loop that tests every character is faster. I would profile the two approaches.
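A minimal sketch of the switch-in-a-loop idea, assuming the five mappings from the question; `SwitchSketch` is a hypothetical name, and whether this actually beats the other approaches is something to measure rather than assume.

```csharp
using System;

public static class SwitchSketch
{
    public static string Translate(string s)
    {
        // Work on a copy of the characters, since strings are immutable.
        var buffer = s.ToCharArray();
        for (var i = 0; i < buffer.Length; i++)
        {
            switch (buffer[i])
            {
                case 'á': buffer[i] = 'a'; break;
                case 'é': buffer[i] = 'e'; break;
                case 'í': buffer[i] = 'i'; break;
                case 'ü': buffer[i] = 'u'; break;
                case 'ñ': buffer[i] = '&'; break;
                // Any other character is left unchanged.
            }
        }
        return new string(buffer);
    }
}
```

The trade-off is that the mappings are hard-coded, so adding a character means editing and recompiling the method.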
It would be better to use an array of char instead of StringBuilder. Writing through the array indexer is faster than calling the Append method, because each method call has to:
- push all local variables to the stack
- move to the Append address
- return to the caller's address
- pop all local variables from the stack
The example below is about 20 percent faster (depending on your hardware and input string):
static Dictionary<char, char> mappings;

public static string TranslateV2(string s)
{
    var len = s.Length;
    var array = new char[len];
    for (var index = 0; index < len; index++)
    {
        var c = s[index];
        // TryGetValue does a single dictionary lookup instead of ContainsKey plus the indexer.
        char to;
        array[index] = mappings.TryGetValue(c, out to) ? to : c;
    }
    return new string(array);
}