
Are multiple calls to string.Replace() less efficient than a single call to a Regex method in .NET?

I want to replace about 8 characters in a string.

Would it be more efficient to use a Regex method, or just multiple calls to string.Replace()?

I'm replacing about 7 characters that may appear, each with an underscore. The characters can appear anywhere in the string, in no particular order.


Don't use the Regex class unless you actually need to match a regular expression. If all you're doing is straight text or character matching, it's much more efficient to let the String type do it than to create a Regex.

The Regex class is much more powerful than simple character or string matching, and that power does not come for free. Using a full regex to match a single character or string is overkill: it's the equivalent of using high explosives to remove a single ant from your lawn when your shoe would do just fine.
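For the question's case (mapping a small set of characters to '_'), plain character matching is enough. A minimal sketch, assuming the target characters are passed in as a string: ReplaceChars is a hypothetical helper name, not a .NET API. It copies the input into a char array once and checks each character against the set with string.IndexOf(char), so the text is scanned a single time without building a Regex.

```csharp
using System;

// Hypothetical helper: replaces every character of `text` that appears in `targets`
// with `replacement`, using a single pass over the input and no regular expression.
static string ReplaceChars(string text, string targets, char replacement)
{
    char[] buffer = text.ToCharArray();          // one copy of the input
    for (int i = 0; i < buffer.Length; i++)
    {
        // string.IndexOf(char) is a plain ordinal character search
        if (targets.IndexOf(buffer[i]) >= 0)
        {
            buffer[i] = replacement;
        }
    }
    return new string(buffer);                   // one final string allocation
}

Console.WriteLine(ReplaceChars("a-b.c d", "-. ", '_')); // prints a_b_c_d
```

With only 7 or 8 target characters, IndexOf on a short string is cheap; for a much larger set, a lookup table would be the usual alternative.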


You don't need multiple calls to string.Replace() to handle repeated occurrences of the same character: a single call replaces all occurrences in one pass (see the MSDN documentation for String.Replace). You only need multiple calls if you are replacing different input sequences within the string (which may be what you're implying in your question).
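A small sketch of that behaviour, using made-up sample strings: one Replace call handles every occurrence of a given character, while different characters each need their own call when chaining Replace.

```csharp
using System;

string input = "a-b-c-d";

// One call replaces every '-' in the string, not just the first one.
string once = input.Replace('-', '_');
Console.WriteLine(once);  // prints a_b_c_d

// Different characters still need one Replace call each.
string mixed = "a-b.c".Replace('-', '_').Replace('.', '_');
Console.WriteLine(mixed); // prints a_b_c
```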

In that case, I would use string.Split and string.Join for this:

var replaced = string.Join("_", input.Split(new[] { 'x', 'y', 'z' }));

This splits the string at every location where one of the characters 'x', 'y' or 'z' occurs (replace these with your own set) and rejoins the fragments using the '_' character, effectively replacing the originals. This approach is NOT necessarily more efficient than multiple calls to string.Replace: that depends on the length of the input string, the number of occurrences of the characters to replace, and so on. You would need to profile with real-world data to determine which is faster. What this approach does do, however, is make the code more concise.
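A runnable sketch of the Split/Join idea next to the equivalent chain of Replace calls, assuming the same example set 'x', 'y', 'z' and a made-up input string. Adjacent target characters produce empty fragments in Split, and Join still inserts one '_' per split point, so the two versions agree.

```csharp
using System;

char[] targets = { 'x', 'y', 'z' };   // example set; substitute your own characters
string input = "axbyczd";

// Split at every target character, then glue the fragments back together with "_".
string viaJoin = string.Join("_", input.Split(targets));

// Same result with one Replace call per target character.
string viaReplace = input;
foreach (char c in targets)
{
    viaReplace = viaReplace.Replace(c, '_');
}

Console.WriteLine(viaJoin);    // prints a_b_c_d
Console.WriteLine(viaReplace); // prints a_b_c_d
```

Note that Split allocates an array of substrings before Join builds the final string, which is one reason it is not automatically faster.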

As far as performance is concerned, I would go with the simplest and most readable solution first. Only if testing demonstrates a problem would I profile and decide which alternative solutions (if any) to pursue. Unless there are strong reasons to do otherwise, my personal priorities when writing code are:

  1. Make it correct.
  2. Make it clear.
  3. Make it concise.
  4. Make it fast.

... in that order.


At the same time, depending on how long these strings are and how many characters need to be replaced, you should consider using a StringBuilder object rather than a string.

Strings are immutable, so each call to Replace creates a new string containing the underscores. The StringBuilder class is more efficient when you make several changes to the same text, because it edits its own buffer instead of producing a new string for every change.
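A minimal sketch of that approach, assuming the characters to replace are held in a string (the sample input and target set are made up). StringBuilder.Replace(char, char) changes the builder's contents in place, and only the final ToString() produces a new string.

```csharp
using System;
using System.Text;

string input = "a-b.c d";
string targets = "-. ";           // example set of characters to turn into underscores

// Copy the text into one mutable buffer; each Replace(char, char) edits that buffer
// instead of allocating a new string per call.
var sb = new StringBuilder(input);
foreach (char c in targets)
{
    sb.Replace(c, '_');
}

string result = sb.ToString();    // single final string allocation
Console.WriteLine(result);        // prints a_b_c_d
```

Each Replace call still walks the whole buffer, so this saves allocations rather than passes over the text.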


Measured times: 12 ticks (multiple Replace) vs 1200 ticks (Regex) vs 40 ticks (Split/Join).

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Program
{
    static void Main(string[] args)
    {
        string str = "abcdefghijklmnopqrstuvxywz0123456789_";
        string replace = "aez01234567";

        DoReplace(str, replace); // 12
        DoRegex(str, replace);   // 1200
        DoJoin(str, replace);    // 40

        Console.ReadKey();
    }

    public static void DoReplace(string str, string replace)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < replace.Length; ++i)
        {
            str = str.Replace(replace[i], '*');
        }
        sw.Stop();
        Console.WriteLine("Multiple replace (" + sw.ElapsedTicks + ") => " + str);
    }

    public static void DoRegex(string str, string replace)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
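        // Builds a character class such as "[aez01234567]"; this assumes none of the
        // characters has a special meaning inside [...] (e.g. ']', '\', '^' or '-').
        str = Regex.Replace(str, "[" + replace + "]", "*");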
        sw.Stop();
        Console.WriteLine("Regex replace (" + sw.ElapsedTicks + ") => " + str);
    }

    public static void DoJoin(string str, string replace)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        str = string.Join("*", str.Split(replace.ToCharArray()));
        sw.Stop();
        Console.WriteLine("Join replace (" + sw.ElapsedTicks + ") => " + str);
    }
}

I guess the simple replace in a loop is faster...

UPDATE: Included LBushkin's method (DoJoin).
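The numbers above come from timing each method exactly once on a 37-character string, so they may include one-time costs such as JIT compilation and parsing the regex pattern on the first Regex.Replace call. A minimal sketch of a fairer Stopwatch measurement, assuming the same inputs as Program: warm each method up once, then average over many iterations. The iteration count is an arbitrary choice, and a dedicated harness such as BenchmarkDotNet would give more reliable figures.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string str = "abcdefghijklmnopqrstuvxywz0123456789_";
string replace = "aez01234567";
const int Iterations = 100_000;   // arbitrary; raise it for steadier numbers

string ViaReplace()
{
    string s = str;
    foreach (char c in replace) s = s.Replace(c, '*');
    return s;
}
string ViaRegex() => Regex.Replace(str, "[" + replace + "]", "*");
string ViaJoin() => string.Join("*", str.Split(replace.ToCharArray()));

// Warm-up: call each method once so first-call costs are not part of the timing.
string r1 = ViaReplace(), r2 = ViaRegex(), r3 = ViaJoin();

// Average Stopwatch ticks per call (raw ticks; see Stopwatch.Frequency for the unit).
double AverageTicks(Func<string> f)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++) f();
    sw.Stop();
    return (double)sw.ElapsedTicks / Iterations;
}

Console.WriteLine($"Multiple replace: {AverageTicks(ViaReplace):F3} ticks/call");
Console.WriteLine($"Regex replace:    {AverageTicks(ViaRegex):F3} ticks/call");
Console.WriteLine($"Join replace:     {AverageTicks(ViaJoin):F3} ticks/call");
```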


Without actually profiling it, I'd say using the regex is faster, since the string is only scanned once, whereas I'd guess each call to string.Replace scans the whole string again.

yourString = Regex.Replace(yourString, @"[uiae]", @"_"); // replace u, i, a and e with an underscore
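If the same replacement runs many times, one option is to build the Regex object once and reuse it instead of passing the pattern string to the static Regex.Replace on every call. A sketch under that assumption (the sample string is made up): the static methods already keep a small cache of recently used patterns, so the gain may be modest, and RegexOptions.Compiled trades slower construction for faster matching, which usually only pays off over many calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Built once, reused for every replacement.
var vowels = new Regex("[uiae]", RegexOptions.Compiled);

string yourString = "quite a test";
yourString = vowels.Replace(yourString, "_");   // replaces u, i, a and e with '_'
Console.WriteLine(yourString);                  // prints q__t_ _ t_st
```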


Expanding on LoneDeveloper's answer, you could write your own extension method to make a bunch of replacements at once:

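using System.Collections.Generic;
using System.Text;

public static class StringExtensions
{
    // Applies every replacement to a single StringBuilder, so no intermediate
    // strings are allocated. Needles are processed in the order given.
    public static string ReplaceAll(this string text, IEnumerable<string> needles, string replacement)
    {
        var sb = new StringBuilder(text);
        foreach (string needle in needles)
        {
            sb.Replace(needle, replacement);
        }

        return sb.ToString();
    }
}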

To call this code, you'd just do something like this:

var needles = new string[] { "a", "b", "c" };

string abcRemoved = "abcdefg".ReplaceAll(needles, "_");