C# performance of static string[] contains() (slooooow) vs. == operator
Just a quick query: I had a piece of code which compared a string against a long list of values, e.g.
if(str == "string1" || str == "string2" || str == "string3" || str == "string4".
DoSomething();
And the interest of code clarity and maintainability I changed it to
public static string[] strValues = { "String1", "String2", "String3", "String4"};
...
if(strValues.Contains(str)
DoSomethi开发者_StackOverflowng();
Only to find the code execution time went from 2.5secs to 6.8secs (executed ca. 200,000 times).
I certainly understand a slight performance trade off, but 300%? Anyway I could define the static strings differently to enhance performance? Cheers.Fyi..
Using:
private static HashSet<string> strHashSet = new HashSet<string>()
{ "0string", "1string", "2string", "3string", "4string", "5string",
"6string", "7string", "8string", "9string", "Astring", "Bstring" };
private static List<string> strList = strHashSet.ToList();
private static string[] strArray = strList.ToArray();
private static Dictionary<int, string> strHashDict = strHashSet.ToDictionary(h => h.GetHashCode());
private static Dictionary<string, string> strDict = strHashSet.ToDictionary(h => h);
// Only one test uses this method.
private static bool ExistsInList(string str)
{
return strHashDict.ContainsKey(str.GetHashCode());
}
Checking for the first and last strings in the list then checking for a string not in the list: "xstring" Executing 500,000 iterations, all times in milliseconds.
1.A Test: result = (str == "0string" || str == "1string" ...
[storage var] [first]:[ last ]:[ none ]:[average]
strArray 3.78 : 45.90 : 57.77 : 35.82
2.A Test: ExistsInList(string);
[storage var] [first]:[ last ]:[ none ]:[average]
none 36.14 : 28.97 : 24.02 : 29.71
3.A Test: .ContainsKey(string.GetHashCode());
[storage var] [first]:[ last ]:[ none ]:[average]
strHashDict 34.86 : 28.41 : 21.46 : 28.24
4.A Test: .ContainsKey(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strDict 38.99 : 32.34 : 22.75 : 31.36
5.A Test: .Contains(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 39.54 : 34.78 : 24.17 : 32.83
strList 23.36 : 122.07 : 127.38 : 90.94
strArray 350.34 : 426.29 : 426.05 : 400.90
6.A Test: .Any(p => p == string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 75.70 : 331.38 : 339.40 : 248.82
strList 72.51 : 305.00 : 319.29 : 232.26
strArray 38.49 : 213.63 : 227.13 : 159.75
Interesting (if not unexpected) results when we change the strings in the list:
private static HashSet<string> strHashSet = new HashSet<string>()
{ "string00", "string01", "string02", "string03", "string04", "string05",
"string06", "string07", "string08", "string09", "string10", "string11" };
With "string99" as the none check.
1.B Test: result = (str == "string00" || str == "string01" ...
[storage var] [first]:[ last ]:[ none ]:[average]
strArray 85.45 : 87.06 : 91.82 : 88.11
2.B Test: ExistsInList(string);
[storage var] [first]:[ last ]:[ none ]:[average]
none 30.12 : 27.97 : 21.36 : 26.48
3.B Test: .ContainsKey(string.GetHashCode());
[storage var] [first]:[ last ]:[ none ]:[average]
strHashDict 32.51 : 28.00 : 20.83 : 27.11
4.B Test: .ContainsKey(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strDict 36.45 : 32.13 : 22.39 : 30.32
5.B Test: .Contains(string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 37.29 : 34.33 : 23.56 : 31.73
strList 23.34 : 147.75 : 153.04 : 108.04
strArray 349.62 : 460.19 : 459.99 : 423.26
6.B Test: .Any(p => p == string);
[storage var] [first]:[ last ]:[ none ]:[average]
strHashSet 76.26 : 355.09 : 361.31 : 264.22
strList 70.20 : 332.33 : 341.79 : 248.11
strArray 37.23 : 234.70 : 251.81 : 174.58
For cases A and B looks like tests 2 and 3 have the advantage.
However, HashSet.Contains(string) is very efficient, not effected by list contents and has a clear syntax...might be the best choice.
Yes, it is true, I have no life.
Are you actually running this 200,000 times in production code? You may want to consider hash checks as a faster negative-check if so.
If the 200,000 times was just to illustrate the difference, then I wouldn't worry about it. It's only a 0.02 millisecond increase in time.
Contains
is more general-purpose than testing static strings, so there is a small amount of overhead. Unless this code is a bottleneck as Mark mentioned, it's not worth optimizing. There's a famous quote in CS: "premature optimization is the root of all evil." The quote isn't quite accurate, but it's a good reminder of the ultimate optimization guideline: measure first.
Both the methods you have tried have O(n) performance so they will get slower as you add more strings. If you are using .NET 3.5 or newer then you could try using a HashSet<string>
instead and initializing it once at the start of the application. You can then get approximately O(1) lookups.
For .NET v2.0 you can emulate a HashSet
using a Dictionary<string, object>
and using ContainsKey
and not using the value.
Here's an alternative you may arguably find readable and maintainable, that you may wish to test for speed. If you do test it for speed, please post your result!
switch (str)
{
case "String1":
case "String2":
case "String3":
case "String4":
DoSomething();
break;
}
Although using a HashSet<string>
as suggested could be a better option, the reason why strValues.Contains(str)
is slower, is because it's a generic extension method. There is no such thing as a Contains method on arrays.
The way it works for arrays is basically
if (strValues is ICollection<string>) // true
{
return ((ICollection<string>) strValues).Contains(str);
}
which adds a typecheck, typecast and virtual call. Then it will iterate the array (causing bounds checks). Only then will it come to do string comparisons. So it's doing a lot more work.
Note that in C# 3 (which you must be using if you're using extension methods), you can simply initialize the HashSet<string>
like this:
public static HashSet<string> strValues = new HashSet<string> {
"String1", "String2", "String3", "String4" };
This keeps your program just as readable as it is now using arrays.
You may find that Contains() works better for a longer list. It may, for example, sort the list and do a binary search (just a thought experiment, for an example.)
精彩评论