Why is Any slower than Contains?
I designed the following test:
var arrayLength=5000;
object[] objArray=new object[arrayLength];
for(var x=0;x<arrayLength;x++)
{
objArray[x]=new obje开发者_JS百科ct();
}
objArray[4000]=null;
const int TestSize=int.MaxValue;
System.Diagnostics.Stopwatch v= new Stopwatch();
v.Start();
for(var x=0;x<10000;x++)
{
objArray.Contains(null);
}
v.Stop();
objArray.Contains(null).Dump();
v.Elapsed.ToString().Dump("Contains");
//Any ==
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
objArray.Any(o=>o==null);
}
v.Stop();
objArray.Any(x=>x==null).Dump();
v.Elapsed.ToString().Dump("Any");
//Any Equals
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
objArray.Any(obj=>object.Equals( obj,null));
}
v.Stop();
objArray.Any(obj=>object.Equals( obj,null)).Dump();
v.Elapsed.ToString().Dump("Any");
The results when null is not present:
Contains False 00:00:00.0606484
Any == False 00:00:00.7532898
Any object.Equals False 00:00:00.8431783
When null is present at element 4000:
Contains True 00:00:00.0494515
Any == True 00:00:00.5929247
Any object.Equals True 00:00:00.6700742
When null is present at element 10:
Contains True 00:00:00.0038035
Any == True 00:00:00.0025687
Any True 00:00:00.0033769
So when the object is near the front, Any
is slightly faster; when it's at the back, it's much much slower. Why?
Any
will have to call a delegate for every element it checks (an extra callvirt
instruction which is unlikely to get inlined by the JIT). Contains
only performs that check. That's why Any
is slower. I suspect the fact that Any
looks faster than contains when the element is seen very early is that the benchmark can't reflect it easily since they are very close. The setup time for the method call is the majority of the work done in that case (rather than the actual searching operation).
The anonymous method:
--- C:\Users\Mehrdad\AppData\Local\Temporary Projects\ConsoleApplication1\Program.cs
Console.WriteLine(s.Any(a => a == 1));
00000000 xor eax,eax
00000002 cmp ecx,1
00000005 sete al
00000008 ret
Relevant part of Enumerable.Any code:
...
00000051 mov edx,eax
00000053 mov rcx,qword ptr [rbx+8]
00000057 call qword ptr [rbx+18h] // calls the anonymous method above
0000005a movzx ecx,al
0000005d test ecx,ecx
...
Any is slower because Contains is tailored to the specific container you're using (Array/List/etc), so it doesn't have the overhead of firing up an IEnumerable, calling MoveNext() all the time, etc.
However, using Any will make refactoring easier since if you change that collection, you can stil use it, so really I'd only change it to Contains if you know via a profiler that this is an important piece of code. And if it is, you should probably end up using a smarter data structure anyways like a HashSet, since both Any and Contains are O(n).
As other people have already noted, both the Contains and Any methods are extension methods of Enumerable. The big difference in performance has a couple of reasons:
First of all, you supply a delegate to the Any, which has to be called for every object, while the Contains method doesn't have to. Delegate calls are about as fast as a call to an interface method. For this reason, Any is slower.
Next, something that other people seem to have missed, the Contains extension method has a performance optimization for collections implementing ICollection. Because object[] implements ICollection the extension method call results in a method call on the array itself. Internally, this array.Contains method uses a simple for loop to iterate over the array to compare the value. This means array bounds checking is done just once iterating the array.
Because the Any method must call your delegate, a performance optimization as with the Contains method is not possible. This means that the Any method iterates over the collection using the IEnumerable interface, which leads to a interface call + an array bounds check + a delegate call on each and every element. Compare that to the array.Contains, where there are no interface calls, no delegate calls and a single bounds check.
[Update]: One last note. The reason that Any is faster with small collections (and in your case with a null value at the start of the collection) has to do with the cast to ICollection that the Enumerable.Contains does When you do the cast to ICollection yourself, you'll see that the call to Contains is faster than Any:
for(var x=0;x<10000;x++)
{
ICollection<object> col = objArray;
col.Contains(null);
}
I'm guessing it has to do with the fact that Any is an extension method, part of the LINQ library and involves using delegates (via the Func<> syntax). Any time you have to call out to a seperate method (especially as a delegate) it will slow down.
精彩评论