Is MulticastDelegate.CombineImpl inefficient?
I just fired up Reflector and peered into MultiCastDelegate.CombineImpl and saw some very lengthy code... every time two delegates are combined (read: every time you attached more than one event handler to an event), the following code gets run, which seems inefficient for such a fundamental performance critical feature. Does anyone know why it's been written this way?
[SecuritySafeCritical]
protected sealed override Delegate CombineImpl(Delegate follow)
{
object[] objArray;
int num2;
if (follow == null)
{
return this;
}
if (!Delegate.InternalEqualTypes(this, follow))
{
throw new ArgumentException(Environment.GetResourceString("Arg_DlgtTypeMis"));
}
MulticastDelegate o = (MulticastDelegate) follow;
int num = 1;
object[] objArray2 = o._invocationList as object[];
if (objArray2 != null)
{
num = (int) o._invocationCount;
}
object[] objArray3 = this._invocationList as object[];
if (objArray3 == null)
{
num2 = 1 + num;
objArray = new object[num2];
objArray[0] = this;
if (objArray2 == null)
{
objArray[1] = o;
}
else
{
for (int i = 0; i < num; i++)
{
objArray[1 + i] = objArray2[i];
}
}
return this.NewMulticastDelegate(objArray, num2);
}
int index = (int) this._invocationCount;
num2 = index + num;
objArray = null;
if (num2 <= objArray3.Length)
{
objArray = objArray3;
if (objArray2 == null)
{
if (!this.TrySetSlot(objArray, index, o))
{
objArray = null;
}
}
else
{
for (int j = 0; j < num; j++)
{
if (!this.TrySetSlot(objArray, index + j, objArray2[j]))
{
objArray = null;
break;
}
}
}
}
if (objArray == null)
{
int length = objArray3.Length;
while (length < num2)
{
length *= 2;
}
objArray = new object[length];
for (int k = 0; k < index; k++)
{
objArray[k] = objArray3[k];
}
if (objArray2 == null)
{
objArray[index] = o;
}
else
{
for (int m = 0; m < num; m++)
{
objArray[index + m] = objArray2[m];
}
}
}
return this.NewMulticastDelegate(objArray, num2, true);
}
Wouldn't a simple linked-list pattern have sufficed?
EDIT: My original question presupposes that the implementation is inefficient. With admirable tenacity, Hans and Jon (both well respected contributors here on SO), pointed out several facts about the suitability of the above implementation.
Hans points out that the use of arrays with TrySetSlot ends up being cache-friendly on multi-core CPUs (thus improving performa开发者_运维技巧nce), and Jon kindly put together a microbenchmark which reveals that the above implementation yields very acceptable performance characteristics.
It's the exact opposite, this code was optimized by not using List<> or similar collection object. Everything that List does is inlined here. The added advantage is that the locking is cheap (TrySetSlot uses Interlocked.CompareExchange) and saving the cost of dragging around a locking object. Explicitly inlining code like this rather than leaving it up to the JIT compiler isn't that common in the .NET framework. But exceptions are made for low-level primitives like this.
which seems extremely inefficient for such a fundamental performance critical feature
Just how often do you think delegates are attached to events (or combined at other times)?
For example, in a Windows Forms app this is likely to happen pretty rarely - basically when setting up a form, for the most part... at which point there are far heavier things going on than what's in MulticastDelegate.CombineImpl
.
What does happen very often is that delegates are invoked... for example, for every item in every projection or predicate (etc) in a LINQ query. That's the really performance critical bit, IMO.
I'm also not convinced that this code is as inefficient as you think it is. It's taking the same approach as ArrayList
in terms of creating a larger array than needed, to fill it as it's required. Would a linked list be more efficient? Possibly in some terms - but equally it would be less efficient in terms of locality and levels of indirection. (As each node would need to be a new object which itself contained a reference to the delegate, so navigating the list could end up bringing more pages into memory than an array of references would.)
EDIT: Just as a quick microbenchmark (with all the usual caveats) here's some code to perform a given number of iterations of combining a given number of delegates:
using System;
using System.Diagnostics;
class Test
{
const int Iterations = 10000000;
const int Combinations = 3;
static void Main()
{
// Make sure all paths are JITted
Stopwatch sw = Stopwatch.StartNew();
sw.Stop();
Action tmp = null;
for (int j = 0; j < Combinations; j++)
{
tmp += Foo;
}
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
Action action = null;
for (int j = 0; j < Combinations; j++)
{
action += Foo;
}
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
static void Foo()
{
}
}
Some results on my machine, all with 10,000,000 iterations:
5 delegates: about 5.8 seconds
4 delegates: about 4.3 seconds
3 delegates: about 3.2 seconds
2 delegates: about 1.4 seconds
1 delegate: about 160ms
(All tests run multiple times; the above are just samples which seemed reasonably representative. I haven't taken the average or anything.)
Given the above results, I suspect that any paths even in combination-heavy WPF which only attach a single delegate will be blazingly fast. They'll slow down significantly going from 1-2, then gradually degrade (but with a lot less proportional difference than the 1-2 case).
精彩评论