Sum of products of two arrays (dotproduct)
First off, I know my title can be formulated better, but my math classes are so far gone I can't remember the correct words anymore..
I need to do something like this (pseudo c#)
int[] digits1 = new int[10]{0,1,2,3,4,5,6,7,8,9};
int[] digits2 = new int[10]{0,1,2,3,4,5,6,7,8,9};
int result = digits1*digits2
This would be the sum of the product of element[开发者_开发技巧i] of each array.
This obviously doesn't work. Any suggestions towards either a better title or the solution?
EDIT clarification: I know I could loop them both and do the math. Basically I would think there is a better way to do this and I'm looking for it purely out of personal curiousity.
With LINQ:
int dotProduct = digits1.Zip(digits2, (d1, d2) => d1 * d2)
.Sum();
Zip
will produce a streaming sequence containing the products of corresponding elements from both arrays, which is then summed into an integer with Sum
.
Note that this will not fail like it should when the arrays of unequal length, so you probably need to validate the input:
//null checks here
if(digits1.Length != digits2.Length)
throw new ArgumentException("...");
EDIT:
As Jeff M points out,Enumerable.Zip
was only added to the framework in .NET 4.0. In .NET 3.5, you can do this (the idea is only efficient for collections that expose fast indexers):
int dotProduct = Enumerable.Range(0, digits1.Length)
.Sum(i => digits1[i] * digits2[i]);
//from Jeff M's comment:
int dotProduct = digits1.Select((n, i) => n * digits2[i])
.Sum();
Solutions with LINQ
int[] digits1 = new int[10]{0,1,2,3,4,5,6,7,8,9};
int[] digits2 = new int[10]{0,1,2,3,4,5,6,7,8,9};
int result1 = digits1.Zip(digits2, (x, y) => x * y).Sum();
int result2 = digits1.Select((x, y) => x * digits2.ElementAt(y)).Sum();
int result3 = digits1.Select((n, i) => n * digits2[i]).Sum();
// Ani answer
int result4 = Enumerable.Range(0, digits1.Length)
.Sum(i => digits1[i] * digits2[i]);
Performance test 100000
iterations:
Queries
Fn: Result 1 Ticks 135306
Fn: Result 2 Ticks 2470614
Fn: Result 3 Ticks 130034
Fn: Result 4 Ticks 123374
-------------
Fastest
Fn: Result 4 Ticks 123374
Fn: Result 3 Ticks 130034
Fn: Result 1 Ticks 135306
Fn: Result 2 Ticks 2470614
I wrote a test bench to compare the these methods' times on my machine.
Specs:
Windows 7 Professional 64-bit
Intel Core 2 Quad Q9550 @ 2.83GHz
4x1GiB Corsair Dominator DDR2 1066 (PC2-8500)
using System;
using System.Linq;
namespace Testbench
{
class Program
{
static void Main(string[] args)
{
var digits1 = Enumerable.Range(0, 500).ToArray();
var digits2 = digits1.ToArray(); // create a copy
Test("Regular Loop", () =>
{
int result = 0;
for (int i = 0; i < digits1.Length; i++)
{
result += digits1[i] * digits2[i];
}
return result;
});
// using LINQ
Test("Enumerable \"Loop\"", () => Enumerable.Range(0, digits1.Length).Sum(i => digits1[i] * digits2[i]));
Test("Using Zip", () => digits1.Zip(digits2, (x, y) => x * y).Sum());
Test("Using Indexed Select", () => digits1.Select((n, i) => n * digits2[i]).Sum());
Test("Using Indexed Select with ElementAt", () => digits1.Select((n, i) => n * digits2.ElementAt(i)).Sum());
// using PLINQ
Test("Parallel Enumerable \"Loop\"", () => ParallelEnumerable.Range(0, digits1.Length).Sum(i => digits1[i] * digits2[i]));
Test("Using Parallel Zip", () => digits1.AsParallel().Zip(digits2.AsParallel(), (x, y) => x * y).Sum());
Test("Using Parallel Indexed Select", () => digits1.AsParallel().Select((n, i) => n * digits2[i]).Sum());
Test("Using Parallel Indexed Select with ElementAt", () => digits1.AsParallel().Select((n, i) => n * digits2.ElementAt(i)).Sum());
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
Console.WriteLine();
}
static void Test<T>(string testName, Func<T> test, int iterations = 1000000)
{
Console.WriteLine(testName);
Console.WriteLine("Iterations: {0}", iterations);
var results = Enumerable.Repeat(0, iterations).Select(i => new System.Diagnostics.Stopwatch()).ToList();
var timer = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < results.Count; i++)
{
results[i].Start();
test();
results[i].Stop();
}
timer.Stop();
Console.WriteLine("Time(ms): {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedMilliseconds), results.Average(t => t.ElapsedMilliseconds), results.Max(t => t.ElapsedMilliseconds), timer.ElapsedMilliseconds);
Console.WriteLine("Ticks: {0,3}/{1,10}/{2,8} ({3,10})", results.Min(t => t.ElapsedTicks), results.Average(t => t.ElapsedTicks), results.Max(t => t.ElapsedTicks), timer.ElapsedTicks);
Console.WriteLine();
}
}
}
32-bit target:
Regular Loop Iterations: 1000000 Time(ms): 0/ 0/ 0 ( 1172) Ticks: 3/ 3.101365/ 526 ( 3244251) Enumerable "Loop" Iterations: 1000000 Time(ms): 0/ 4.3E-05/ 25 ( 9054) Ticks: 24/ 24.93989/ 69441 ( 25052172) Using Zip Iterations: 1000000 Time(ms): 0/ 2.4E-05/ 16 ( 16282) Ticks: 41/ 44.941406/ 45395 ( 45052491) Using Indexed Select Iterations: 1000000 Time(ms): 0/ 5.3E-05/ 32 ( 13473) Ticks: 34/ 37.165088/ 89602 ( 37280177) Using Indexed Select with ElementAt Iterations: 1000000 Time(ms): 0/ 1.5E-05/ 6 ( 160215) Ticks: 405/443.154147/ 17821 ( 443306156) Parallel Enumerable "Loop" Iterations: 1000000 Time(ms): 0/ 0.000103/ 29 ( 17194) Ticks: 38/ 47.412312/ 81906 ( 47576133) Using Parallel Zip Iterations: 1000000 Time(ms): 0/ 9.4E-05/ 19 ( 21703) Ticks: 49/ 59.859005/ 53200 ( 60051081) Using Parallel Indexed Select Iterations: 1000000 Time(ms): 0/ 0.000114/ 27 ( 20579) Ticks: 45/ 56.758491/ 75455 ( 56943627) Using Parallel Indexed Select with ElementAt Iterations: 1000000 Time(ms): 0/ 8.1E-05/ 19 ( 61137) Ticks: 144/ 168.97909/ 53320 ( 169165086)
64-bit target:
Regular Loop Iterations: 1000000 Time(ms): 0/ 0/ 0 ( 506) Ticks: 1/ 1.254137/ 1491 ( 1401969) Enumerable "Loop" Iterations: 1000000 Time(ms): 0/ 2.9E-05/ 15 ( 10118) Ticks: 27/ 27.850086/ 41954 ( 27995994) Using Zip Iterations: 1000000 Time(ms): 0/ 2.2E-05/ 13 ( 17089) Ticks: 45/ 47.132834/ 38506 ( 47284608) Using Indexed Select Iterations: 1000000 Time(ms): 0/ 3.1E-05/ 12 ( 14057) Ticks: 37/ 38.740923/ 33846 ( 38897274) Using Indexed Select with ElementAt Iterations: 1000000 Time(ms): 0/ 3.8E-05/ 29 ( 117412) Ticks: 304/324.711279/ 82726 ( 324872753) Parallel Enumerable "Loop" Iterations: 1000000 Time(ms): 0/ 9.9E-05/ 28 ( 24234) Ticks: 38/ 66.79389/ 77578 ( 67054956) Using Parallel Zip Iterations: 1000000 Time(ms): 0/ 0.000111/ 24 ( 30193) Ticks: 46/ 83.264037/ 69029 ( 83542711) Using Parallel Indexed Select Iterations: 1000000 Time(ms): 0/ 6.5E-05/ 20 ( 28417) Ticks: 45/ 78.337831/ 56354 ( 78628396) Using Parallel Indexed Select with ElementAt Iterations: 1000000 Time(ms): 0/ 9.2E-05/ 16 ( 65233) Ticks: 112/180.154663/ 44799 ( 180496754)
Very simply, do a loop.
int sum = 0;
for(int i = 0; i < digits1.length && i < digits2.length; i++)
{
sum += digits1[i] * digits2[i];
}
Boom.
int result = 0;
for(int i = 0; i < digits1.length; i++)
{
result += digits1[i] * digits2[i];
}
Even faster is to unroll the loop
Test("Regular Loop", () =>
{
int result = 0;
for (int i = 0; i < digits1.Length; i++)
{
result += digits1[i] * digits2[i];
}
return result;
});
// This will fail if vectors are not a multiple of 4 in length.
Test("Unroll 4x", () =>
{
int result = 0;
for (int i = 0; i < digits1.Length; i+=4)
{
result += digits1[i] * digits2[i];
result += digits1[i+1] * digits2[i+1];
result += digits1[i+2] * digits2[i+2];
result += digits1[i+3] * digits2[i+3];
}
return result;
});
Test("Dynamic unroll", () =>
{
int result = 0;
int limit = (digits1.Length/8)*8;
int reminderLimit = digits1.Length;
if (digits1.Length >= 8)
{
for (int i = 0; i < limit; i+=8)
{
result += digits1[i] * digits2[i];
result += digits1[i+1] * digits2[i+1];
result += digits1[i+2] * digits2[i+2];
result += digits1[i+3] * digits2[i+3];
result += digits1[i+4] * digits2[i+4];
result += digits1[i+5] * digits2[i+5];
result += digits1[i+6] * digits2[i+6];
result += digits1[i+7] * digits2[i+7];
}
reminderLimit = digits1.Length % 8;
}
switch(reminderLimit)
{
case 7: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
result += digits1[limit+2] * digits2[limit+2];
result += digits1[limit+3] * digits2[limit+3];
result += digits1[limit+4] * digits2[limit+4];
result += digits1[limit+5] * digits2[limit+5];
result += digits1[limit+6] * digits2[limit+6];
break;
}
case 6: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
result += digits1[limit+2] * digits2[limit+2];
result += digits1[limit+3] * digits2[limit+3];
result += digits1[limit+4] * digits2[limit+4];
result += digits1[limit+5] * digits2[limit+5];
break;
}
case 5: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
result += digits1[limit+2] * digits2[limit+2];
result += digits1[limit+3] * digits2[limit+3];
result += digits1[limit+4] * digits2[limit+4];
break;
}
case 4: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
result += digits1[limit+2] * digits2[limit+2];
result += digits1[limit+3] * digits2[limit+3];
break;
}
case 3: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
result += digits1[limit+2] * digits2[limit+2];
break;
}
case 2: {
result += digits1[limit] * digits2[limit];
result += digits1[limit+1] * digits2[limit+1];
break;
}
case 1: {
result += digits1[limit] * digits2[limit];
break;
}
default :
{
break;
}
}
return result;
});
There is a huge difference is running time between debug and release mode of C# code, this is run with release mode:
Regular Loop
Iterations: 1000000
Time(ms): 0/ 0/ 0 ( 596)
Ticks: 1/ 2,071213/ 455 ( 2154248)
Unroll 4x
Iterations: 1000000
Time(ms): 0/ 2E-06/ 1 ( 575)
Ticks: 1/ 1,984301/ 3876 ( 2076105)
Dynamic unroll
Iterations: 1000000
Time(ms): 0/ 0/ 0 ( 430)
Ticks: 1/ 1,4635/ 3228 ( 1554830)
Debug mode:
Regular Loop
Iterations: 1000000
Time(ms): 0/ 1E-06/ 1 ( 1296)
Ticks: 4/ 4,529916/ 3907 ( 4678354)
Unroll 4x
Iterations: 1000000
Time(ms): 0/ 0/ 0 ( 871)
Ticks: 2/ 3,048466/ 701 ( 3145277)
Dynamic unroll
Iterations: 1000000
Time(ms): 0/ 0/ 0 ( 819)
Ticks: 2/ 2,858588/ 1398 ( 2957179)
精彩评论