开发者

Performance when Generating CPU Cache Misses

I am trying to learn about CPU cache performance in the world of .NET. Specifically I am working through Igor Ostovsky's article about Processor Cache Effects.

I have gone through the first three examples in his article and have recorded results that widely differ from his. I think I must be doing something wrong because the performance on my machine is showing almost the exact opposite results of what he shows in his article. I am not seeing the large effects from cache misses that I would expect.

What am I doing wrong? (bad code, compiler setting, etc.)

Here are the performance results on my machine:

Performance when Generating CPU Cache Misses

Performance when Generating CPU Cache Misses

Performance when Generating CPU Cache Misses

If it helps, the processor on my machine is an Intel Core i7-2630QM. Here is info on my processor's cache:

Performance when Generating CPU Cache Misses

I have compiled in x64 Release mode.

Below is my source code:

class Program
    {

        static Stopwatch watch = new Stopwatch();

        static int[] arr = new int[64 * 1024 * 1024];

        static void Main(string[] args)
        {
            Example1();
            Example2();
            Example3();


            Console.ReadLine();
        }

        static void Example1()
        {
            Console.WriteLine("Example 1:");

            // Loop 1
            watch.Restart();
            for (int i = 0; i < arr.Length; i++) arr[i] *= 3;
            watch.Stop();
            Console.WriteLine("     Loop 1: " + watch.ElapsedMilliseconds.ToString() + " ms");

            // Loop 2
            watch.Restart();
            for (int i = 0; i < arr.Length; i += 32) arr[i] *= 3;
            watch.Stop();
            Console.WriteLine("     Loop 2: " + watch.ElapsedMilliseconds.ToString() + " ms");

            Console.WriteLine();
        }

        static void Example2()
        {

            Console.WriteLine("Example 2:");

            for (int k = 1; k <= 1024; k *= 2)
            {

                watch.Restart();
                for (int i = 0; i < arr.Length; i += k) arr[i] *= 3;
                watch.Stop();
                Console.WriteLine("     K = "+ k + ": " + watch.ElapsedMilliseconds.ToString() + " ms");

            }
            Console.WriteLine();
        }

        static void Example开发者_StackOverflow社区3()
        {   

            Console.WriteLine("Example 3:");

            for (int k = 1; k <= 1024*1024; k *= 2)
            {

                //256* 4bytes per 32 bit int * k = k Kilobytes
                arr = new int[256*k];



                int steps = 64 * 1024 * 1024; // Arbitrary number of steps
                int lengthMod = arr.Length - 1;

                watch.Restart();
                for (int i = 0; i < steps; i++)
                {
                    arr[(i * 16) & lengthMod]++; // (x & lengthMod) is equal to (x % arr.Length)
                }

                watch.Stop();
                Console.WriteLine("     Array size = " + arr.Length * 4 + " bytes: " + (int)(watch.Elapsed.TotalMilliseconds * 1000000.0 / arr.Length) + " nanoseconds per element");

            }
            Console.WriteLine();
        }

    }


Why are you using i += 32 in the second loop. You are stepping over cache lines in this way. 32*4 = 128bytes way bigger then 64bytes needed.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜