Profiling method performance
I'm trying to profile some code, but I'm running into some issues with what I think is Java's com开发者_Go百科piler being smart and changing the way the code works.
example methods:
public int method1(int bits)
{
// seed is a long object field
seed = seed * 0x5DEECE66DL + 0xBL & (1L << 48) - 1;
return (int) (seed >>> 48 - bits);
}
public int method2(int bits)
{
// seed is a long object field
seed *= 0x5DEECE66DL;
seed += 0xBL & (1L << 48) - 1;
return (int) (seed >>> 48 - bits);
}
public int method3(int bits)
{
// multiplier, seeds, n, and carry are long/long array object fields
final long t = multiplier * seeds[n] + carry;
// carry = t / b (done in an unsigned way)
final long div32 = t >>> 32; // div32 = t / (b+1)
carry = div32 + ((t & 0xFFFFFFFFL) >= 0xFFFFFFFFL - div32 ? 1L : 0L);
// seeds[n] = (b-1)-t%b (done in an unsigned way)
seeds[n] = 0xFFFFFFFEL - (t & 0xFFFFFFFFL) - (carry - div32 << 32) - carry & 0xFFFFFFFFL;
final long result = seeds[n];
n = n + 1 & r - 1;
return (int) (result >>> 32 - bits);
}
Now, to compare the performance, I'm using this test rig:
// gen is the object containing the methods/fields
int result;
long start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
result = gen.method1(32);
}
long end = System.currentTimeMillis();
System.out.println(end - start);
start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
result = gen.method2(32);
}
end = System.currentTimeMillis();
System.out.println(end - start);
start = System.currentTimeMillis();
for (int i = 0; i < 0x7FFFFFFF; ++i)
{
result = gen.method3(32);
}
end = System.currentTimeMillis();
System.out.println(end - start);
However, the results I'm getting back are rather odd:
7 3109 13402
method1
and method2
should both have roughly the same computation time as they compute the exact same thing. method3
I can understand taking longer, but looking at the amount of calculations, none of the 3 results seem long enough to me. This seems to flag to me that somehow Java's optimizing my test rig and not running it all 0x7FFFFFFF
times.
To fix this, I thought of declaring result
as a static volatile
field (outside of the test rig method), and doing that generated believable results:
21814 21468 26962
Now my question is is this the accepted way to profile methods? I would still like optimizations to occur in my methods I'm profiling, just not in the test rig (at least not optimizations which would result in the method being called less than the prescribed number of times or caching the results between runs).
You should be using a profiler like YourKit or similar on your application as a whole. Focus on the problem areas and not the random number generator that is unlikely a bottleneck of any sort. Do you really have a use case that consumes more than 80 million per second? Just about anything else is going to take more time than these methods. You're almost certainly worrying about the wrong thing for the big picture.
If you really want to or have a good reason to continue this approach at least read this first: How do I write a correct micro-benchmark in Java? There are all sorts of factors that affect a micro-benchmark and translating them to the real world rarely matters or works. A profiler will tell you what you really need to know and the rest isn't worth spending much time on.
I think this is the wrong approach. I don't think that the tests are meaningful predictors of what you'll experience in production if you try to circumvent the runtime optimizer. I'd recommend abandoning this avenue.
You should perform the test in different orders as the JVM can otimise the code in different ways as the code runs. i.e. the compiler doe snext to no optimisations so you can run code several times in the same program and its performance can change, usually it improves.
Note: for short loops, where you discard the result, the JVM can determine you don't do anything useful and discard the loop. Instead of determining the loops runtime, you are determining how long it takes the JVM to discard the loop.
If you use volatile, this can be more expensive than what you are testing, which could be one reason the results are much the same when using voltile. ;)
A simple way to avoid the JVM optimising the loop to nothing is to add the results and the print the total (which you can ignore)
精彩评论