Performance of math functions?
I'm graphing accelerometer data, and I'm trying to correct for gravity. To do this, I get the acceleration vector in spherical coordinates, decrease the radius by 1g, and convert back to Cartesian. This method is called on a timer every 0.03 seconds:
//poll acceleration
ThreeAxisAcceleration current = self.accelerationData;
//math to correct for gravity:
float radius = sqrt(pow(current.x, 2) + pow(current.y, 2) + pow(current.z, 2));
float theta = atan2(current.y, current.x);
float phi = acos(current.z/radius);
//NSLog(@"SPHERICAL--- RADIUS: %.2f -- THETA: %.2f -- PHI: %.2f", radius, theta, phi);
radius = radius - 1.0;
float newX = radius * cos(theta) * sin(phi);
float newY = radius * sin(theta) * sin(phi);
float newZ = radius * cos(phi);
current = (ThreeAxisAcceleration){newX, newY, newZ};
//end math
NSValue *arrayVal = [NSValue value:&current withObjCType:@encode(ThreeAxisAcceleration)];
if ([_dataHistoryBuffer count] > self.bounds.size.width) {
[_dataHistoryBuffer removeObjectAtIndex:0];
}
[_dataHistoryBuffer addObject:arrayVal];
[self setNeedsDisplay];
Somehow, adding the gravity correction gradually slows my code down horrendously. I find it hard to believe that this small amount of math can slow the program down, yet without it the app runs fine, even through my entire display method, which is quite lengthy. Are there any options I can consider here to avoid this? Am I missing something, or is the math just that slow? If I comment out everything between the //math and //end math tags, performance is fine.
Thanks for any help.
P.S. In case it matters, to whoever it may interest: I'm programming in Cocoa, and this method belongs to a subclass of CALayer with -drawInContext: implemented.
Are you on iPhone? Try using the float variants of these functions: powf, sqrtf, etc
There's more info in point #4 of Kendall Helmstetter Gelner's answer to this SO question.
Besides the fact that it's theoretically impossible to simply factor out Earth's gravity, the first step I would take would be to benchmark each of the operations that you're performing (multiplication, division, sin, atan2, etc) and then engineer a way around the operations that take significantly longer to compute (or avoid computing the problematic operations). Make sure to use the same data types in your benchmarking as you will in your finished product.
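For instance, a crude micro-benchmark along these lines (CFAbsoluteTimeGetCurrent for timing, an arbitrary iteration count, and a volatile accumulator so the compiler can't discard the work) gives a rough per-call cost to compare the candidates:
#include <math.h>
#include <stdio.h>
#include <CoreFoundation/CoreFoundation.h>
int main(void) {
    const int kIterations = 1000000;  // arbitrary; large enough for a stable average
    volatile float sink = 0.0f;       // volatile so the compiler can't discard the calls
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (int i = 0; i < kIterations; i++)
        sink += sqrtf((float)i);
    double elapsed = CFAbsoluteTimeGetCurrent() - start;
    printf("sqrtf : %.1f ns/call\n", 1e9 * elapsed / kIterations);
    start = CFAbsoluteTimeGetCurrent();
    for (int i = 0; i < kIterations; i++)
        sink += atan2f((float)i, 3.0f);
    elapsed = CFAbsoluteTimeGetCurrent() - start;
    printf("atan2f: %.1f ns/call\n", 1e9 * elapsed / kIterations);
    return 0;
}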
This is a classic example of the time/accuracy trade-off. There are usually multiple algorithms for performing the same computation and you also have LUTs/interpolation at your disposal.
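For example, if profiling showed sinf itself to be the bottleneck and reduced accuracy were acceptable, a small lookup table with linear interpolation could stand in for it. This is only a sketch; the table size and the [0, 2π) input range are arbitrary choices, not something from the question:
#include <math.h>
#define kSinTableSize 256
static float sinTable[kSinTableSize + 1];
// Call once at startup to fill the table over [0, 2*pi].
static void initSinTable(void) {
    for (int i = 0; i <= kSinTableSize; i++)
        sinTable[i] = sinf(i * (2.0f * (float)M_PI) / kSinTableSize);
}
// Approximate sinf(x) for x in [0, 2*pi) by interpolating between adjacent table entries.
static float fastSinf(float x) {
    float pos = x * (kSinTableSize / (2.0f * (float)M_PI));
    int idx = (int)pos;        // lower table index
    float frac = pos - idx;    // fractional distance to the next entry
    return sinTable[idx] + frac * (sinTable[idx + 1] - sinTable[idx]);
}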
I ran into the same issues when I made my own Wii-style remote controller. If you identify the expensive operation and are having trouble engineering around it then start another question. :)
The normal way to shorten a vector would be along the lines of:
float originalMagnitude = sqrtf(current.x * current.x + current.y * current.y + current.z * current.z);
float desiredMagnitude = originalMagnitude - 1.0f;
float scaleFactor = (originalMagnitude != 0) ? desiredMagnitude / originalMagnitude : 0.0f; // avoid divide-by-zero
current.x *= scaleFactor;
current.y *= scaleFactor;
current.z *= scaleFactor;
That said, no, calling a few trig functions 33 times a second shouldn't be slowing you down much. On the other hand, -[NSMutableArray removeObjectAtIndex:] could potentially be slow for a big array. A ring buffer (either using NSMutableArray or a C array of structs) would be more efficient.
- Profile, don't speculate. Don't change a damn thing until you know what to change.
Assuming that you get a profile that shows that all the math really is slowing you down:
- Don't ever write pow(someFloat, 2). The compiler should be able to optimize this away for you, but often, on newer hardware, those optimizations may not yet be in place. It should always be written as someFloat * someFloat. The pow() function is generally the most expensive function in the math library; simple multiplication will always be at least as fast as calling pow(), and will always be at least as accurate (assuming IEEE-754 compliant arithmetic). Plus, it's easier for the compiler to optimize.
- When working with floats in C, use the suffixed forms of the math library functions: sinf is faster than sin, and sqrtf is faster than sqrt. Beyond the functions themselves being faster, you avoid unnecessary conversions to and from double (see the sketch after this list).
- If you're seeing the slowdown on an ARMv6 processor (not the 3GS or the new iPod Touch), make sure you are not compiling to Thumb code when you are doing a lot of floating-point computation. The Thumb instruction set (prior to Thumb-2) cannot access the VFP registers, and thus needs a shim for every floating-point operation. This can be quite expensive.
- If you just want to decrease the length of the acceleration vector by 1.0 (hint: this doesn't do what you want), there are more efficient algorithms to do so.
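Applied to the code in the question, the first two points would look something like this (same math as before, just float-suffixed calls and plain multiplication instead of pow):
float radius = sqrtf(current.x * current.x + current.y * current.y + current.z * current.z);
float theta = atan2f(current.y, current.x);
float phi = acosf(current.z / radius);
radius = radius - 1.0f;
float newX = radius * cosf(theta) * sinf(phi);
float newY = radius * sinf(theta) * sinf(phi);
float newZ = radius * cosf(phi);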
Those math lines look fine. I don't know enough Objective-C to know what the current = ... line is doing, though. Could it be allocating memory on the heap which isn't being reclaimed? What happens if you just comment it out? Have you watched the process's execution with top to see if it starts slurping more CPU or memory?
Other than the other commenter's point about using the float (rather than double) operators, doing all that _dataHistoryBuffer stuff is what's killing your app. It churns through memory like there's no tomorrow, and since you are using NSValue, all those objects get added to the autorelease pool, making memory consumption spike. You're much better off avoiding keeping a list of values unless you really, really need it, and if you do, figure out a more appropriate (read: fixed-size, non-object) mechanism to store them in. Even a circular buffer of structs (e.g. an array of 10 structs, with a counter that does i++ % 10 to index into it; see the sketch below) would be better.
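A minimal sketch of that circular-buffer idea, reusing the question's ThreeAxisAcceleration struct (the size of 10 and the function names are just placeholders):
enum { kHistorySize = 10 };                 // placeholder size; use whatever the display needs
static ThreeAxisAcceleration historyBuffer[kHistorySize];
static unsigned long historyCount = 0;      // total samples pushed so far
// Store a new sample, overwriting the oldest one once the buffer is full.
static void pushSample(ThreeAxisAcceleration sample) {
    historyBuffer[historyCount % kHistorySize] = sample;
    historyCount++;
}
// Fetch the i-th most recent sample (0 = newest); valid for i < MIN(historyCount, kHistorySize).
static ThreeAxisAcceleration sampleAt(unsigned long i) {
    return historyBuffer[(historyCount - 1 - i) % kHistorySize];
}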
Profile it to see exactly where the problem is. If necessary, comment out a subset of the "math" part at a time. Performance is something people usually guess wrong, even smart, thoughtful, experienced people.
Just out of interest, do you know how the math library's sqrt function is implemented? If it uses an inefficient approximation algorithm, it might be your culprit. Your best option is to create some sort of test harness that measures the average cost of each of the operations you are using.
Another question: does increasing or reducing the precision of the operations (i.e. using doubles rather than single-precision floats) change the performance in any way?
As others have said, you should profile to be sure. Having said that, yes, it is quite likely that adding the extra calculations did slow it down.
By default, all code for iPhone is compiled for the Thumb-1 instruction set. Thumb-1 does not have native floating point support, so it ends up calling out to a SOFTWARE floating point implementation. There are 2 ways to handle this:
- Compile the code for ARM. The processor in the iPhone can freely intermix Thumb and ARM code, so you can just compile the necessary pieces as ARM. Note that GCC (and by proxy Xcode) cannot compile an individual function as ARM; you will need to isolate all the relevant code into its own compilation unit (see the sketch below). It is probably easiest just to set the entire project to compile for ARM to see if it fixes things (uncheck "Build Options" > "Compile for Thumb"). Note that while ARM will speed up floating point, it reduces instruction density, thereby hurting cache efficiency and degrading all of your other code, so try to avoid it where you can.
- Compile for Thumb-2. Thumb-2 is an enhanced version of Thumb that adds support for some floating-point operations. It is only available on the iPhone 3GS and the new iPod Touch, so this may not be an option for you. You can do that by switching your architecture to "Optimized," which will build a fat binary with the current slow version for older devices and the faster version for devices that support it.
You can also combine both of these options, if that seems like the best choice.
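For the first option, isolating the float-heavy math into its own compilation unit might look roughly like this; the file and function names are made up for illustration. You would then compile just that file as ARM (e.g. via a per-file compiler flag such as -mno-thumb, if your toolchain accepts it) or turn off Thumb for the whole target as described above.
// GravityMath.h - interface the Objective-C code calls
typedef struct { float x, y, z; } ThreeAxisAcceleration;   // or reuse the app's existing struct
ThreeAxisAcceleration correctForGravity(ThreeAxisAcceleration a);
// GravityMath.c - keep all the float-heavy work in this file and compile it as ARM, not Thumb
#include <math.h>
#include "GravityMath.h"
ThreeAxisAcceleration correctForGravity(ThreeAxisAcceleration a) {
    // ...the spherical-coordinate math from the question goes here,
    // using the float-suffixed functions (sqrtf, atan2f, acosf, sinf, cosf)...
    return a;   // placeholder: this sketch returns the input unchanged
}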
Unless I misunderstand your code, you basically scale your point by some factor. I think the following code should be equivalent to what you do.
double radius = sqrt(current.x * current.x
+ current.y * current.y
+ current.z * current.z);
double newRadius = radius - 1.0;
double scale = newRadius/radius;
current.x *= scale;
current.y *= scale;
current.z *= scale;
Taking samples of the program while it is being slow will find out what the problem is. The worse your slowdown is, the quicker it will find it. Guesses are things that you suspect but don't know, such as thinking the math is the problem. Guesses are usually wrong, at least to begin with. If you are right, the samples will show you. If you are wrong, they will show you what is right. It never misses.
My guess is since you're using autoreleased memory (for NSValue) every 0.03 seconds you're probably not giving the pool much time to release itself. I could be wrong - profiling is the only way to tell.
Try manually allocating and releasing the NSValue and see if it makes a difference.
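Something along these lines (pre-ARC, reusing the question's ThreeAxisAcceleration struct and _dataHistoryBuffer array):
// Allocate the NSValue explicitly instead of relying on the autoreleasing convenience constructor.
NSValue *arrayVal = [[NSValue alloc] initWithBytes:&current
                                          objCType:@encode(ThreeAxisAcceleration)];
[_dataHistoryBuffer addObject:arrayVal];
[arrayVal release];   // the array retains it, so give up our ownership right away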