How can NSArray be this slow?
I'm coming from a C++/STL world and I wanted to check how objective-c containers are in comparison to stl.
I wanted to compare an array of numbers but the only way to add a number to an NSArray
is using NSNumber
which is utterly slow and drank my ram empty so I guess I need to dealloc them manually. But I don't want to test side effects so I just added [NSNull null]
into the array.
The results of adding 10k things into array 1k times:
NSArray
- 0.923411 seconds
vector<int>
- 0.129984 seconds
I thought it might be allocations and deallocations so I set the number of arrays(imax
in the code) to 1 and number of additions to 10000000(jmax
) but it was even slower
NSArray
- 2.19859 seconds
vector<int>
- 0.223471 seconds
Edit:
As mentioned in the comments the constant increasing size of the array might be the problem so I madeNSArray
using arrayWithCapacity
, but vector
with reserve
too and it was even slower tha开发者_运维问答n before(!) (imax
= 1, jmax
= 10000000).
NSArray
- 2.55942
vector<int>
- 0.19139
End edit
Why is this so slow?
My code for reference:
#import <Foundation/Foundation.h>
#include <vector>
#include <iostream>
#include <time.h>
using namespace std;
int main (int argc, const char * argv[])
{
int imax = 1000;
int jmax = 10000;
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
cout << "Vector insertions" << endl;
clock_t start = clock();
for(int i = 0; i < imax; i++)
{
vector<int> *v = new vector<int>();
for(int j = 0; j < jmax; j++)
{
v->push_back(j);
}
delete v;
}
double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
cout << interval << " seconds" << endl;
cout << "NSArray insertions" << endl;
start = clock();
for(int i = 0; i < imax; i++)
{
NSMutableArray *v = [[NSMutableArray alloc] init];
for(int j = 0; j < jmax; j++)
{
[v addObject:[NSNull null]];
}
[v dealloc];
}
interval = (clock() - start) / (double)CLOCKS_PER_SEC;
cout << interval << " seconds" << endl;
[pool drain];
return 0;
}
@JeremyP provides an excellent link and information. Always read the fish. Here's some breakdown of what's eating time, though, and what you might do about it.
First, there's the many calls to objc_msgSend()
for dynamic dispatch. These can be avoided and you'll save some of the time (though not as much as you'd think. objc_msgSend()
is crazy optimized). But you'll knock maybe 5% off by skipping it:
IMP addObject = class_getMethodImplementation([NSMutableArray class], @selector(addObject:));
NSNull *null = [NSNull null];
start = clock();
for(int i = 0; i < imax; i++)
{
NSMutableArray *v = [[NSMutableArray alloc] init];
for(int j = 0; j < jmax; j++)
{
addObject(v, @selector(addObject:), null);
}
[v release];
}
A lot of time is eaten up with retain
/release
. You can avoid that (and stick real numbers in rather than NSNumber
) by using a non-retaining CFMutableArray
). This will get the append times to about 2x of vector
.
CFArrayCallBacks cb = {0};
for(int i = 0; i < imax; i++)
{
CFMutableArrayRef v = CFArrayCreateMutable(NULL, 0, &cb);
for(int j = 0; j < jmax; j++)
{
CFArrayAppendValue(v, &j);
}
CFRelease(v);
}
The biggest cost of this one is the calls to memmove()
(or the collectable version of it on the Mac).
Man, NSMutableArray
sure is slow. How could Apple be so stupid, right? I mean, really... wait... I wonder if there's something NSMutableArray
does better than vector
?
Try swapping out these lines for their obvious counterparts:
v->insert(v->begin(), j);
NSNumber *num = [[NSNumber alloc] initWithInt:j];
[v insertObject:num atIndex:0];
[num release];
(Yes, including creating and releasing the NSNumber
, not just using NSNull
.)
Oh, and you might try this one too to see just how fast NSMutableArray
and CFMutableArray
really can be:
CFArrayInsertValueAtIndex(v, 0, &j);
In my tests I get:
Vector insertions
7.83188 seconds
NSArray insertions
2.66572 seconds
Non-retaining
0.310126 seconds
The short answer: Yes, NSArray really is quite a bit slower than C++'s STL collection classes. This has much to do with compile time vs. runtime behaviors, optimization opportunities on the part of the compiler, and numerous implementation details.
(And, as Rob points out, NSMutableArray is optimized for random insertion and performs better than C++ for that...)
The real answer:
Micro-benchmarks are useless for optimizing user facing applications.
Using a micro-benchmark to make implementation decisions is the very definition of premature optimization.
You would be hard pressed to find an Objective-C app targeted to iOS or Mac OS X where CPU profiling would show any significant time spent in code paths related to NSArray, yet the vast majority of those apps use the NS* collection classes pretty much exclusively.
Certainly, there are cases where the performance of NS* aren't viable and, for that, you turn to C++/STL.
None of this is to imply that your question is invalid. Without more context, it is difficult to say if the observed performance difference really matters (however, in my experience, just about every time a developer has asked a question based on a micro-benchmark, it has been misguided).
Oh -- and read this as it gives a bit of insight into the implementation of *Array.
It's a fully fledged Objective-C object which means there is an overhead each time you add an object due to Cocoa's message lookup algorithm which is necessary to implement properly dynamic binding.
There's also the point that NSArrays are not necessarily internally structured as a contiguous set of pointers. For very large arrays, NSArray performs much better (i.e. has much better big O time complexity) than C++ vectors. Have a read of this the definitive Ridiculous Fish Blog on the topic.
At least some of the time is consumed in repeatedly increasing the capacity of the NSArray. It should be faster to initialize the NSArray to the right (or at least a better) capacity initially with:
[NSMutableArray arrayWithCapacity:10000];
#include <stdio.h>
#include <time.h>
int main (int argc, char **argv)
{
int imax = 1000;
int jmax = 10000;
clock_t start = clock();
for(int i = 0; i < imax; i++)
{
int array[jmax];
for(int j = 0; j < jmax; j++)
j[array] = 0;
}
double interval = (clock() - start) / (double)CLOCKS_PER_SEC;
printf("%f\n", interval);
return 0;
}
Output in my 2GHz Core2Duo iMac (compiled with LLVM):
0.000003
精彩评论