开发者

Best cocoa data structure for list of files

I'm writing a program that processes hundreds of files when run. 开发者_运维技巧Right now each file and folder is stored in an object I created (it contains the filepath, filetype, filesize, a pointer to an offset in the file, and if it is a directory), and those objects are placed in a NSMutableArray. A big problem with this is at the end of processing all the files, I need to get statistics for all the files in each folder. I am doing this using 2 nested for loops, and the performance is terrible.

My question is this: Is there a more efficient way to store a list of files and folders in cocoa (other than NSMutableArray, sets, etc) so I can quickly access all folders and all objects inside those folders? Is there some structure that will create an array of folders and and array of files and folders located in that parent folder?


Right now each file and folder is stored in an object I created (it contains the filepath, filetype, filesize, a pointer to an offset in the file, and if it is a directory), and those objects are placed in a NSMutableArray.

That's the correct solution. C arrays are trickier, since you'd have to handle size management yourself and you don't get bounds-checking.

A big problem with this is at the end of processing all the files, I need to get statistics for all the files in each folder. I am doing this using 2 nested for loops, and the performance is terrible.

Have you profiled using Shark and/or Instruments? That's the first thing you should check, if you haven't already. The bottleneck may not be where you think it is. Stop reading this answer (and any other answers) until you have profiled.


If you're currently blocking the main thread with this task, consider using NSOperationQueue instead. For each item in the top level, if it's a file, add an operation that examines the file, and if it's a directory, add an operation that will do the same iteration upon the directory's contents. If you can require Snow Leopard, you'll find blocks handy here, since you won't have to explicitly tell the directory-inventory operation which queue to add the examine-file operations to.

You should probably put a lid on the number of operations the queue will run at once, lest you end up running too many them. Mike Ash has details (that post is about GCD, but as of Snow Leopard, NSOperationQueue is based on GCD).

Assuming that you're displaying a running total in your UI, you can use the main queue to hold (possibly block-based) operations that add new information to the totals. If you're supporting Leopard, you can make your own “main” queue, but you'll have to make the operations run on the main thread yourself.

By the way, if you're totaling file sizes, you should consider whether want to uniquify on inode. If I hard-link a 200 MiB file into three other places, you'll see four files, but they're all really the same file, so they only take up 200 MiB, not 800.


You might also want to consider a tree-like structure. You have a root node that corresponds to the filepath "/". Then root has many children, each for "/System", "/etc", "/Library", "/Users", etc.

When you add a node into this tree, you can have it percolate up and add the filesize of the new node to the parents (so that the tree will always have the correct volume size in the root node). Or you can have it compute the size as you need (recursively, most likely) and return that.

As for retrieving the paths in the first place, you've probably found NSFileManager. You should also take a look at NSDirectoryEnumerator and the lower-level FSGetCatalogInfoBulk.


You could use C arrays instead of Cocoa collections. This could be faster.
Some links with performance comparisons:

  • NSArray vs. C Array performance comparison
  • NSArray or NSSet, NSDictionary or NSMapTable
  • ridiculousfish: Array


Use NSMutableDictionary with file's directory as a key, and NSMutableArray of files as object. You'll be able to iterate directories quickly.

You can also split directory using [NSString pathComponents] and use dictionary of dictionaries to hold each part of the path (a tree). You can even mix files and dictionaries in the tree and use [foo isKindOfClass:[NSDictionary class]] to tell them apart.

Here's JSON version of what I'm talking about (which translates nicely to Cocoa classes):

/foo/bar/bazfile & /foo/quzfile =

{"foo": {
   "bar": {
      "bazfile": fileinfo
   },
   "quzfile": fileinfo
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜