How do I limit memory use of CHCSVParser?
I'm trying to import a 30mb CSV file into Core Data using the CHCSVParser from https://github.com/davedelong/CHCSVParser
It works, and it was quite easy to set up, but it eats up a lot of memory as it parses through the file. The excessive memory usage seems to come from the end of -nextCharacter, in particular the call to -substringWithRange::
//return nil to indicate EOF or error
if ([currentChunk length] == 0) { return nil; }
NSRange charRange = [currentChunk rangeOfComposedCharacterSequenceAtIndex:chunkIndex];
NSString * nextChar = [currentChunk substringWithRange:charRange];
chunkIndex = charRange.location + charRange.length;
return nextChar;
I was able to add an autorelease pool to the function and call -drain on it every 1,000,000 characters, but then the throughput goes way down.
Does anyone have any other ideas? Dave DeLong perhaps? :-)
OK, so I checked things out and you're right, there is pretty blatant memory buildup.
I tried putting in a pool every time it began a new CSV line and then draining it when the line was done, but that proved to be ineffective with some other memory management situations.
What I ended up doing was putting a pool in the -runParseLoop method. The pool is alloc'd right before the while loop and drained right after it. Inside the loop there's an unsigned short counter that gets incremented on each iteration, and whenever the counter hits 0, I -drain and re-alloc the pool.
Essentially:
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
unsigned short counter = 0;
while (error == nil &&
       (currentCharacter = [self nextCharacter]) != nil) {
    //process the current character
    counter++;
    if (counter == 0) { //happens every 65,536 (2^16) iterations, when the unsigned short overflows
        //retain the characters that need to out-live this pool
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
        //autorelease the characters
    }
}
[pool drain];
That's a fun exploitation of overflow, eh? :)
I tested this against a 190MB CSV file, and memory usage stayed at reasonable levels (a couple of megabytes of active memory).
These changes have been pushed to the master branch on GitHub. Try them and let me know how they work for you. If you're still having memory or performance issues, come back and we can try something else.