Trying to split a very large string [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

Most memory efficient way to split an NSString in to substrings

I'm trying to split a 20 MB string. I've tried using componentsSeparatedByString: but it consumes too much RAM. I think this is because it splits the string but also leaves the original string intact, which means the string is effectively stored in memory twice (even if I release the original string right after the split, it is still an issue).

I'm very new to Objective-C. I've tried to write some code that removes each substring from the original string as it adds it to the array of found strings. The idea is that as the mutable array of found strings gets larger, the original string gets smaller. The only problem is that it leaks memory and crashes. If someone could tell me what I'm doing wrong, that would be great!

    NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
    int counter = 1;

    // location will be NSNotFound (NSIntegerMax) if it can't find any more occurrences
    while (range.location < [mainHtml length]) {
        NSString *curStr;
        NSRange curStrRange;

        NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
        NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

        if (nextRange.location > [mainHtml length])
        {
            // This is the last string - get everything up to the end of the file
            curStrRange = NSMakeRange(0, [mainHtml length]);
            curStr = [mainHtml substringFromIndex:range.location];
        } else {
            curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
            curStr = [mainHtml substringWithRange:curStrRange];
        }

        // Remove the substring just processed from the original string
        // * it crashes here, normally on the 3rd iteration
        mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];
        range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];

        [self.parts addObject:curStr];
    }


I think that @babbidi had the correct idea. mainHtml is large, and each iteration creates another autoreleased copy of most of it that is not released until the enclosing autorelease pool drains. Try wrapping the loop body in an @autoreleasepool block, as below, so that the autoreleased objects are released at the end of each iteration. If you are not using Mac OS X 10.7 and @autoreleasepool is unavailable, create an NSAutoreleasePool manually at the top of each iteration and drain it at the bottom of that same iteration; a pool cannot be reused once it has been drained.

    NSRange range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];
    int counter = 1;

    // location will be NSNotFound (NSIntegerMax) if it can't find any more occurrences
    while (range.location < [mainHtml length]) {
        @autoreleasepool {
            NSString *curStr;
            NSRange curStrRange;

            NSRange rangeToSearchIn = NSMakeRange(range.location+1, [mainHtml length] - range.location - 1);
            NSRange nextRange = [mainHtml rangeOfString:@"<p class=NumberedParagraph>" options:NSCaseInsensitiveSearch range:rangeToSearchIn];

            if (nextRange.location > [mainHtml length])
            {
                // This is the last string - get everything up to the end of the file
                curStrRange = NSMakeRange(0, [mainHtml length]);
                curStr = [mainHtml substringFromIndex:range.location];
            } else {
                curStrRange = NSMakeRange(range.location, nextRange.location - range.location);
                curStr = [mainHtml substringWithRange:curStrRange];
            }

            // Remove the substring just processed from the original string
            // * it crashes here, normally on the 3rd iteration
            // Note: under manual reference counting the new mainHtml is autoreleased into
            // this pool, so retain it (and release the old one) before the pool drains;
            // under ARC a strong variable keeps it alive.
            mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];
            range = [mainHtml rangeOfString:@"<p class=NumberedParagraph>"];

            [self.parts addObject:curStr];
        }
    }
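
A different way to cut the per-iteration cost, which is not what this answer proposes, is to leave mainHtml untouched and move the search range forward instead of re-copying the remainder on every pass. The sketch below assumes ARC and a case-sensitive match; `SplitAtMarker` is a hypothetical helper name, not something from the question. It does not shrink the original string, so peak memory is still roughly the original plus the pieces, but it avoids creating a near-20 MB temporary each time around the loop.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): returns the pieces of `text`,
// each starting at an occurrence of `marker`. Text before the first marker
// is skipped, as in the question's loop.
static NSArray *SplitAtMarker(NSString *text, NSString *marker) {
    NSMutableArray *parts = [NSMutableArray array];
    NSUInteger length = [text length];

    NSRange current = [text rangeOfString:marker];
    while (current.location != NSNotFound) {
        // Look for the next marker only in the text after the current one.
        NSUInteger searchStart = NSMaxRange(current);
        NSRange next = [text rangeOfString:marker
                                   options:0
                                     range:NSMakeRange(searchStart, length - searchStart)];

        // This piece runs up to the next marker, or to the end of the text.
        NSUInteger end = (next.location == NSNotFound) ? length : next.location;
        NSRange pieceRange = NSMakeRange(current.location, end - current.location);
        [parts addObject:[text substringWithRange:pieceRange]];

        current = next;
    }
    return parts;
}
```

Calling `SplitAtMarker(mainHtml, @"<p class=NumberedParagraph>")` would give the same pieces the loop above collects in self.parts, apart from the question's mix of case-sensitive and case-insensitive searches.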


I don't believe you have any leaks. substringFromIndex: returns an autoreleased string, so it might be kept in memory for more than one iteration. You could write your own replacement for substringFromIndex: (e.g. createSubstringFromIndex:string:) that returns a retained string, which you can then release manually.

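    + (NSString *)createSubstringFromIndex:(NSUInteger)index string:(NSString *)string {
        NSUInteger length = [string length];
        if (index >= length)
            return @"";   // or nil
        NSUInteger newLen = length - index;
        // Copy UTF-16 code units rather than narrowing each one to char,
        // so non-ASCII text is preserved.
        unichar *buffer = malloc(newLen * sizeof(unichar));
        if (buffer == NULL)
            return nil;
        [string getCharacters:buffer range:NSMakeRange(index, newLen)];
        // The returned string is owned by the caller (+1) and takes over
        // the buffer, freeing it when the string is deallocated.
        return [[NSString alloc] initWithCharactersNoCopy:buffer length:newLen freeWhenDone:YES];
    }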

In your code you'd have to replace this:

    mainHtml = [mainHtml substringFromIndex:curStrRange.location + curStrRange.length];

with this:

    NSString *newHtmlString = [[self class] createSubstringFromIndex:curStrRange.location + curStrRange.length string:mainHtml];
    [mainHtml release];                // mainHtml should be retained before the while loop starts
    mainHtml = newHtmlString;