
Why does NSRegularExpression not honor capture groups in all cases?

Main problem: Objective-C reports six matches when my pattern is @"\\b(\\S+)\\b", but when my pattern is @"A b (c) or (d)", it reports only one match, "c".

Solution

Here's a function that returns the capture groups as an NSArray. I'm an Objective-C newbie, so I suspect there are better ways to do this than building a mutable array and returning it as an NSArray at the end.

- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    // NSRegularExpressionSearch is an NSString compare option, not a regex option; pass 0 here.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", strPattern, error);
        return [NSArray array];
    }

    NSArray *arTextCheckingResults = [regex matchesInString:haystack
        options:0
        range:NSMakeRange(0, [haystack length])];

    NSMutableArray *arMutable = [NSMutableArray array];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Index 0 is the whole match; capture groups start at index 1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange captureRange = [ntcr rangeAtIndex:captureIndex];
            if (captureRange.location == NSNotFound) {
                continue; // this group did not take part in the match
            }
            [arMutable addObject:[haystack substringWithRange:captureRange]];
        }
    }

    // An NSMutableArray is an NSArray, so it can be returned directly.
    return arMutable;
}

Problem

I am accustomed to using parentheses to match capture groups in Perl in a manner like this:

#!/usr/bin/perl -w
use strict;

my $str = "This sentence has words in it.";
if(my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
    print "That $what had '$inner' in it.\n";
}

That code will produce:

    That sentence had 'words' in it.

But in Objective-C, with NSRegularExpression, I get different results. Sample function:

- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    NSArray *arTextCheckingResults;

    // NSRegularExpressionSearch is an NSString compare option, not a regex option; pass 0 here.
    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0
                                  error:&error];

    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", (unsigned long)numberOfMatches);

    arTextCheckingResults = [regex matchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}

Calling that test function shows that it can count the words in the string:

NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];
    Pattern: '\b(\S+)\b'
    Search text: 'This sentence has words in it.'
    Number of matches: 6
    Found string 'This'
    Found string 'sentence'
    Found string 'has'
    Found string 'words'
    Found string 'in'
    Found string 'it'

But what if the capture groups are explicit, like so?

[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];

Result:

    Pattern: '.*This (sentence) has (words) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

Same as above, but with \S+ instead of the actual words:

[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];

Result:

    Pattern: '.*This (\S+) has (\S+) in it.*'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

How about a wildcard in the middle?

[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];

Result:

    Pattern: '^This (\S+) .* (\S+) in it.$'
    Search text: 'This sentence has words in it.'
    Number of matches: 1
    Found string 'sentence'

References: NSRegularExpression, NSTextCheckingResult, NSRegularExpression matching options


I think if you change

// returns the range which matched the pattern
NSString *match = [haystack substringWithRange:ntcr.range];

to

// returns the range of the first capture
NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];

you will get the expected result for patterns containing a single capture.

See the documentation for -[NSTextCheckingResult rangeAtIndex:]:

A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).

Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.
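
For a pattern with more than one capture group, such as the Perl example in the question, the same idea extends to reading every index from 1 to numberOfRanges-1 of a single match. The following is only a minimal sketch of that, not the asker's code: the function name CaptureGroupsOfFirstMatch is made up for illustration, and it assumes you only care about the first match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name invented for this example): returns the capture
// groups of the first match, or nil if the pattern is invalid or nothing matches.
static NSArray *CaptureGroupsOfFirstMatch(NSString *haystack, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil; // invalid pattern; details are in error
    }

    NSTextCheckingResult *match =
        [regex firstMatchInString:haystack
                          options:0
                            range:NSMakeRange(0, [haystack length])];
    if (match == nil) {
        return nil; // no match at all
    }

    NSMutableArray *captures = [NSMutableArray array];
    // Index 0 is the whole match, so the capture groups start at 1.
    for (NSUInteger i = 1; i < match.numberOfRanges; i++) {
        NSRange range = [match rangeAtIndex:i];
        // A group that did not take part in the match has location NSNotFound;
        // an empty string keeps the group positions aligned.
        if (range.location == NSNotFound) {
            [captures addObject:@""];
        } else {
            [captures addObject:[haystack substringWithRange:range]];
        }
    }
    return captures;
}
```

Called with @"This sentence has words in it." and the pattern @"This (\\S+) has (\\S+) in it", this should give back one array holding both captures, which is the closest equivalent to the list assignment in the Perl snippet.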


Change how the NSTextCheckingResult is read:

- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    NSArray *arTextCheckingResults;

    // NSRegularExpressionSearch is an NSString compare option, not a regex option; pass 0 here.
    NSRegularExpression *regex = [NSRegularExpression
                                  regularExpressionWithPattern:strPattern
                                  options:0
                                  error:&error];
    NSRange stringRange = NSMakeRange(0, [haystack length]);
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack
                                                        options:0 range:stringRange];

    NSLog(@"Number of matches for '%@' in '%@': %lu", strPattern, haystack, (unsigned long)numberOfMatches);

    // matchesInString: takes NSMatchingOptions; case-insensitivity belongs on the regex itself.
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:stringRange];

    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSRange matchRange = [ntcr rangeAtIndex:1];
        NSString *match = [haystack substringWithRange:matchRange];
        NSLog(@"Found string '%@'", match);
    }
}

NSLog output:
Found string 'words'
