Why does NSRegularExpression not honor capture groups in all cases?
Main problem: ObjC can tell me there were six matches when my pattern is @"\\b(\\S+)\\b", but when my pattern is @"A b (c) or (d)", it only reports one match, "c".
Solution
Here's a function which returns the capture groups as an NSArray. I'm an Objective C newbie so I suspect there are better ways to do the clunky work than by creating a mutable array and assigning it at the end to an NSArray.
- (NSArray *)regexWithResults:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    // NSRegularExpressionSearch is an NSStringCompareOptions value, not an
    // NSRegularExpressionOptions value, so pass 0 (no options) here.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0
        error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", strPattern, error);
        return [NSArray array];
    }
    NSMutableArray *arMutable = [[NSMutableArray alloc] init];
    NSArray *arTextCheckingResults = [regex matchesInString:haystack
        options:0
        range:NSMakeRange(0, [haystack length])];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Index 0 is the whole match; capture groups start at index 1.
        for (NSUInteger captureIndex = 1; captureIndex < ntcr.numberOfRanges; captureIndex++) {
            NSRange captureRange = [ntcr rangeAtIndex:captureIndex];
            // Skip groups that did not take part in the match.
            if (captureRange.location == NSNotFound) continue;
            [arMutable addObject:[haystack substringWithRange:captureRange]];
        }
    }
    // Return an immutable copy of the collected captures.
    return [arMutable copy];
}
Problem
I am accustomed to using parentheses to match capture groups in Perl in a manner like this:
#!/usr/bin/perl -w
use strict;
my $str = "This sentence has words in it.";
if(my ($what, $inner) = ($str =~ /This (\S+) has (\S+) in it/)) {
print "That $what had '$inner' in it.\n";
}
That code will produce:
That sentence had 'words' in it.
But in Objective C, with NSRegularExpression, we get different results. Sample function:
- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    NSArray *arTextCheckingResults;
    // NSRegularExpressionSearch is an NSStringCompareOptions value, so 0 is passed instead.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0
        error:&error];
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];
    NSLog(@"Pattern: '%@'", strPattern);
    NSLog(@"Search text: '%@'", haystack);
    NSLog(@"Number of matches: %lu", (unsigned long)numberOfMatches);
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:NSMakeRange(0, [haystack length])];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        // Log the first capture group of each match.
        NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
        NSLog(@"Found string '%@'", match);
    }
}
Calling that test function shows that it can count the words in the string:
NSString *searchText = @"This sentence has words in it.";
[myClass regexTest:searchText pattern:@"\\b(\\S+)\\b"];
Pattern: '\b(\S+)\b'
Search text: 'This sentence has words in it.'
Number of matches: 6
Found string 'This'
Found string 'sentence'
Found string 'has'
Found string 'words'
Found string 'in'
Found string 'it'
But what if the capture groups are explicit, like so?
[myClass regexTest:searchText pattern:@".*This (sentence) has (words) in it.*"];
Result:
Pattern: '.*This (sentence) has (words) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
Same as above, but with \S+ instead of the actual words:
[myClass regexTest:searchText pattern:@".*This (\\S+) has (\\S+) in it.*"];
Result:
Pattern: '.*This (\S+) has (\S+) in it.*'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
How about a wildcard in the middle?
[myClass regexTest:searchText pattern:@"^This (\\S+) .* (\\S+) in it.$"];
Result:
Pattern: '^This (\S+) .* (\S+) in it.$'
Search text: 'This sentence has words in it.'
Number of matches: 1
Found string 'sentence'
References: NSRegularExpression, NSTextCheckingResult, NSRegularExpression matching options
I think if you change
// returns the range which matched the pattern
NSString *match = [haystack substringWithRange:ntcr.range];
to
// returns the range of the first capture
NSString *match = [haystack substringWithRange:[ntcr rangeAtIndex:1]];
you will get the expected result for patterns containing a single capture.
See the doc page for NSTextCheckingResult:rangeAtIndex:
A result must have at least one range, but may optionally have more (for example, to represent regular expression capture groups).
Passing rangeAtIndex: the value 0 always returns the value of the range property. Additional ranges, if any, will have indexes from 1 to numberOfRanges-1.
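Going by that documentation, a pattern with two capture groups should yield ranges at indexes 1 and 2 for each match, so reading only index 1 would explain why only 'sentence' shows up. Here is a minimal sketch of reading every capture group per match. CapturesPerMatch is a hypothetical helper name of my own, not part of Foundation, and returning an empty string for a group that did not participate in the match is an assumption made for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns one inner array per
// match, holding the text of capture groups 1..numberOfRanges-1.
static NSArray<NSArray<NSString *> *> *CapturesPerMatch(NSString *haystack,
                                                        NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern '%@': %@", pattern, error);
        return @[];
    }

    NSArray<NSTextCheckingResult *> *results =
        [regex matchesInString:haystack
                       options:0
                         range:NSMakeRange(0, haystack.length)];

    NSMutableArray<NSArray<NSString *> *> *allMatches = [NSMutableArray array];
    for (NSTextCheckingResult *result in results) {
        NSMutableArray<NSString *> *groups = [NSMutableArray array];
        // Index 0 is the whole match; capture groups start at index 1.
        for (NSUInteger i = 1; i < result.numberOfRanges; i++) {
            NSRange r = [result rangeAtIndex:i];
            // Assumption: a group that did not participate in the match
            // (location == NSNotFound) is recorded as an empty string.
            NSString *text = (r.location == NSNotFound)
                                 ? @""
                                 : [haystack substringWithRange:r];
            [groups addObject:text];
        }
        [allMatches addObject:groups];
    }
    return allMatches;
}
```

Called with the search text @"This sentence has words in it." and the pattern @"This (\\S+) has (\\S+) in it", this returns a single inner array containing "sentence" and "words". Grouping captures per match, rather than flattening them into one array, keeps it clear which captures came from which match.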
Change the NSTextCheckingResult:
- (void)regexTest:(NSString *)haystack pattern:(NSString *)strPattern
{
    NSError *error = nil;
    NSArray *arTextCheckingResults;
    // NSRegularExpressionSearch is an NSStringCompareOptions value, so 0 is passed instead.
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:strPattern
        options:0
        error:&error];
    NSRange stringRange = NSMakeRange(0, [haystack length]);
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:haystack
        options:0 range:stringRange];
    NSLog(@"Number of matches for '%@' in '%@': %lu", strPattern, haystack, (unsigned long)numberOfMatches);
    // matchesInString: takes NSMatchingOptions; NSRegularExpressionCaseInsensitive
    // belongs to the regex creation options, so 0 is passed here.
    arTextCheckingResults = [regex matchesInString:haystack options:0 range:stringRange];
    for (NSTextCheckingResult *ntcr in arTextCheckingResults) {
        NSRange matchRange = [ntcr rangeAtIndex:1];
        NSString *match = [haystack substringWithRange:matchRange];
        NSLog(@"Found string '%@'", match);
    }
}
NSLog output:
Found string 'words'