
How can I write a regex that matches words that overlap themselves?

I'm trying to match a word forwards and backwards in a string but it isn't catching all matches. For example, searching for the word "AB" in the string "AAABAAABAAA", I create and use the regex /AB|BA/, but it only matches the two "AB" substrings, and ignores the "BA" substrings.

I'm using RegexKitLite on the iPhone, but I think this is a more general regex problem (I see the same behavior in online regex testers). Nevertheless, here's the code I'm using to enumerate the matches:

[@"AAABAAABAAA" enumerateStringsMatchedByRegex:@"AB|BA" usingBlock:
 ^(NSInteger captureCount,
   NSString * const capturedStrings[captureCount],
   const NSRange capturedRanges[captureCount],
   volatile BOOL * const stop) { 
     NSLog(@"%@", capturedStrings[0]);
 }];

Output:

AB
AB


I don't know which online tester you tried, but http://www.regextester.com/ (for example) will not consider the same character for multiple matches. In this case, since ABA matches AB, the B is not considered for the BA match. It's purely a guess that RegexKitLite is implemented similarly.

Even if you ignore the mirrored variant, the original search string may overlap with itself. For example, if you search for ABCA|ACBA in ABCABCACBACBA, you'll get only two of the four matches; searching in either direction gives the same result.

It should be possible to find matches incrementally, but perhaps not with RegexKitLite.
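
As a rough illustration of what finding matches incrementally could look like, here is a minimal sketch in Python rather than RegexKitLite. The helper name find_overlapping is hypothetical. It restarts the search one character after the start of each match instead of after its end, so later matches can reuse characters.

```python
import re

def find_overlapping(pattern, text):
    """Return (start, matched_text) pairs, including overlapping matches."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while True:
        m = regex.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.group(0)))
        # Resume one character after the match START (not its end),
        # so the next match may reuse characters of this one.
        pos = m.start() + 1
    return results

print(find_overlapping("ABCA|ACBA", "ABCABCACBACBA"))
# [(0, 'ABCA'), (3, 'ABCA'), (6, 'ACBA'), (9, 'ACBA')]
```

With this approach all four matches in the example string are reported. Whether the same loop can be written conveniently with RegexKitLite depends on its support for searching within a given range, which is not verified here.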


I would say that's not possible in one pass. The regex engine matches the given pattern and "eats" the matched characters. So if you search for AB|BA in ABA, the first match found is AB, and the engine then continues searching from the third character, A.

So it is not possible to find overlapping patterns with the same regex and using the | operator.
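
The consuming behavior described above can be seen in a short Python sketch (Python's re module is used here as a stand-in, not RegexKitLite). For comparison, it also shows a different pattern that wraps the alternation in a zero-width lookahead. This is a common workaround in engines that support lookahead, and it is an assumption that the engine in use does.

```python
import re

text = "AAABAAABAAA"

# Plain alternation: each match consumes its characters, so the "B"
# used by "AB" is no longer available to start a "BA" match.
consumed = [(m.start(), m.group(0)) for m in re.finditer(r"AB|BA", text)]

# Lookahead variant: the overall match is empty (zero width), and the
# capture group records what would match starting at that position.
overlapping = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(AB|BA))", text)]

print(consumed)     # [(2, 'AB'), (6, 'AB')]
print(overlapping)  # [(2, 'AB'), (3, 'BA'), (6, 'AB'), (7, 'BA')]
```

Because the lookahead itself matches nothing, the interesting text is in capture group 1 rather than in the whole match.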


I'm not sure how you'd accomplish exactly what I think you're asking for without reversing the string and testing twice.

However, I suppose it depends on what you're after exactly. If you're simply trying to determine if the pattern occurs in the string backwards or forwards, and not so much how it occurs, then you could do something like this:

ABA?|BAB?

The ? makes the last character optional on each side of the |. In the case of AAABAAABAAA, it'll find ABA twice. In the case of AB it'll find AB, and in the case of BA it'll find BA.

Here it is with test cases... http://regexhero.net/tester/?id=a387ae0a-1707-4d9e-856b-ebe2176679bb
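
The three cases described above can also be checked locally. This is a small sketch using Python's re module, with the test strings taken from this answer:

```python
import re

# Optional trailing character on each side of the alternation.
pattern = re.compile(r"ABA?|BAB?")

# Map each test string to the list of matches found in it.
results = {s: pattern.findall(s) for s in ["AAABAAABAAA", "AB", "BA"]}

for s, found in results.items():
    print(s, found)
# AAABAAABAAA ['ABA', 'ABA']
# AB ['AB']
# BA ['BA']
```

Note that this reports where the pattern occurs in either direction, but not separate AB and BA matches.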
