Need regex to match special situations
I'm desperately searching for regular expressions that match these scenarios:
1) Match alternating chars
I have a string like "This is my foobababababaf string", and I want to match "babababa".
The only thing I know is the length of the fragment to search for. I don't know which chars/digits it contains, only that they alternate.
I really have no clue where to start :(
2) Match combined groups
In a string like "This is my foobaafoobaaaooo string", I want to match "aaaooo". As in 1), I don't know which chars/digits they might be. I only know that they will appear in two groups.
I experimented using (.)\1\1\1(.)\1\1\1 and things like this...
I think something like this is what you want.
For alternating characters:
(?=(.)(?!\1)(.))(?:\1\2){2,}
\0 will be the entire alternating sequence; \1 and \2 are the two (distinct) alternating characters.
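Since the question says the length of the fragment is known, the open-ended {2,} can be swapped for an exact repeat count. Here is a minimal sketch of that idea, assuming the length is even; the class and the helper names alternatingOfLength and firstMatch are made up for illustration and are not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingFixedLength {
    // Hypothetical helper: builds the alternating-pair regex for a known, even fragment length.
    static String alternatingOfLength(int length) {
        if (length < 2 || length % 2 != 0) {
            throw new IllegalArgumentException("length must be even and >= 2");
        }
        // length / 2 repetitions of the two-character pair \1\2
        return "(?=(.)(?!\\1)(.))(?:\\1\\2){" + (length / 2) + "}";
    }

    // Returns the first alternating fragment of the given length, or null if there is none.
    static String firstMatch(String text, int length) {
        Matcher m = Pattern.compile(alternatingOfLength(length)).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Prints the first 8 alternating characters: babababa
        System.out.println(firstMatch("This is my foobababababaf string", 8));
    }
}
```

With an exact count the match stops after the requested length, even when the alternation in the text continues.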
For a run of N and M characters, possibly separated by other characters (replace N and M with numbers here):
(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}
\0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, \2 is the character repeated (at least) M times.
Here's a test harness in Java.
import java.util.regex.*;

public class Regex3 {
    // Builds the run regex by substituting the run lengths for N and M.
    static String runNrunM(int N, int M) {
        return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
            .replace("N", String.valueOf(N))
            .replace("M", String.valueOf(M));
    }

    // Prints every match of the pattern in the text, with all its groups.
    static void dumpMatches(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        System.out.println(text + " <- " + pattern);
        while (m.find()) {
            System.out.println(" match");
            for (int g = 0; g <= m.groupCount(); g++) {
                System.out.format(" %d: [%s]%n", g, m.group(g));
            }
        }
    }

    public static void main(String[] args) {
        String[] tests = {
            "foobababababaf foobaafoobaaaooo",
            "xxyyyy axxayyyya zzzzzzzzzzzzzz"
        };
        for (String test : tests) {
            dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(3, 3));
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(2, 4));
        }
    }
}
This produces the following output:
foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
match
0: [bababababa]
1: [b]
2: [a]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [aaaooo]
1: [a]
2: [o]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [yyyy axxayyyya zzz]
1: [y]
2: [z]
foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
match
0: [xxyyyy]
1: [x]
2: [y]
match
0: [xxayyyy]
1: [x]
2: [y]
Explanation
(?=(.)(?!\1)(.))(?:\1\2){2,} has two parts:
- (?=(.)(?!\1)(.)) establishes \1 and \2 using lookahead
  - Nested negative lookahead ensures that \1 != \2
  - Using lookahead to capture lets \0 have the entire match (instead of just the "tail" end)
- (?:\1\2){2,} matches the \1\2 sequence, which must repeat at least twice.
(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts:
- (?=(.))\1{N} captures \1 in a lookahead, and then matches it N times
  - Using lookahead to capture means the repetition can be N instead of N-1
- .*? allows an infix to separate the two runs, reluctant to keep it as short as possible
- (?=(?!\1)(.))\2{M}
  - Similar to first part
  - Nested negative lookahead ensures that \1 != \2
The run regex also matches inside longer runs, e.g. run(2,2) finds a match in "xxxyyy":
xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
match
0: [xxxyy]
1: [x]
2: [y]
Also, it does not allow overlapping matches. That is, there is only one run(2,3) in "xx11yyy222".
xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
match
0: [xx11yyy]
1: [x]
2: [y]
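If overlapping matches are wanted after all, one possible workaround is to wrap the whole pattern in a capturing lookahead so that every match is zero-width and the search resumes at the next character. This is only a sketch of that idea and not part of the regex above; the class and method names are hypothetical, and note that the extra wrapper group shifts the two run characters to groups 2 and 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingRuns {
    // Hypothetical helper: the run(N, M) pattern wrapped in a capturing lookahead.
    // The wrapper is group 1, so the run characters become groups 2 and 3.
    static String overlappingRunNrunM(int n, int m) {
        return "(?=((?=(.))\\2{" + n + "}.*?(?=(?!\\2)(.))\\3{" + m + "}))";
    }

    // Collects the text captured by the wrapper group at every start position that matches.
    static List<String> findAll(String text, int n, int m) {
        Matcher matcher = Pattern.compile(overlappingRunNrunM(n, m)).matcher(text);
        List<String> found = new ArrayList<>();
        // Each match is zero-width, so find() retries from the next character.
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("xx11yyy222", 2, 3));
    }
}
```

Because every start position is tried, this also reports matches that begin in the middle of a longer run, so the results may need filtering depending on what counts as a distinct match.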
Assuming that you use Perl/PCRE: (.{2})\1+ or ((.)(?!\2)(.))\1+. The second regex prevents matching things like oooo.
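Both patterns use only constructs that java.util.regex supports too, so the difference can be checked in Java as well. A small sketch, with hypothetical class, constant, and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairRepeat {
    // Any two characters, followed by at least one more copy of the same pair.
    static final Pattern ANY_PAIR = Pattern.compile("(.{2})\\1+");
    // Two different characters, followed by at least one more copy of the same pair.
    static final Pattern DISTINCT_PAIR = Pattern.compile("((.)(?!\\2)(.))\\1+");

    // Returns the first match of the pattern in the text, or null if there is none.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"foobababababaf", "oooo"}) {
            System.out.println(text + " any pair: " + firstMatch(ANY_PAIR, text));
            System.out.println(text + " distinct pair: " + firstMatch(DISTINCT_PAIR, text));
        }
    }
}
```

On "oooo" only the first pattern finds a match, because the negative lookahead in the second one rejects a pair made of the same character twice.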
UPD: Then 2. will be ((.)\2{N}).*?((?!\2)(.)\4{M}). Remove (?!\2) if you want to get matches like oooaoooo, and replace N and M with n-1 and m-1.
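To make the n-1 and m-1 substitution concrete: (.) already consumes the first character of each run, so the backreference only has to repeat n-1 (or m-1) more times. A minimal Java sketch, where the class and the helpers runs and firstMatch are hypothetical names chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunsWithoutLookahead {
    // Hypothetical helper: n and m are the full run lengths (both >= 1).
    static String runs(int n, int m) {
        return "((.)\\2{" + (n - 1) + "}).*?((?!\\2)(.)\\4{" + (m - 1) + "})";
    }

    // Returns "whole = firstRun + secondRun" for the first match, or null if there is none.
    static String firstMatch(String text, int n, int m) {
        Matcher matcher = Pattern.compile(runs(n, m)).matcher(text);
        if (!matcher.find()) {
            return null;
        }
        // group(1) is the first run, group(3) is the second run
        return matcher.group() + " = " + matcher.group(1) + " + " + matcher.group(3);
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("This is my foobaafoobaaaooo string", 3, 3));
    }
}
```

Here groups 2 and 4 hold the two repeated characters, while groups 1 and 3 hold the complete runs.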
Well, this works for the first one...
((.)(.))(\2\3)+
Examples in JavaScript:
let a = "This is my foobababababaf string";
console.log(a.replace(/(.)(.)(\1\2)+/, "<<$&>>"));
a = "This is my foobaafoobaaaooo string";
console.log(a.replace(/(.)\1+(.)\2+/, "<<$&>>"));