Match names in the format of "A - B", but NOT in the format of "A - B: Total goals - odd or even"
Hi, I ran into a problem when processing sports match names. The rules are:
- match strings in the format "A - B"
- do NOT match strings with any ":" after "A - B", e.g. "A - B: Total goals - odd or even"
Here is my Regex:
^.+(\s+-\s+)([^:\n]+)(?!:[\w\s]+)$
And here are some example strings that should match:
Mattek-Sands Bethanie - Safarova Lucie
L. Hewitt - O. Rochus
Ball Carsten - S. Darcis
Poland - Austria
Poland - Austria 1x2
Poland - Austria 1 x 2
Poland - Austria 1x2
Poland - Austria - 1x2
Poland - Austria _ 1x2
Poland - Austria (1x2)
Here are some example strings that should NOT match:
Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even
Haka - JJK: Half time
Lyngby - AaB: Draw No Bet
AC Horsens - FC Midtjylland: First team to score
Mattek-Sands Bethanie - Safarova Lucie: Who will win set number 1?
Czech Republic - Kazakhstan: 1x2
Romania - Slovak Republic: 1x2
Norway - Moldova: 1x2
Yushin Okami - Mark Munoz<BR/><span>UFC on VERSUS 2</span>: 1x2
Thiago Alves - Jon Fitch<BR/><span>UFC 117 - Oakland</span>: 1x2
Poland - Austria: 1x2
Poland - Austria: 1 x 2
BUT the problem is that my regex MATCHES the first string in the should-NOT-match category:
"Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even"
And if I delete the "-" after the ":", it will NOT MATCH any more, which is good.
I think the problem might be the (\s+-\s+) part in the Regex, but I couldn't actually figure out how to fix it.
Could anyone help? Thanks!
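You can just remove (?!:[\w\s]+). The lookahead sits right before $, where nothing is left to look at, so it can never reject anything. The leading .+ also has to change, because . can consume a ":" and let the match pick up again at a later " - ". Use:
^[^:\n]+?(\s+-\s+)([^:\n]+)$
Now neither side of the " - " can contain a ":", and the $ anchor guarantees that the match runs all the way to the end of the string/line.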
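To see why the character class in front of the " - " matters, here is a minimal Python sketch. Python's re module is only my choice for illustration (the question does not say which regex engine is in use), and the variable names are made up. It runs two patterns against the failing string from the question: one whose prefix is written with . and one whose prefix is written with [^:\n].

```python
import re

# The failing input from the question: a ":" followed by a second " - ".
s = "Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even"

# Prefix written with ".": the dot is allowed to consume the colon.
dot_prefix = re.compile(r"^.+?(\s+-\s+)([^:\n]+)$")

# Prefix written with "[^:\n]": no part of the pattern can consume a colon.
no_colon_prefix = re.compile(r"^[^:\n]+?(\s+-\s+)([^:\n]+)$")

m = dot_prefix.match(s)
print(m.group(2) if m else None)    # prints: odd or even
print(no_colon_prefix.match(s))     # prints: None

# A positive example still matches with the stricter prefix.
print(no_colon_prefix.match("Poland - Austria - 1x2").group(2))  # prints: Austria - 1x2
```

With the . prefix the engine skips past the colon and splits at the second " - ", which is exactly the unwanted match described in the question.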
I suggest
^([^:]+)\s+-\s+([^:]+)$
This matches a string that contains
- one or more characters other than ":", followed by
- whitespace, "-", whitespace, followed by
- one or more characters other than ":".
The ^ and $ anchors make sure that the entire string is matched. A string that contains a ":" can thus never match, so the regex fails on all your negative examples and matches all your positive examples.
I have also enclosed the first and second parts of the match in capturing parentheses in case you want to do something with them later, and I have removed the unnecessary parentheses around the \s+-\s+ bit.
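As a quick check of that claim, here is a small Python sketch that runs the pattern over the example strings from the question. Python and the helper name split_match_name are assumptions for illustration only; the pattern itself is unchanged.

```python
import re

MATCH_NAME = re.compile(r"^([^:]+)\s+-\s+([^:]+)$")

def split_match_name(name):
    """Return (first, second) for a plain "A - B" name, or None otherwise."""
    m = MATCH_NAME.match(name)
    return (m.group(1), m.group(2)) if m else None

should_match = [
    "Mattek-Sands Bethanie - Safarova Lucie",
    "L. Hewitt - O. Rochus",
    "Ball Carsten - S. Darcis",
    "Poland - Austria",
    "Poland - Austria 1x2",
    "Poland - Austria 1 x 2",
    "Poland - Austria - 1x2",
    "Poland - Austria _ 1x2",
    "Poland - Austria (1x2)",
]
should_not_match = [
    "Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even",
    "Haka - JJK: Half time",
    "Lyngby - AaB: Draw No Bet",
    "AC Horsens - FC Midtjylland: First team to score",
    "Mattek-Sands Bethanie - Safarova Lucie: Who will win set number 1?",
    "Czech Republic - Kazakhstan: 1x2",
    "Yushin Okami - Mark Munoz<BR/><span>UFC on VERSUS 2</span>: 1x2",
    "Thiago Alves - Jon Fitch<BR/><span>UFC 117 - Oakland</span>: 1x2",
    "Poland - Austria: 1 x 2",
]

# Collect any example the pattern classifies the wrong way round.
wrongly_rejected = [n for n in should_match if split_match_name(n) is None]
wrongly_accepted = [n for n in should_not_match if split_match_name(n) is not None]
print(wrongly_rejected, wrongly_accepted)            # prints: [] []

print(split_match_name("L. Hewitt - O. Rochus"))     # prints: ('L. Hewitt', 'O. Rochus')
print(split_match_name("Poland - Austria - 1x2"))    # prints: ('Poland - Austria', '1x2')
```

Note that the first group is greedy, so when a name contains more than one " - " the split happens at the last one, as the final line shows.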