
Match names in the format of "A - B", but NOT in the format of "A - B: Total goals - odd or even"

Hi, I ran into a problem when processing the names of sports matches. The rules are:

  1. Match strings in the format "A - B".
  2. Do NOT match strings with a ":" anywhere after "A - B", e.g. "A - B: Total goals - odd or even".

Here is my regex:

^.+(\s+-\s+)([^:\n]+)(?!:[\w\s]+)$


And here are some example strings that should match:

Mattek-Sands Bethanie - Safarova Lucie

L. Hewitt - O. Rochus

Ball Carsten - S. Darcis

Poland - Austria

Poland - Austria 1x2

Poland - Austria 1 x 2

Poland - Austria 1x2

Poland - Austria - 1x2

Poland - Austria _ 1x2

Poland - Austria (1x2)


Here are some example strings that should NOT match:

Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even

Haka - JJK: Half time

Lyngby - AaB: Draw No Bet

AC Horsens - FC Midtjylland: First team to score

Mattek-Sands Bethanie - Safarova Lucie: Who will win set number 1?

Czech Republic - Kazakhstan: 1x2

Romania - Slovak Republic: 1x2

Norway - Moldova: 1x2

Yushin Okami - Mark Munoz<BR/><span>UFC on VERSUS 2</span>: 1x2

Thiago Alves - Jon Fitch<BR/><span>UFC 117 - Oakland</span>: 1x2

Poland - Austria: 1x2

Poland - Austria: 1 x 2


BUT the problem is that my regex MATCHES the first string in the should-NOT-match category:

"Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even"

If I delete the "-" after the ":", it no longer matches, which is good.

I think the problem might be the (\s+-\s+) part of the regex, but I couldn't figure out how to fix it.

Could anyone help? Thanks!


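You can just remove (?!:[\w\s]+): it sits directly in front of $, where no character can follow, so it never rejects anything. The match on your first negative example comes from the leading .+, which is allowed to run past the : and pick up the later " - ". Exclude the colon there as well and use:

^[^:\n]+?(\s+-\s+)([^:\n]+)$

Both before and after the -, this matches only characters that aren't a :, and the $ guarantees that the match runs all the way to the end of the string/line.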
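
To see where the unwanted match comes from, it helps to print what the groups capture. The following is a minimal sketch using Python's re module (the question does not name a regex flavor, so Python is an assumption, and the variable names are made up for illustration). It runs the question's pattern on the problem string, then tries a variant in which the part before the separator is limited to [^:\n] as well.

```python
import re

# Pattern exactly as posted in the question.
question_re = re.compile(r"^.+(\s+-\s+)([^:\n]+)(?!:[\w\s]+)$")

# Variant (an assumption of this sketch): the text before the separator
# is not allowed to contain ":" either.
no_colon_re = re.compile(r"^[^:\n]+?(\s+-\s+)([^:\n]+)$")

problem = "Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even"

m = question_re.match(problem)
# The greedy .+ runs past the colon, so the separator that ends up matching
# is the LAST " - " and group 2 is the text after it.
print(m.group(2))                                          # odd or even

# With the colon excluded on both sides, no separator can be followed
# by colon-free text up to the end of the string.
print(no_colon_re.match(problem))                          # None
print(bool(no_colon_re.match("Poland - Austria - 1x2")))   # True
```

So the (\s+-\s+) group itself is fine; what matters is which characters are allowed on either side of it.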


I suggest

^([^:]+)\s+-\s+([^:]+)$

This matches a string that contains

  1. any number of characters except :, followed by
  2. whitespace, -, whitespace, followed by
  3. any number of characters except :.

The ^ and $ anchors make sure that the entire string is matched. A string that contains a : can thus never match, so the regex fails on all of your negative examples and matches all of your positive examples.

I have also enclosed the first and second parts of the match in capturing parentheses in case you want to do something with them later; and I have removed the unnecessary parens around the \s+-\s+ bit.
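
As a quick check, here is a small sketch that runs this pattern over the example strings from the question. It assumes Python's re module; the helper name is_match_name and the list names are hypothetical, chosen only for this illustration.

```python
import re

# The suggested pattern: no ":" is allowed on either side of " - ".
MATCH_NAME = re.compile(r"^([^:]+)\s+-\s+([^:]+)$")

def is_match_name(text):
    """Return True if text looks like "A - B" with no colon anywhere."""
    return MATCH_NAME.match(text) is not None

should_match = [
    "Mattek-Sands Bethanie - Safarova Lucie",
    "L. Hewitt - O. Rochus",
    "Ball Carsten - S. Darcis",
    "Poland - Austria",
    "Poland - Austria 1x2",
    "Poland - Austria 1 x 2",
    "Poland - Austria - 1x2",
    "Poland - Austria _ 1x2",
    "Poland - Austria (1x2)",
]

should_not_match = [
    "Vityaz Podolsk Chekhov - Traktor Chel: Total goals - odd or even",
    "Haka - JJK: Half time",
    "Lyngby - AaB: Draw No Bet",
    "AC Horsens - FC Midtjylland: First team to score",
    "Mattek-Sands Bethanie - Safarova Lucie: Who will win set number 1?",
    "Czech Republic - Kazakhstan: 1x2",
    "Yushin Okami - Mark Munoz<BR/><span>UFC on VERSUS 2</span>: 1x2",
    "Thiago Alves - Jon Fitch<BR/><span>UFC 117 - Oakland</span>: 1x2",
    "Poland - Austria: 1 x 2",
]

for s in should_match:
    assert is_match_name(s), s
for s in should_not_match:
    assert not is_match_name(s), s

# The two capturing groups hold the parts around the separator.
print(MATCH_NAME.match("L. Hewitt - O. Rochus").groups())
# ('L. Hewitt', 'O. Rochus')
```

Note that when a name contains more than one " - ", as in "Poland - Austria - 1x2", the greedy first group takes everything up to the last separator, so the groups come out as "Poland - Austria" and "1x2".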
