Regular Expression to match fractions and not dates
I'm trying to come up with a regular expression that will match a fraction (1/2) but not a date (5/5/2005) within a string. Any help at all would be great. All I've been able to come up with is (\d+)/(\d+), which finds matches in both strings. Thanks in advance for the help.
Assuming PCRE, use negative lookahead and lookbehind:
(?<![\/\d])(\d+)\/(\d+)(?![\/\d])
A lookahead (a (?=) group) says "match this stuff only if it's followed by this other stuff." The contents of the lookahead aren't part of the match. We negate it (the (?!) group) so that the fraction is not matched when a slash or a digit follows it; that way, we don't match the first two parts of a date.
A lookbehind (a (?<=) group) is the complement of a lookahead: it matches stuff only if it's preceded by this other stuff. Just like the lookahead, we can negate it (the (?<!) group) so that we match only things that don't follow something.
Together, they ensure that our fraction doesn't have other parts of fractions before or after it. It places no other arbitrary requirements on the input data. It will match the fraction 2/3 in the string "te2/3xt", unlike most of the other examples provided.
If your regex flavor uses slashes (//) to delimit regular expressions, you'll have to escape the slashes in the pattern, or use a different delimiter (Perl's m{} would be a good choice here).
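Since the question turned out to be about Ruby, here is a minimal sketch of the same pattern there. It assumes Ruby 1.9 or later (lookbehind support), and the names FRACTION and find_fractions are made up for illustration. Ruby's %r{} delimiter plays the same role as Perl's m{}, so the slashes need no escaping:

```ruby
# Hypothetical constant name. %r{...} avoids having to escape the slashes.
FRACTION = %r{(?<![/\d])(\d+)/(\d+)(?![/\d])}

# Hypothetical helper: returns one [numerator, denominator] pair per fraction.
def find_fractions(text)
  # With capture groups in the pattern, String#scan returns an array of
  # arrays, one inner array of captures per match.
  text.scan(FRACTION)
end

p find_fractions("add 1/2 cup on 5/5/2005")  # => [["1", "2"]]
p find_fractions("te2/3xt")                   # => [["2", "3"]]
```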
Edit: Apparently, none of these regexes work because the regex engine is backtracking and matching fewer numbers in order to satisfy the requirements of the regex. When I've been working on one regex for this long, I sit back and decide that maybe one giant regex is not the answer, and I write a function that uses a regex and a few other tools to do it for me. You've said you're using Ruby. This works for me:
>> def get_fraction(s)
>>   if s =~ /(\d+)\/(\d+)(\/\d+)?/
>>     if $3.nil?
>>       return $1, $2
>>     end
>>   end
>>   return nil
>> end
=> nil
>> get_fraction("1/2")
=> ["1", "2"]
>> get_fraction("1/2/3")
=> nil
This function returns the two parts of the fraction, but returns nil if it's a date (or if there's no fraction). It fails for "1/2/3 and 4/5", but I don't know if you want (or need) that to pass. In any case, I recommend that, in the future, when you ask on Stack Overflow, "How do I make a regex to match this?" you should step back first and see if you can do it using a regex and a little extra. Regular expressions are a great tool and can do a lot, but they don't always need to be used alone.
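If the "1/2/3 and 4/5" case does matter, one possible variation on the "regex and a little extra" idea is sketched below. This is not the function above but a different approach under my own assumptions: grab every run of slash-separated numbers first, then keep only the runs with exactly two parts. The name get_fractions is hypothetical.

```ruby
# Hypothetical helper: returns every two-part fraction in the string,
# skipping anything with three or more slash-separated parts (dates).
def get_fractions(s)
  # Each match is a whole run of numbers joined by slashes,
  # e.g. "1/2/3" or "4/5". No capture groups, so scan returns strings.
  s.scan(%r{\d+(?:/\d+)+})
   .map { |run| run.split("/") }
   .select { |parts| parts.length == 2 }
end

p get_fractions("1/2/3 and 4/5")  # => [["4", "5"]]
p get_fractions("1/2/3")          # => []
```

Because the regex always swallows the whole run before the length check happens, there is nothing for the engine to backtrack into.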
EDIT 2:
I figured out how to solve the problem without resorting to non-regex code, and updated the regex. It should work as expected now, though I haven't tested it. I also went ahead and escaped the slashes (/), since you're going to have to do it anyway.
EDIT 3:
I just fixed the bug j_random_hacker pointed out in my lookahead and lookbehind. I continue to see the amount of effort being put into this regex as proof that a pure regex solution was not necessarily the optimal solution to this problem.
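For anyone curious about what the backtracking bug looks like in practice, here is a small Ruby sketch comparing lookarounds that exclude only slashes with lookarounds that exclude slashes and digits. The constant names are made up for illustration:

```ruby
# Lookarounds that only exclude slashes: the engine can give back a digit
# and stop in the middle of a number to satisfy the lookahead.
SLASH_ONLY = %r{(?<!/)\d+/\d+(?!/)}

# Lookarounds that also exclude digits: a match has to cover whole numbers.
SLASH_AND_DIGIT = %r{(?<![/\d])\d+/\d+(?![/\d])}

date = "12/34/56"
p date.scan(SLASH_ONLY)       # => ["12/3", "4/56"]
p date.scan(SLASH_AND_DIGIT)  # => []
```

With the slash-only version, "12/34" fails because a slash follows it, so the engine retries with "12/3", which is followed by "4" rather than a slash, and the lookahead is satisfied.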
Use negative lookahead and lookbehind.
/(?<![\/\d])(?:\d+)\/(?:\d+)(?![\/\d])/
EDIT: I've fixed my answer to handle the backtracking bug identified by @j_random_hacker. As proof, I offer the following quick and dirty PHP script:
<?php
$subject = "The match should include 1/2 but not 12/34/56 but 11/23, now that's ok.";
$matches = array();
preg_match_all('/(?<![\/\d])(?:\d+)\/(?:\d+)(?![\/\d])/', $subject, $matches);
var_dump($matches);
?>
which outputs:
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "1/2"
    [1]=>
    string(5) "11/23"
  }
}
Lookahead is great if you're using Perl or PCRE, but if lookaround is unavailable in the regex engine you're using, you can use:
(^|[^/\d])(\d+)/(\d+)($|[^/\d])
The 2nd and 3rd captured segments will be the numerator and denominator.
If you do use the above in a Perl regex, remember to escape the slashes, or use a different delimiter, e.g.:
m!(?:^|[^/\d])(\d+)/(\d+)(?:$|[^/\d])!
In this case, you can use (?:...) to avoid saving the uninteresting parenthesised parts, so the numerator and denominator become the 1st and 2nd captures.
EDIT 18/12/2009: Chris Lutz noticed a tricky bug caused by backtracking that plagues most of these answers -- I believe this is now fixed in mine.
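As a rough illustration of the lookaround-free pattern, here is a Ruby sketch (Ruby does have lookaround, so this is only to show how the captures line up). The names NO_LOOKAROUND and first_fraction are hypothetical:

```ruby
# Groups 1 and 4 consume the boundary characters;
# groups 2 and 3 are the numerator and denominator.
NO_LOOKAROUND = %r{(^|[^/\d])(\d+)/(\d+)($|[^/\d])}

# Hypothetical helper: returns the first fraction as [numerator, denominator],
# or nil when the string contains no fraction.
def first_fraction(text)
  m = NO_LOOKAROUND.match(text)
  m && [m[2], m[3]]
end

p first_fraction("mix 3/4 cup")   # => ["3", "4"]
p first_fraction("due 5/5/2005")  # => nil
```

One thing to keep in mind: because the boundary characters are part of the match, two fractions separated by a single character (as in "1/2 3/4") can't both be found by repeated matching, since the space is already used up by the first match. That is one reason to prefer the lookaround version where it is available.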
If the input is a single line, you can try:
^(\d+)\/(\d+)$
Otherwise, perhaps use this:
^(\d+)\/(\d+)[^\\]*.
This will work: (?<![/\d])\d+/\d+(?![/\d])
Depending on the language you're working with, you might try negative lookahead or lookbehind assertions: in Perl, (?!pattern) asserts that /pattern/ can't follow the matched string.
Or, again, depending on the language and anything you know about the context, a word-boundary match (\b in Perl) might be appropriate.
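A quick Ruby sketch of the word-boundary idea, to show what it does and doesn't buy you. The constant name is made up. Note that a slash is itself a non-word character, so \b is satisfied right before and after it; on its own, \b only helps when the context guarantees there are no dates in the input:

```ruby
# Hypothetical constant name: a fraction with a word boundary on each side.
WORD_BOUNDED = %r{\b\d+/\d+\b}

p "add 1/2 cup".scan(WORD_BOUNDED)  # => ["1/2"]
# There is a word boundary between "5" and "/", so part of the date matches.
p "on 5/5/2005".scan(WORD_BOUNDED)  # => ["5/5"]
```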