PHP Regular expression help to work with Browser Agent String
I'm currently trying to learn regular expressions with some simple "real world" examples.
Take into consideration the following string:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko
I want to find the RV value (1.9.2a1pre). I need to apply the following rules:
- RV: can be in any case (RV, rv, rV, Rv...).
- RV: can be anywhere in the string.
- The RV: value ends with either a closing parenthesis, any whitespace (including a line break), a semicolon or the end of the string.
So far I did:
/rv:[.][\)]?/i
but it's not working (I must be far from the "true" solution)...
The expression must work with PHP preg_match.
/rv\s*:\s*([^;)\s]+)/i
will match rv, followed by a : (which may be surrounded by whitespace), then a run of characters other than ;, ) and whitespace (including newlines). The match result (after rv:) will be captured in backreference no. 1.
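A minimal sketch of how that pattern could be used with preg_match. The function name extractRv is hypothetical and only there for illustration; the pattern is the one given above.

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns the rv value, or null when the string has no rv: token.
function extractRv(string $userAgent): ?string
{
    // Group 1 captures the run of characters after "rv:" up to the first
    // ';', ')', whitespace character, or the end of the string.
    if (preg_match('/rv\s*:\s*([^;)\s]+)/i', $userAgent, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
echo extractRv($ua), "\n"; // prints 1.9.2a1pre
```

Returning null for "no match" keeps the caller from reading an undefined $matches[1].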
Here is my revision to allow the RV sub-string to be anywhere:
/rv:[\s]*([^); ]+)/i
- () denotes the capture group (i.e., what you want to get back from this process).
- [^); ] means characters that are not ), space or ;.
- + means one or more times.
- * means as many as you like, zero or more.
- [\s]* just before the parenthesis basically means we chop off any leading whitespace from the match, which is essential in this case because we're explicitly saying we break the main match on a space.
So this is looking to capture a string of chars excluding ), ; and space, one or more chars in length, immediately after rv:.
Your version /rv:[.][\)]?/i looks for a single . and then optionally a ).
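One thing to keep in mind against the question's rules: a literal space inside a character class excludes only the space character, while \s excludes every whitespace character, line breaks included. A small sketch of the difference; the input string is invented for illustration.

```php
<?php
// Invented sample: the rv value is followed directly by a line break.
$ua = "Gecko; rv:1.9.2a1pre\nnext line";

// A literal space in the class does not exclude "\n",
// so the capture runs on into the next line.
preg_match('/rv:[\s]*([^); ]+)/i', $ua, $withSpace);

// \s in the class excludes all whitespace, including "\n".
preg_match('/rv:\s*([^);\s]+)/i', $ua, $withWhitespace);

var_dump($withSpace[1] === "1.9.2a1pre\nnext"); // bool(true)
var_dump($withWhitespace[1] === '1.9.2a1pre');  // bool(true)
```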
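I think the [.] means a literal dot, not "any character". Use this instead:
/rv:.+[\)]?/i
Note that .+ is greedy, so on the sample string it also consumes the closing parenthesis and everything after it; add a capture group with a negated character class if you need only the value.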
Try this:
$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// Capture everything after "rv:" up to (but not including) the next ")".
preg_match('/rv:([^\)]*)/i', $str, $matches);
echo $matches[1];
Maybe:
/rv:([^); \n]+)/i
That means: one or more characters that are not ), ;, a space or a line feed, matched case-insensitively and captured.
I think what you want is this:
(?<=rv:).*(?=\))
Everything within parentheses is a group. The (?<= construct is called a positive lookbehind: it requires the given text to appear immediately before the string you want, without making it part of the match. The (?= construct is called a positive lookahead and does the same for text after the string you want. Since the string you want is simply numbers, letters and a dot or two, the . operator works as a catch-all and matches any character except line breaks. * indicates zero or more of the preceding character.
Hope that helps.
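The lookaround pattern above is shown without delimiters or flags, so preg_match needs both added. A sketch, assuming / delimiters and the i flag to satisfy the case-insensitivity rule from the question. The second pattern is a variation of my own, not the answer's: it swaps .* for a negated character class so the match also stops at a semicolon or whitespace.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// Lookarounds assert context without consuming it, so the whole
// match ($matches[0]) is the value itself; no capture group is needed.
preg_match('/(?<=rv:).*(?=\))/i', $ua, $lookaround);

// Variation (an assumption, not from the answer): a negated class stops
// at ';', ')' or whitespace, so no lookahead is required.
preg_match('/(?<=rv:)[^;)\s]+/i', $ua, $negated);

echo $lookaround[0], "\n"; // prints 1.9.2a1pre
echo $negated[0], "\n";    // prints 1.9.2a1pre
```

With .* the match extends to the last ) on the line, which is fine for this sample because only one ) follows rv:.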
$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// The quantifier goes inside the group so the whole run is captured, not just its last character.
preg_match('/rv:([a-z0-9.]*)/i', $str, $matches);
echo $matches[1];
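A detail worth checking in patterns like this one is where the quantifier sits relative to the capture group. When a group itself is repeated, PCRE keeps only the text from its last iteration. A short sketch comparing the two placements on the sample string:

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// Quantifier outside the group: the one-character group is repeated,
// and group 1 ends up holding only the final character matched.
preg_match('/rv:([a-z0-9.])*/i', $ua, $outside);

// Quantifier inside the group: group 1 holds the whole run.
preg_match('/rv:([a-z0-9.]*)/i', $ua, $inside);

echo $outside[1], "\n"; // prints e
echo $inside[1], "\n";  // prints 1.9.2a1pre
```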