
PHP regular expression help with a browser user-agent string

I'm currently trying to learn regular expressions with some simple "real world" examples.

Consider the following string:

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko

I want to find the RV value (1.9.2a1pre). I need to apply the following rules:

  1. RV: can be in any case (RV, rv, rV, Rv...).
  2. RV: can be anywhere in the string.
  3. The RV: value ends with either a closing parenthesis, any whitespace (including a line break), a semicolon or the end of the string.

So far I have tried:

/rv:[.][\)]?/i

but it doesn't work (I must be far from the right solution).

The expression must work with PHP preg_match.
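One way to check any candidate pattern against these three rules is a small harness of sample strings. This is a sketch under assumptions: the sample user-agent strings and the helper name failing_cases are made up for illustration, and the pattern being checked is /rv\s*:\s*([^;)\s]+)/i from one of the answers.

```php
<?php
// Hypothetical sample strings, one or more per rule, mapped to the expected rv value.
$cases = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko' => '1.9.2a1pre', // ends with ")"
    'Mozilla/5.0 (X11; RV:1.9.1; Linux)' => '1.9.1',  // upper case, ends with ";"
    'rV:2.0b4 Gecko' => '2.0b4',                      // at the start, ends with a space
    "Mozilla/5.0 Rv:1.8\nnext line" => '1.8',         // ends with a line break
    'Mozilla/5.0 rv:11.0' => '11.0',                  // ends at the end of the string
];

// Hypothetical helper: returns the inputs for which $pattern does not put
// the expected value into capture group 1.
function failing_cases(string $pattern, array $cases): array
{
    $failures = [];
    foreach ($cases as $input => $expected) {
        if (preg_match($pattern, $input, $m) !== 1 || ($m[1] ?? null) !== $expected) {
            $failures[] = $input;
        }
    }
    return $failures;
}

// An empty array means every case passed.
var_dump(failing_cases('/rv\s*:\s*([^;)\s]+)/i', $cases));
```

Running the question's own pattern /rv:[.][\)]?/i through the same helper reports every case as failing.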


/rv\s*:\s*([^;)\s]+)/i

This matches rv, followed by a : (optionally surrounded by whitespace), then a run of one or more characters other than ;, ) and whitespace (including newlines). The value after rv: is captured in backreference no. 1, i.e. $matches[1] when used with preg_match.
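A minimal sketch of this pattern with preg_match(), showing the difference between the whole match ($matches[0]) and backreference 1 ($matches[1]). The second input string is made up to exercise the optional whitespace around the colon.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// preg_match() returns 1 on a match, 0 on no match, false on error.
if (preg_match('/rv\s*:\s*([^;)\s]+)/i', $ua, $matches) === 1) {
    echo $matches[0], "\n"; // whole match:        rv:1.9.2a1pre
    echo $matches[1], "\n"; // capture group 1:    1.9.2a1pre
}

// The \s* on each side of the colon tolerates spaces around it.
preg_match('/rv\s*:\s*([^;)\s]+)/i', 'Gecko RV : 2.0; Linux', $m);
echo $m[1], "\n"; // 2.0
```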


Here is my revision, which allows the rv: substring to appear anywhere in the string:

/rv:[\s]*([^);\s]+)/i
  • () denotes the capture group (i.e. the part you want to get back from the match)
  • [^);\s] means any character that is not ), ; or whitespace (space, tab, line break)
  • + means one or more times
  • * means zero or more times
  • [\s]* just before the parenthesis skips any whitespace after rv:, so it is not part of the captured value. This is essential here because the class inside the parentheses excludes whitespace, so without it a value written as "rv: 1.9" would not match at all.

So this captures a run of one or more characters other than ), ; and whitespace, immediately after rv: and any optional whitespace.
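To see why the leading [\s]* matters, here is a sketch with a made-up user-agent string that has a space after rv:. Both patterns use a class that excludes the space character, [^); ], so the capture cannot start on that space.

```php
<?php
// Hypothetical input with a space after "rv:".
$ua = 'Mozilla/5.0 (Windows; U; rv: 1.9.2a1pre) Gecko';

// Without a leading [\s]*, the class has to match right after "rv:",
// but the next character is a space, so there is no match at all.
var_dump(preg_match('/rv:([^); ]+)/i', $ua)); // int(0)

// With [\s]* the space is consumed outside the capture group.
preg_match('/rv:[\s]*([^); ]+)/i', $ua, $m);
var_dump($m[1]); // string(10) "1.9.2a1pre"
```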

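Your version /rv:[.][\)]?/i looks for rv: followed by a single literal . (inside a character class the dot is not a wildcard), then optionally a ). It therefore does not match the sample string at all.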
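A quick check of the question's pattern, a sketch using only preg_match() and var_dump(). The second input, 'rv:.)', is contrived purely to show what the pattern does match.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// [.] is a class containing only a literal dot, so "rv:" must be followed
// directly by "."; in the sample string it is followed by "1".
var_dump(preg_match('/rv:[.][\)]?/i', $ua)); // int(0)

// It does match when a literal dot (and optionally ")") follows "rv:".
var_dump(preg_match('/rv:[.][\)]?/i', 'rv:.)', $m)); // int(1)
var_dump($m[0]); // string(5) "rv:.)"
```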


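Inside a character class, [.] means a literal dot, not "any character". Use an unescaped . instead, with a lazy quantifier so the value stops at the first terminator:

/rv:(.+?)(?:[);\s]|$)/i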
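The difference between a greedy and a lazy "any character" run is easy to see in a sketch. A greedy .+, as in /rv:.+[\)]?/i, consumes everything to the end of the line, and the optional [\)]? then happily matches nothing.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// Greedy: .+ runs to the end of the line, [\)]? matches the empty string.
preg_match('/rv:.+[\)]?/i', $ua, $greedy);
echo $greedy[0], "\n"; // rv:1.9.2a1pre) Gecko

// Lazy: .+? grows one character at a time until the next character is
// ")", ";" or whitespace, or the end of the string is reached.
preg_match('/rv:(.+?)(?:[);\s]|$)/i', $ua, $lazy);
echo $lazy[1], "\n"; // 1.9.2a1pre
```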


Try this:

$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// Stop at ")", ";" or any whitespace; $matches[1] holds the value.
if (preg_match('/rv:([^);\s]+)/i', $str, $matches)) {
    echo $matches[1]; // 1.9.2a1pre
}
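A class that excludes only ), such as [^\)]*, works for the sample string but not for an rv value terminated by a semicolon. A sketch with a made-up user-agent string:

```php
<?php
// Hypothetical input where the rv value ends with ";".
$ua = 'Mozilla/5.0 (X11; U; Linux i686; rv:1.9.1; en-US) Gecko';

// Excluding only ")" lets the capture run through the semicolon.
preg_match('/rv:([^\)]*)/i', $ua, $parenOnly);
echo $parenOnly[1], "\n"; // 1.9.1; en-US

// Adding ";" and \s to the class enforces all three terminators.
preg_match('/rv:([^);\s]+)/i', $ua, $allStops);
echo $allStops[1], "\n"; // 1.9.1
```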


Maybe:

/rv:([^); \n]+)/i

That captures one or more characters that are not ), ;, a space or a line feed, case-insensitively.
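Listing a space and \n covers Unix line endings, but not a carriage return or a tab. A sketch with a made-up string ending in a Windows line ending (\r\n), compared with \s, which covers all of them:

```php
<?php
// Hypothetical input ending with a Windows line ending.
$ua = "Mozilla/5.0 (Windows; rv:1.9.2a1pre\r\n";

// The class excludes space and \n but not \r, so the \r leaks into the value.
preg_match('/rv:([^); \n]+)/i', $ua, $lfOnly);
var_dump($lfOnly[1] === "1.9.2a1pre\r"); // bool(true)

// \s covers space, tab, \r and \n.
preg_match('/rv:([^);\s]+)/i', $ua, $ws);
var_dump($ws[1]); // string(10) "1.9.2a1pre"
```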


I think what you want is this:

/(?<=rv:).*?(?=\))/i

Everything within parentheses is a group; these two are non-capturing lookaround groups. (?<=...) is a positive lookbehind: it requires the given text to appear just before the match without including it. (?=...) is a positive lookahead: it requires the given text to appear just after the match, again without including it, so the whole match ($matches[0]) is the value itself. Since the string you want is just letters, digits and a dot or two, . works as a catch-all; it matches any character except line breaks. * means zero or more of the preceding token, and the ? after it makes it lazy, so the match stops at the first ) rather than the last one.

Hope that helps.
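A sketch of the lookaround approach with preg_match(), using a made-up user-agent string that contains a second ) later on, to show why the lazy .*? matters:

```php
<?php
// Hypothetical input with another ")" after the rv value.
$ua = 'Mozilla/5.0 (Windows; rv:1.9.2a1pre) Gecko/20090101 Firefox (beta)';

// Lookarounds are zero-width: "rv:" and ")" are checked but not included,
// so the value is the whole match, with no capture group needed.
// Greedy .* backtracks only as far as the LAST ")".
preg_match('/(?<=rv:).*(?=\))/i', $ua, $greedy);
echo $greedy[0], "\n"; // 1.9.2a1pre) Gecko/20090101 Firefox (beta

// Lazy .*? stops at the FIRST ")".
preg_match('/(?<=rv:).*?(?=\))/i', $ua, $lazy);
echo $lazy[0], "\n"; // 1.9.2a1pre
```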


$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// The + sits inside the group so the whole run of characters is captured.
preg_match('/rv:([a-z0-9.]+)/i', $str, $matches);
echo $matches[1];
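Where the quantifier sits relative to the capture group changes what ends up in $matches[1]. When the group itself is repeated, as in ([a-z0-9.])*, PCRE keeps only the text from the group's last iteration. A sketch:

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// Quantifier outside the group: each iteration captures one character,
// and only the last one ("e") is kept.
preg_match('/rv:([a-z0-9.])*/i', $ua, $outside);
echo $outside[1], "\n"; // e

// Quantifier inside the group: the group captures the whole run.
preg_match('/rv:([a-z0-9.]+)/i', $ua, $inside);
echo $inside[1], "\n"; // 1.9.2a1pre
```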