What does this regular expression part add?

I came across this regular expression in the jQuery source code:

...
rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/,
...

I was wondering why it was rather complicated. I'm especially interested in the reason behind the second part:

(?:.*? rv:([\w.]+))?

I did some research but I could not figure out what this part of the regular expression adds.

(?:)      to match but not capture
.*?       any amount of any character
 rv:      something literal
([\w.]+)  one or more word characters or a dot
?         appear 0 or 1 time

Particularly, that last ? doesn't make much sense to me. The whole second part matches if there is or is not a substring as defined by that second part. With some trial and error the regular expression does not seem to differ from just:

/(mozilla)/

Could someone shed some light on what the second part of the regular expression is supposed to do? What does it constrain; what string fails that passes /(mozilla)/ or the other way round?


The two regexes would match the same strings, but would store different information in their capturing groups.

For the string: mozilla asdf rv:sadf

/(mozilla)(?:.*? rv:([\w.]+))?/
$0 = 'mozilla asdf rv:sadf'
$1 = 'mozilla'
$2 = 'sadf'

/(mozilla)/
$0 = 'mozilla'
$1 = 'mozilla'
$2 = undefined (this pattern has no second group)
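
The comparison above can be reproduced in a JavaScript console. This is only a small sketch: the variable names are made up for illustration, and the input is the sample string from this answer.

```javascript
// Compare the capture groups produced by the two patterns on the same input.
const fullPattern = /(mozilla)(?:.*? rv:([\w.]+))?/;
const shortPattern = /(mozilla)/;
const sample = "mozilla asdf rv:sadf";

const fullMatch = fullPattern.exec(sample);
const shortMatch = shortPattern.exec(sample);

// Both patterns match, but only the longer one records the text after " rv:".
console.log(fullMatch[0], "|", fullMatch[1], "|", fullMatch[2]);
console.log(shortMatch[0], "|", shortMatch[1], "|", shortMatch[2]);
```

In JavaScript, asking for a group that the pattern does not define (shortMatch[2] here) yields undefined rather than an empty string.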


Note: I now notice that this answer might be a bit out of scope. I will still leave it for further information, but if you think it is too much out of scope, just comment and I will remove it.


@arnaud is right, it is to get the version. Here is the code where the expression is used:

uaMatch: function( ua ) {
    ua = ua.toLowerCase();

    var match = rwebkit.exec( ua ) ||
                ropera.exec( ua ) ||
                rmsie.exec( ua ) ||
                ua.indexOf("compatible") < 0 && rmozilla.exec( ua ) ||
                [];

    return { browser: match[1] || "", version: match[2] || "0" };
},

You can see that the function returns the version (match[2]) if it was found and the string "0" if not. This might be necessary for some browsers or is just provided as additional information for developers.

The function is called here:

browserMatch = jQuery.uaMatch( userAgent );
if ( browserMatch.browser ) {
    jQuery.browser[ browserMatch.browser ] = true;
    jQuery.browser.version = browserMatch.version;
}
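
To try the fallback behaviour in isolation, here is a minimal sketch that keeps only the rmozilla branch of uaMatch. The helper name mozillaMatch and the user-agent strings are assumptions chosen for illustration; they are not part of jQuery.

```javascript
// Hypothetical helper: only the rmozilla branch of uaMatch, for illustration.
const rmozilla = /(mozilla)(?:.*? rv:([\w.]+))?/;

function mozillaMatch(ua) {
  ua = ua.toLowerCase();
  // As in the original condition, user agents containing "compatible" are skipped.
  const match = (ua.indexOf("compatible") < 0 && rmozilla.exec(ua)) || [];
  // match[2] is undefined when there is no " rv:" part, so the version falls back to "0".
  return { browser: match[1] || "", version: match[2] || "0" };
}

// Illustrative user-agent strings (assumed, not taken from the jQuery source).
const withVersion = mozillaMatch("Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0");
const withoutVersion = mozillaMatch("Mozilla/5.0 (X11; Linux x86_64)");
const compatibleAgent = mozillaMatch("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)");

console.log(withVersion, withoutVersion, compatibleAgent);
```

Without the optional second part, the version would always fall back to "0", which is the practical difference from plain /(mozilla)/.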


First, I'd like to clarify the difference between:

.*? - non-greedy match
.* - greedy match

The non-greedy form matches as few characters as possible (while still letting the rest of the pattern match), and the greedy form matches as many as possible.

Given the string:

mozilla some text here rv:abc xyz

The regex will return both 'mozilla' and 'abc'. But if the 'rv:' doesn't exist, the regex will still return 'mozilla'.
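
The greedy and non-greedy forms only give different results when " rv:" occurs more than once, so here is a short sketch with such a string. The greedy variant and the two-" rv:" input are assumptions made up for the comparison; only the non-greedy pattern comes from jQuery.

```javascript
// Non-greedy (as in jQuery) versus a hypothetical greedy variant.
const lazyPattern = /(mozilla)(?:.*? rv:([\w.]+))?/;
const greedyPattern = /(mozilla)(?:.* rv:([\w.]+))?/;

// With two " rv:" parts, .*? stops at the first one and .* runs on to the last one.
const twoRv = "mozilla rv:1.0 other rv:2.0";
const lazyVersion = lazyPattern.exec(twoRv)[2];
const greedyVersion = greedyPattern.exec(twoRv)[2];

// The string from this answer, and a string without any " rv:" part.
const withRv = lazyPattern.exec("mozilla some text here rv:abc xyz");
const noRv = lazyPattern.exec("mozilla some text here");

console.log(lazyVersion, greedyVersion, withRv[2], noRv[0], noRv[2]);
```

Because the whole second group is optional, the last call still matches 'mozilla' and simply leaves the second capture undefined.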


The ([\w.]+) inside of (?:.*? rv:([\w.]+)) is capturing, so maybe this regex was used to get the revision number in the past (however, it seems that jQuery currently only checks whether the regex matches).


(pat)     a capturing group: matches pat and stores the matched text as a submatch
(?:pat)   a non-capturing group: matches pat in the same way but does not store a submatch (it is not a negation in the way [^ ] negates [ ])
.         matches any character except a line terminator
*         quantifier, zero or more times; it can also be written {0,} (but those three additional characters may likely result in an earlier death of your keyboard!)
?         directly after *, makes the quantifier non-greedy, so .*? matches as few characters as possible
 rv:      a literal space followed by the literal rv:
([\w.]+)  another submatch inside the parent group; [\w.] is a character set in which \w is any word character, aka [a-zA-Z0-9_], and the dot is a literal dot; the quantifier + means the set may occur one or more times
)?        closes the parent group; the trailing ? means the whole group may match zero or one time
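
To see what (?:...) does compared with (...), the sketch below runs the jQuery pattern next to a hypothetical variant whose outer group is capturing. The variant and the input string are assumptions for illustration only.

```javascript
// Same pattern twice: once with the outer group non-capturing, once capturing.
const nonCapturing = /(mozilla)(?:.*? rv:([\w.]+))?/.exec("mozilla asdf rv:1.9");
const capturing = /(mozilla)(.*? rv:([\w.]+))?/.exec("mozilla asdf rv:1.9");

// The non-capturing group keeps the version at index 2.
console.log(nonCapturing.length, nonCapturing[2]);

// With a capturing outer group, an extra submatch appears and the version moves to index 3.
console.log(capturing.length, capturing[2], capturing[3]);
```

Both variants match exactly the same strings; the only difference is which submatches end up in the result array, and at which index.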

To reverse engineer the meaning of the pattern: just evaluate it from left to right in a text editor, substituting each sub-expression with some literal text that comes to mind and that it would match. Then take a step back and ponder what the regex might have been for.
