
Browsers' different interpretations of a regex with lookahead

I am running a split in JavaScript with /\s+(AND|OR)(?=\s+")\s+/ on

"email" IS NOT NULL AND "email" LIKE '%gmail.com' OR "email" = 'test@test.com'

Now, my understanding of regular expressions would lead me to expect obtaining the following array:

[0]: "email" IS NOT NULL
[1]: "email" LIKE '%gmail.com'
[2]: "email" = 'test@test.com'

Note: I got rid of the delimiters for clarity.

However, I obtain

[0]: "email" IS NOT NULL
[1]:  AND
[2]: "email" LIKE '%gmail.com'
[3]:  OR
[4]: "email" = 'test@test.com'

when running on Firefox 3.6.8, Chrome 5.0.375.126 and Safari 5.0.1 on OS X 10.6.4.

However, when I tried an up-to-date IE8 (8.0.6) with default settings, I obtained what I was expecting at first. PHP 5.2.10 with preg_split also splits it this way.

My guess is that for once the 'good' browsers got it wrong but I'd like more opinions.

Edit: The example I gave here with emails is a naive example. Basically I don't know what each member can be. "xyz" = '1' AND "zyx" = 'test AND toast' is another possible input string.

What I know of the structure is that the whole string will have the following pattern:

"<attribute>" <operator> '<value>'( (AND|OR) "<attribute>" <operator> '<value>')*

Note: spaces actually represent \s+


Try splitting on /\b(?:AND|OR)\b/, and trim the resulting parts.
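
As a rough sketch of that suggestion (the helper name splitOnKeywords is made up for illustration and is not from the question), the split-then-trim approach could look like this. Note that it only looks at the bare keywords, so it also splits inside a quoted value such as 'test AND toast' from the question's edit:

```javascript
// Hypothetical helper: split on the bare keywords AND/OR, then trim each part.
// The group is non-capturing, so the keywords themselves are dropped.
function splitOnKeywords(input) {
  return input.split(/\b(?:AND|OR)\b/).map(function (part) {
    return part.trim();
  });
}

const emailQuery = "\"email\" IS NOT NULL AND \"email\" LIKE '%gmail.com' OR \"email\" = 'test@test.com'";
console.log(splitOnKeywords(emailQuery));
// Three parts:
//   "email" IS NOT NULL
//   "email" LIKE '%gmail.com'
//   "email" = 'test@test.com'

// Limitation: the keyword inside the quoted value is treated as a separator too.
const trickyQuery = "\"xyz\" = '1' AND \"zyx\" = 'test AND toast'";
console.log(splitOnKeywords(trickyQuery));
// Three parts instead of two:
//   "xyz" = '1'
//   "zyx" = 'test
//   toast'
```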

Be aware that boolean operators have precedence rules and you cannot just split on AND and OR without losing meaning. Also, boolean expressions can (in theory) be enclosed in nested parentheses, which basically rules out regular expressions as a technology to parse them.


This will return the result you want:

var string = "\"email\" IS NOT NULL AND \"email\" LIKE '%gmail.com' OR \"email\" = 'test@test.com'"
string.split(/\s+(?:AND|OR)\s+/)
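
If values may themselves contain AND or OR, as in the question's edit, one option is to keep the question's lookahead and only make the group non-capturing. This is a sketch, assuming every attribute is wrapped in double quotes as the stated pattern says; SEPARATOR and splitConditions are illustrative names:

```javascript
// The question's pattern with (AND|OR) changed to the non-capturing (?:AND|OR).
// The lookahead (?=\s+") only accepts a keyword that is followed by a
// double-quoted attribute.
const SEPARATOR = /\s+(?:AND|OR)(?=\s+")\s+/;

function splitConditions(input) {
  return input.split(SEPARATOR);
}

console.log(splitConditions("\"xyz\" = '1' AND \"zyx\" = 'test AND toast'"));
// Two parts; the AND inside the quoted value is left alone:
//   "xyz" = '1'
//   "zyx" = 'test AND toast'
```

A value that contains a keyword followed by a double quote (for example 'a AND "b"') would still be split, so this remains a heuristic rather than a parser.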


It looks like Firefox and Chrome got it right. According to section 15.5.4.14 of the ECMAScript 5 specification:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array.

For example,

"A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/)

evaluates to the array

["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]

Pointer to the specs by Chris Leary of Mozilla.
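
The spliced-in captures can also be put to use. The following sketch (parseFlat is a hypothetical name, and it assumes an engine that follows the specified behaviour) keeps the question's capturing group and separates the conditions from the operators that joined them:

```javascript
// With exactly one capturing group, the split result alternates:
// text, captured keyword, text, captured keyword, text, ...
function parseFlat(input) {
  const pieces = input.split(/\s+(AND|OR)(?=\s+")\s+/);
  const conditions = [];
  const operators = [];
  pieces.forEach(function (piece, index) {
    // Even indexes hold the text between matches, odd indexes hold the capture.
    (index % 2 === 0 ? conditions : operators).push(piece);
  });
  return { conditions: conditions, operators: operators };
}

const parsed = parseFlat("\"email\" IS NOT NULL AND \"email\" LIKE '%gmail.com' OR \"email\" = 'test@test.com'");
console.log(parsed.operators);  // the two keywords, in order: AND, OR
console.log(parsed.conditions); // the three conditions from the question
```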
