
JavaScript lookahead regular expression

I'm trying to write a regular expression to parse the following string out into three distinct parts. This is for a highlighting engine I'm writing:

"\nOn and available after solution."

I have a regular expression that's dynamically created for any word a user might input. In the above example, the word is "on".

The regular expression expects any amount of whitespace ([\s]*) followed by the search word, with no -\w following it (e.g. on-time and on-wards should not be valid results). To complicate this, a -, $, < or > symbol can follow the word, so on-, on> and on$ are valid. This is why there is a negative lookahead after the search word in my regular expression below.
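To make the rule above concrete, here is a minimal sketch of building the first two groups dynamically for a user-supplied word. The helper names escapeRegExp and buildWordPattern are hypothetical and not part of the original code; the sketch also assumes the user's word should be matched literally, so it is escaped first.

```javascript
// Hypothetical helper: escape regex metacharacters so the user's word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: optional leading whitespace, then the word,
// as long as the word is not followed by a hyphen plus a word character.
function buildWordPattern(word) {
  return new RegExp("(\\s*)(" + escapeRegExp(word) + "(?!-\\w))", "i");
}

const pattern = buildWordPattern("on");
const samples = ["on-time", "on-wards", "on-", "on>", "on$", "\nOn and"];
const results = samples.map((s) => pattern.test(s));

console.log(results); // [false, false, true, true, true, true]
```

The "g" flag is left out here on purpose: with "g", test() and exec() remember lastIndex between calls, which makes repeated checks on different strings harder to reason about.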

There's a complicated reason for this, but it's not relevant to the question. The last part should be the rest of the sentence. In this example, " and available after solution."

So,

p1 = "\n"

p2 = "On"

p3 = " and available after solution"

I currently have the following regular expression.

test = new RegExp('([\\s]*)(on(?!\\-\\w))([$\\-><]*?\\s(?=[.]*))',"gi")

The first part of this regular expression ([\\s]*)(on(?!\\-\\w))[$\\-><]*? works as expected. The last part does not.

In the last part, what I'm trying to do is force the regular expression engine to match whitespace before matching additional characters. If it cannot match a space, then the regular expression should end. However, when I run this regular expression, I get the following results:

str1 = "\nOn ly available after solution."

test.exec(str1)

["\n On ", "\n ", "On"]

So it would appear to me that the last positive lookahead is not working. Thanks for any suggestions, and if anyone needs some clarification, let me know.

EDIT:

It would appear that my regular expression was not matching because I didn't realize the following caveat:

You can use any regular expression inside the lookahead. (Note that this is not the case with lookbehind. I will explain why below.) Any valid regular expression can be used inside the lookahead. If it contains capturing parentheses, the backreferences will be saved. Note that the lookahead itself does not create a backreference. So it is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a backreference, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the backreference is to be saved.
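A small sketch of the caveat quoted above, using an illustrative input string that is not from the original question: a lookahead on its own stores nothing, while capturing parentheses inside the lookahead save the text it looked at.

```javascript
const input = "on-time";

// Plain lookahead: asserts that "-" plus word characters follow, but stores nothing.
const plain = /on(?=-\w+)/.exec(input);

// Capturing parentheses inside the lookahead: the looked-ahead text becomes group 1.
const captured = /on(?=(-\w+))/.exec(input);

console.log(plain[0], plain.length); // "on" 1
console.log(captured[0], captured[1]); // "on" "-time"
```

In both cases the overall match is only "on", because a lookahead is zero-width and never consumes the characters it inspects.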


  1. The dot in the character class [.] means a literal dot. Change it to just . if you wish to match any character.
  2. The lookahead (?=.*) will always match and is completely pointless. Change it to (.*) if you just want to capture that part of the string.
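Both points can be checked with a short sketch. The revised pattern below is only one possible reading of what the question is after (whitespace, then the rest of the line captured in group 3); it is an assumption, not the asker's final code.

```javascript
const str = "\nOn and available after solution.";

// [.] matches only a literal dot; a bare . matches any character except a line terminator.
const literalDot = /[.]/.exec(str);
const anyChar = /./.exec(str);
console.log(literalDot.index === str.length - 1); // true: the first literal dot is the final character
console.log(anyChar[0]); // "O" (the leading "\n" is skipped)

// Original pattern: (?=[.]*) succeeds on zero dots, so group 3 stops after the space.
const original = /([\s]*)(on(?!-\w))([$\-><]*?\s(?=[.]*))/i.exec(str);
console.log(JSON.stringify(original[3])); // " "

// Assumed revision: capture the remainder with .* instead of looking ahead.
const revised = /(\s*)(on(?!-\w))([$\-><]*\s.*)/i.exec(str);
console.log(JSON.stringify(revised.slice(1)));
// ["\n","On"," and available after solution."]
```

Because \s is still required before .*, the revised pattern fails to match at all when the word is not followed by whitespace, which is the behaviour the question describes wanting.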


I think the problem is that your negative lookahead on(?!\-\w) only rejects an on that is followed by - and then \w. I think what you want instead is on(?!\-|\w), which matches an on that is not followed by - OR \w.
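The difference between the two lookaheads is easiest to see side by side. The sample strings and the third pattern (middle) are illustrative assumptions, added because the question states that on- should remain a valid match.

```javascript
const strict = /on(?!-\w)/i; // rejects only "on" + hyphen + word character
const broad = /on(?!-|\w)/i; // rejects "on" + hyphen, and "on" + word character
const middle = /on(?!\w|-\w)/i; // assumed variant: rejects "only" and "on-time", keeps "on-"

const samples = ["on-time", "on-", "only", "on>", "on "];
const run = (re) => samples.map((s) => re.test(s));

console.log(run(strict)); // [false, true, true, true, true]
console.log(run(broad)); // [false, false, false, true, true]
console.log(run(middle)); // [false, true, false, true, true]
```

Note that the broad form also rejects a bare on-, so whether it is the right fix depends on whether a trailing hyphen should still count as a match.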
