开发者

Regexp Question - Negating a captured character

I'm looking for a regular expression that allows for either single-quoted or double-quoted strings, and allows the opposite quote character within the string. For example, the following would both be legal strings: "hello 'there' world" 'hello "there" world'

The regexp I'm using uses negative lookahead and is as follows:

(['"])(?:(?!\1).)*\1

This would work I think, but what about if the language didn't support negative lookahead. Is there any other way to do this? Without alternation?

EDIT:

开发者_运维知识库I know I can use alternation. This was more of just a hypothetical question. Say I had 20 different characters in the initial character class. I wouldn't want to write out 20 different alternations. I'm trying to actually negate the captured character, without using lookahead, lookbehind, or alternation.


This is actually much simpler than you may have realized. You don't really need the negative look-ahead. What you want to do is a non-greedy (or lazy) match like this:

(['"]).*?\1

The ? character after the .* is the important part. It says, consume the minimum possible characters before hitting the next part of the regex. So, you get either kind of quote, and then you go after 0-M characters until you encounter a character matching whichever quote you first ran into. You can learn more about greedy matching vs. non-greedy here and here.


Sure:

'([^']*)'|"([^"]*)"

On a successful match, the $+ variable will hold the contents of whichever alternate matched.


In the general case, regexps are not really the answer. You might be interested in something like Text::ParseWords, which tokenizes text, accounting for nested quotes, backslashed quotes, backslashed spaces, and other oddities.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜