Using optional backreferences to capture a string optionally enclosed in quotes

I'm trying to build a parser that would be able to extract the data using regex.

I want to be able to match

Here is what I have right now:

(\w+)\s+('|")([^\2\\]*(\\.[^\2\\]*)*)\2\s*;

The ([^\2\\]*(\\.[^\2\\]*)*) part was taken from http://ad.hominem.org/log/2005/05/quoted_strings.php

Unfortunately I have two problems with this pattern.

First of all, I would like to be able to capture strings that aren't enclosed in single or double quotes.

Having print "hello world"; works, but print foobar; doesn't. I haven't been able to make the backreference \2 optional at the end.

Furthermore, I don't know if it's just the way I wrote the regex, but I can't seem to parse multiple instances of this pattern.

If I try the regex with print 'hello'; print 'foobar';, it returns only the first print 'hello'; part.

Thanks in advance for your help.

Edit

Here is a snippet of what I'm trying to parse:

listen          80;
server_name     domain.com *.domain.com;
rewrite ^       http://www.domain.com$request_uri? permanent;

I am trying to capture each action along with its parameters. Basically I want to be able to parse the NGINX configuration file: http://wiki.nginx.org/FullExample


A backreference doesn't work inside a character class such as [^\2]. A character class matches exactly one character, whereas a backreference may stand for a multi-character string, so it cannot be used there. You could work around that with a ((?!\2).)* construct, but it would really be simpler to restructure your match pattern.
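
As a rough illustration of the ((?!\2).)* workaround, here is a minimal sketch in Python's re module, which handles backreferences and lookaheads much like PCRE does. The name QUOTED and the sample input are made up for this example, and the inner part is tightened slightly to (?!\2)[^\\] so that a lone backslash cannot slip past the escape alternative.

```python
import re

# Group 2 captures the opening quote. Inside the string, each step consumes
# either an escaped character (backslash plus any character) or one character
# that is neither a backslash nor the quote captured in group 2.
QUOTED = re.compile(r"""(\w+)\s+(['"])((?:\\.|(?!\2)[^\\])*)\2\s*;""")

m = QUOTED.search(r'''print "it's \"quoted\"";''')
if m:
    # group(1) is the action, group(3) is the raw text between the quotes
    print(m.group(1), m.group(3))
```

This still requires a quote, so print foobar; is not matched by this variant.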

The easiest approach here would be to list the three possible alternatives separately:

 /(\w+)\s+ (?: '([^']*)' |  "([^"]*)" | (\S+) ) \s*;/x

Obviously you would then have to check capture groups [2], [3] and [4] yourself to see which one holds the value.
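
To show what picking the value out of groups 2 to 4 might look like, here is a small sketch using Python's re module with the same three alternatives in verbose mode. STATEMENT and parse_value are hypothetical names chosen for this example. Note that Python reports groups that did not take part in the match as None, so the check is against None rather than against an empty string.

```python
import re

# The pattern from the answer, spelled out in verbose mode.
STATEMENT = re.compile(
    r"""
    (\w+) \s+          # group 1: the action name
    (?: '([^']*)'      # group 2: single-quoted value
      | "([^"]*)"      # group 3: double-quoted value
      | (\S+)          # group 4: unquoted value
    )
    \s* ;
    """,
    re.VERBOSE,
)

def parse_value(match):
    # Exactly one of groups 2-4 took part in the match; the others are None.
    for index in (2, 3, 4):
        if match.group(index) is not None:
            return match.group(index)
    return None

m = STATEMENT.search('print "hello world";')
print(m.group(1), parse_value(m))  # print hello world
```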


If you want to match multiple times, use preg_match_all instead. As long as the matching strings don't overlap, you'll get all of them.
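
For comparison, the same idea can be sketched in Python, where re.finditer plays roughly the role that preg_match_all plays in PHP: it walks the input and yields every non-overlapping match. PATTERN and parse_all are hypothetical names for this illustration.

```python
import re

# Same three alternatives as above, written on one line.
PATTERN = re.compile(r"""(\w+)\s+(?:'([^']*)'|"([^"]*)"|(\S+))\s*;""")

def parse_all(text):
    """Return (action, value) pairs for every statement found in text."""
    results = []
    for m in PATTERN.finditer(text):
        # Only one of the three value groups is not None for a given match.
        value = next(g for g in m.groups()[1:] if g is not None)
        results.append((m.group(1), value))
    return results

print(parse_all("print 'hello'; print 'foobar'; print baz;"))
# [('print', 'hello'), ('print', 'foobar'), ('print', 'baz')]
```

This sketch captures a single value per statement, so a multi-parameter line such as server_name domain.com *.domain.com; from the snippet in the question is not handled correctly by it.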
