Using optional backreferences to capture a string optionally enclosed in quotes
I'm trying to build a parser that would be able to extract the data using regex.
I want to be able to match
Here is what I have right now:
(\w+)\s+('|")([^\2\\]*(\\.[^\2\\]*)*)\2\s*;
The ([^\2\\]*(\\.[^\2\\]*)*) part was taken from http://ad.hominem.org/log/2005/05/quoted_strings.php
Unfortunately I have two problems with this pattern.
First of all, I would like to be able to capture strings that aren't enclosed in single or double quotes.
Having print "hello world"; works, but print foobar; doesn't work. I haven't been able to make the backreference \2 optional at the end.
Furthermore, I don't know if it's just the way I enclosed the regex, but I can't seem to parse multiple instances of this pattern.
If I try the regex with print 'hello'; print 'foobar'; it just returns the first print 'hello'; part.
Thanks in advance for your help.
Edit
Here is a snippet of what I'm trying to parse:
listen 80;
server_name domain.com *.domain.com;
rewrite ^ http://www.domain.com$request_uri? permanent;
I am trying to capture every action along with its parameters. Basically I want to be able to parse the NGINX configuration file: http://wiki.nginx.org/FullExample
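A backreference doesn't work inside a character class, so [^\2] does not mean "anything except the opening quote". A captured group can be a multi-character string, whereas a character class matches a single character, so PCRE does not treat \2 as a backreference there (it is read as an octal character escape instead). You could work around that with a ((?!\2).)* construct, which checks before each character that the closing quote does not start there. But it would really be simpler if you just simplified your match pattern.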
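As a rough illustration of the ((?!\2).)* workaround, here is a minimal sketch. It is written with Python's re module rather than PHP's preg functions, which is an assumption made purely for the example; the names QUOTED and parse_quoted are hypothetical. It only handles quoted values, since an empty (optional) quote group would make the lookahead fail at every position.

```python
import re

# Group 2 captures the opening quote. Before consuming each ordinary
# character, the lookahead (?!\2) checks that the same quote does not
# start here. The \\. branch consumes escaped characters such as \' whole.
QUOTED = re.compile(r"""(\w+)\s+(['"])((?:\\.|(?!\2).)*)\2\s*;""")

def parse_quoted(text):
    """Return (action, value) for the first quoted statement, or None."""
    m = QUOTED.search(text)
    if m is None:
        return None
    return m.group(1), m.group(3)

print(parse_quoted('print "hello world";'))
print(parse_quoted("print 'it\\'s';"))
print(parse_quoted("print foobar;"))
```

The unquoted statement returns None here, which is the limitation that motivates a simpler pattern.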
The easiest approach here would be to list the three possible alternatives separately:
/(\w+)\s+ (?: '([^']*)' | "([^"]*)" | (\S+) ) \s*;/x
Obviously you would then have to fetch the value manually from whichever of the capture groups [2], [3] or [4] actually matched.
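A minimal sketch of the three-alternative pattern follows, again using Python's re module as a stand-in for PHP (an assumption for illustration; STATEMENT and parse_statement are hypothetical names). Python's re.VERBOSE flag plays the role of the /x modifier, and groups 2, 3 and 4 correspond to the single-quoted, double-quoted and bare alternatives.

```python
import re

# Same structure as the pattern above: one alternative per quoting style.
STATEMENT = re.compile(
    r"""(\w+)\s+ (?: '([^']*)' | "([^"]*)" | (\S+) ) \s*;""",
    re.VERBOSE,
)

def parse_statement(text):
    """Return (action, value) for the first statement found, or None."""
    m = STATEMENT.search(text)
    if m is None:
        return None
    # Exactly one of groups 2, 3, 4 took part in the match; the others
    # are None. Test against None so an empty quoted string still counts.
    value = next(g for g in m.group(2, 3, 4) if g is not None)
    return m.group(1), value

print(parse_statement("print 'hello';"))
print(parse_statement('print "hello world";'))
print(parse_statement("print foobar;"))
```

This version does not handle escaped quotes inside a quoted value; that would need the character classes extended along the lines of the pattern in the question.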
If you want to match multiple times, use preg_match_all instead of preg_match. As long as the matching strings don't overlap, you'll get all of them.
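To show the idea of collecting every non-overlapping match, here is a short sketch that uses Python's re.finditer as a rough analogue of preg_match_all (the choice of Python and the names STATEMENT and parse_all are assumptions for illustration, not part of the original answer).

```python
import re

STATEMENT = re.compile(
    r"""(\w+)\s+ (?: '([^']*)' | "([^"]*)" | (\S+) ) \s*;""",
    re.VERBOSE,
)

def parse_all(text):
    """Return a list of (action, value) pairs, one per statement."""
    results = []
    # finditer resumes scanning right after the previous match ends,
    # so consecutive statements are all picked up.
    for m in STATEMENT.finditer(text):
        value = next(g for g in m.group(2, 3, 4) if g is not None)
        results.append((m.group(1), value))
    return results

print(parse_all("print 'hello'; print 'foobar';"))
```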