Regex Help, How do I make order of expressions not matter?
I can't figure out how to make the order of the incoming string parameters (price, merchant, category) not matter to the regex. My regex matches the parts of the string but not the string as a whole. I need to be able to add \A and \Z to it.
Pattern:
(,?price:(;?(((\d+(\.\d+)?)|min)-((\d+(\.\d+)?)|max))|\d+)+){0,1}(,?merchant:\d+){0,1}(,?category:\d+){0,1}
Sample Strings:
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
Note: I'm going to add ?: to all my groups after I get it working.
You should probably just parse this string with ordinary string operations. Split it at the commas, then split each of those pieces in two at the colon. You can keep a validation regex per key if you'd like to check each of those inputs individually.
If you do it through regex, you'll probably end up having to say "this combination OR this combination OR this combination", which will hurt real bad.
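Here is a rough sketch of that split-then-validate idea, in Python for illustration. The per-key patterns are my own adaptation of the pattern in the question, and the names `VALIDATORS` and `parse_params` are made up for this example, so treat it as a starting point rather than a finished validator.

```python
import re

# Per-key validation patterns. These are assumptions adapted from the
# question's pattern, not a definitive grammar for the format.
NUM = r"\d+(?:\.\d+)?"
RANGE = rf"(?:(?:{NUM}|min)-(?:{NUM}|max)|\d+)"
VALIDATORS = {
    "price": re.compile(rf"{RANGE}(?:;{RANGE})*"),
    "merchant": re.compile(r"\d+"),
    "category": re.compile(r"\d+"),
}


def parse_params(text):
    """Return a dict of key -> value, or raise ValueError if invalid."""
    result = {}
    for piece in text.split(","):
        # Split each piece into key and value at the first colon.
        key, sep, value = piece.partition(":")
        if not sep or key not in VALIDATORS:
            raise ValueError(f"unknown or malformed parameter: {piece!r}")
        if key in result:
            raise ValueError(f"duplicate parameter: {key!r}")
        # fullmatch plays the role of \A ... \Z for each value.
        if not VALIDATORS[key].fullmatch(value):
            raise ValueError(f"bad value for {key!r}: {value!r}")
        result[key] = value
    return result


print(parse_params("merchant:25,price:1.00-max;3-12;23.34-12.19,category:3"))
```

Order no longer matters because the dict lookup, not the regex, decides which rule applies to each piece. Unlike the original pattern, this sketch rejects an empty string; relax that if empty input should be allowed.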
You have three options:
- You can enumerate all the possible orders. For 3 variables there are 6 possibilities. Obviously this doesn't scale;
- You can accept possible duplicates; or
- You can break the string up and then parse it.
Option 2 (accepting possible duplicates) means something like:
/(\b(price|category|merchant):(...).*?)*/
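To make the "accept duplicates" option concrete, here is a small Python sketch that can be anchored with \A and \Z. The `[^,]*` value part is a placeholder I've assumed, and `PAIR`, `ANY_ORDER` and `is_valid` are names invented for the example.

```python
import re

# One key:value pair. The value pattern [^,]* is a loose placeholder
# (an assumption); swap in stricter per-key sub-patterns if needed.
PAIR = r"(?:price|merchant|category):[^,]*"

# Any number of pairs, in any order, separated by commas.
# Nothing here stops the same key from appearing twice.
ANY_ORDER = re.compile(rf"\A{PAIR}(?:,{PAIR})*\Z")


def is_valid(s):
    """True if s is a comma-separated list of known key:value pairs."""
    return ANY_ORDER.match(s) is not None


print(is_valid("category:3,price:1.00-max;3-12;23.34-12.19,merchant:25"))
print(is_valid("price:1,price:2"))  # duplicates are accepted by design
print(is_valid("foo:1"))
```

The trade-off is exactly the one named in the list: order stops mattering, but repeated keys slip through and have to be caught afterwards if they matter.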
The real problem you're facing here is that you're asking a regular expression to remember things. A regular expression describes a DFSM (deterministic finite state machine) or DFA (deterministic finite automaton), whose only memory is the state it is currently in. So the expression can't "remember" which keys it has already seen unless every combination is spelled out as its own path through the pattern, and that is why "any order, each at most once" blows up into one alternative per ordering.
To get to that you have to add a "memory" usually in the form of a stack, which yields a PDA (pushdown automaton).
It's exactly the same problem people face when they try and parse HTML with regexes and get stuck on tag nesting issues and similar.
Basically, either you accept some edge conditions (like repeated values), or you split the string by comma and then parse it. Otherwise you're just using the wrong tool for the job.
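For completeness, the "enumerate every order" option can be generated rather than typed out, which also shows how quickly it grows. This is only a Python sketch: the sub-patterns in `PARTS` are simplified assumptions, and `build_any_order_pattern` is a name made up here. With three keys there are 6 full orderings, or 15 alternatives once each key may also be left out.

```python
import re
from itertools import permutations

# Simplified per-key sub-patterns (assumptions, looser than the
# question's pattern) to keep the generated regex readable.
PARTS = {
    "price": r"price:[^,]+",
    "merchant": r"merchant:\d+",
    "category": r"category:\d+",
}


def build_any_order_pattern(parts):
    """Build one alternative per ordering of every non-empty subset."""
    keys = list(parts)
    alternatives = []
    for size in range(1, len(keys) + 1):
        for order in permutations(keys, size):
            alternatives.append(",".join(parts[k] for k in order))
    pattern = re.compile(r"\A(?:" + "|".join(alternatives) + r")\Z")
    return pattern, len(alternatives)


pattern, count = build_any_order_pattern(PARTS)
print(count)
print(bool(pattern.match("category:3,price:1.00-max;3-12;23.34-12.19,merchant:25")))
print(bool(pattern.match("price:1,price:2")))
```

This does reject duplicates, but the number of alternatives grows factorially with the number of keys, so it is only tolerable for a handful of them.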
How about don't try and do it all with one Cthulhugex?
/price:([^,]*)/
/merchant:([^,]*)/
/category:([^,]*)/
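A quick sketch of how those three patterns might be applied one at a time, shown in Python for illustration (the `PATTERNS` dict and `extract` function are hypothetical names, not part of the answer above):

```python
import re

# The three patterns from above, each searched independently, so the
# order of the parameters in the input is irrelevant.
PATTERNS = {
    "price": re.compile(r"price:([^,]*)"),
    "merchant": re.compile(r"merchant:([^,]*)"),
    "category": re.compile(r"category:([^,]*)"),
}


def extract(s):
    """Return a dict with the captured value of each key found in s."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(s)
        if m:
            found[name] = m.group(1)
    return found


print(extract("category:3,price:1.00-max;3-12;23.34-12.19,merchant:25"))
```

Note that unanchored searches like these only pull values out; they don't reject junk elsewhere in the string, so any validation of the captured values would be a separate step.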
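<?php
// The four sample strings from the question, one per line.
$string = <<<EOF
price:1.00-max;3-12;23.34-12.19,category:3
merchant:25,price:1.00-max;3-12;23.34-12.19,category:3
price:1.00-max;3-12;23.34-12.19,category:3,merchant:25
category:3,price:1.00-max;3-12;23.34-12.19,merchant:25
EOF;
// Turn the line breaks into commas so everything is one comma-separated list.
$s = preg_replace("/\n+/", ",", $string);
// Split on every comma; each element is now a single key:value pair.
$s = explode(",", $s);
print_r($s);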
Output:
$ php test.php
Array
(
[0] => price:1.00-max;3-12;23.34-12.19
[1] => category:3
[2] => merchant:25
[3] => price:1.00-max;3-12;23.34-12.19
[4] => category:3
[5] => price:1.00-max;3-12;23.34-12.19
[6] => category:3
[7] => merchant:25
[8] => category:3
[9] => price:1.00-max;3-12;23.34-12.19
[10] => merchant:25
)