Match Parenthetical Expression With Regular Expressions
I am working on a math expression parser using regular expressions and I am trying to add support for parentheses.
My parser works like this:
function parse_expression(expression){
    Find parenthetical expressions
    Loop through parenthetical expressions, call parse_expression() on all of them
    Replace parenthetical expression with value of expression
    Find value of expression
    Return value
}
Because it is recursive, I need to find only the outermost parenthetical expressions. For example, if I were parsing the string "(5 + (4 + (3 / 4) + (3 * 2) + 2)) + (1 + 2)", I would want to find the expressions "5 + (4 + (3 / 4) + (3 * 2) + 2)" and "1 + 2". How do you do this with regular expressions?
The regular expression I have now ( "\(([^\)]+)\)" ) stops at the first closing parenthesis, so it would return just "5 + (4 + (3 / 4"; it doesn't get the full first expression, and it gets none of the second.
Any ideas?
Thanks,
Kyle
Unfortunately, the language of arbitrarily nested parentheses is not regular and therefore cannot be matched using a regular expression.
Specifically, a regular language is one that can be recognised by a finite automaton, which has a fixed, finite number of states. Matching an arbitrarily nested set of parentheses requires an unbounded number of states, to count the parentheses as they go past.
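Most "regular expression" libraries (especially Perl's) go beyond strictly regular languages. Many of them still have this restriction, although some engines offer extensions that get around it, such as recursive patterns in Perl and PCRE or balancing groups in .NET.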
The most straightforward way to solve your problem is a recursive descent parser. An inefficient method is to just look through the string, counting parentheses as you go, to find which sub-strings to descend into.
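To make the recursive descent idea concrete, here is a minimal sketch in Python. The names tokenize, Parser and evaluate are made up for this illustration, and the grammar is an assumption: decimal numbers, + - * /, unary minus and parentheses. Each grammar rule becomes one method, and parentheses are handled by the factor rule calling back into expression, so no separate search for matching parentheses is needed.

```python
import re

# One token: a decimal number or a single operator/parenthesis character.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

def tokenize(text):
    """Split an expression string into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    """Hypothetical recursive descent evaluator, one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        # expression := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := number | '-' factor | '(' expression ')'
        token = self.take()
        if token == "(":
            value = self.expression()  # recursion handles any nesting depth
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token is None or token in ("+", "*", "/", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return float(token)

def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value

print(evaluate("(5 + (4 + (3 / 4) + (3 * 2) + 2)) + (1 + 2)"))
```

The regular expression here is only used to split the input into tokens, which is a job regular expressions are well suited to; the nesting itself is handled by the call stack.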
You will also find your parser to be simpler if you insist that operations are parenthesised, for example only allowing (1+2)+3 or 1+(2+3) rather than 1+2+3.
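Since you're iterating through it all anyway, I'd say you should still do that, but go the other way around. Find the smallest (innermost) parenthetical expressions, rather than the largest ones. Note that the character class has to exclude both kinds of parenthesis, otherwise a match can run past a closing one:
(\([^()]+\))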
Evaluate them and replace them with their values. The first time round, the matches will be (3 / 4), (3 * 2) and (1 + 2). Replace these with 0.75, 6 and 3, respectively, giving a new string:
(5 + (4 + 0.75 + 6 + 2)) + 3
And then you iterate that, until there are no more parenthetical expressions, working bottom-up rather than top-down (just like you would manually solve a task like this!)
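A rough Python sketch of that bottom-up loop follows. The helper names eval_flat and evaluate are hypothetical, and eval_flat is only an assumed stand-in for whatever you already use to evaluate an expression without parentheses (here: + - * / and unary minus). The pattern excludes both "(" and ")" inside the group, so each match is an innermost pair.

```python
import re

# An innermost group: "(" ... ")" with no parentheses in between.
INNERMOST = re.compile(r"\(([^()]+)\)")
# Numbers (including the exponent form that repr() may produce) and operators.
FLAT_TOKEN = re.compile(r"\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|[-+*/]")

def eval_flat(expr):
    """Evaluate a parenthesis-free expression (assumed stand-in evaluator)."""
    tokens = FLAT_TOKEN.findall(expr)
    pos = 0

    def number():
        nonlocal pos
        sign = 1.0
        while tokens[pos] == "-":  # unary minus, e.g. after substituting -2.0
            sign = -sign
            pos += 1
        value = sign * float(tokens[pos])
        pos += 1
        return value

    def term():
        nonlocal pos
        value = number()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            rhs = number()
            value = value * rhs if op == "*" else value / rhs
        return value

    total = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        rhs = term()
        total = total + rhs if op == "+" else total - rhs
    return total

def evaluate(expr):
    """Replace innermost groups by their values until no parentheses remain."""
    while True:
        expr, count = INNERMOST.subn(
            lambda m: repr(eval_flat(m.group(1))), expr
        )
        if count == 0:
            break
    if "(" in expr or ")" in expr:
        raise ValueError("unbalanced parentheses")
    return eval_flat(expr)

print(evaluate("(5 + (4 + (3 / 4) + (3 * 2) + 2)) + (1 + 2)"))
```

One thing to watch with this approach is that a substituted value can be negative, so the flat evaluator has to cope with input like "4 - -2.0"; that is why the sketch treats a leading minus as part of the number.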
Other than that, I agree with all others that exactly what you were asking for should not (indeed could not) be done with regular expressions. But your problem could be solved with this solution that involves regular expressions.
If I'm not mistaken, this language is not regular, so it is a theoretical impossibility to do this with regular expressions.
You should be using a parser. Have the parser traverse the string, incrementing the parentheses count each time it encounters a ( and decrementing the count each time it hits a ). When the count next hits zero, you have the range of your outermost parenthetical expression.
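In Python, that counting scan could look something like the sketch below. The function name outermost_groups is invented for this example, and raising ValueError on unbalanced input is my own assumption about how errors should be reported.

```python
def outermost_groups(text):
    """Return the contents of each top-level parenthesised group in text."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # first character inside an outermost "("
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at position {i}")
            depth -= 1
            if depth == 0:
                # Count is back to zero: an outermost group just closed.
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError("unmatched '('")
    return groups

for group in outermost_groups("(5 + (4 + (3 / 4) + (3 * 2) + 2)) + (1 + 2)"):
    print(group)
```

Each string this returns can be passed straight to the recursive parse_expression() call from the question, and the scan is a single pass over the input.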