Removing unnecessary parentheses in a regular expression
Suppose I have (in a JavaScript regular expression)
((((A)B)C)D)
Of course that really reads
ABCD
Is there an algorithm to eliminate unnecessary parentheses in a string like that?
This function removes every group that is not followed by a quantifier and is not a look-around. It assumes ECMAScript-flavor regex syntax, and that capture groups (( ... )) are unimportant.
function removeUnnecessaryParenthesis(s) {
    // Tokenize the pattern: escapes, character classes, opening parentheses
    // (with an optional ?: ?! ?= prefix) and closing parentheses (with an
    // optional quantifier) each become their own piece.
    var pieces = s.split(/(\\.|\[(?:\\.|[^\]\\])+]|\((?:\?[:!=])?|\)(?:[*?+]\??|\{\d+,?\d*}\??)?)/g);
    var stack = [];
    for (var i = 0; i < pieces.length; i++) {
        if (pieces[i].charAt(0) === "(") {
            // Opening parenthesis: remember where it is
            stack.push(i);
        } else if (pieces[i].charAt(0) === ")") {
            // Closing parenthesis
            if (stack.length === 0) {
                // Unbalanced closing parenthesis; skip it.
                continue;
            }
            var j = stack.pop();
            if ((pieces[j] === "(" || pieces[j] === "(?:") && pieces[i] === ")") {
                // A capturing or non-capturing group that is not followed
                // by a quantifier: clear both the opening and closing pieces.
                pieces[i] = "";
                pieces[j] = "";
            }
        }
    }
    return pieces.join("");
}
Examples:
removeUnnecessaryParenthesis("((((A)B)C)D)") --> "ABCD"
removeUnnecessaryParenthesis("((((A)?B)C)D)") --> "(A)?BCD"
removeUnnecessaryParenthesis("((((A)B)?C)D)") --> "(AB)?CD"
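One case the examples above do not exercise is alternation: a group such as x(A|B)y needs its parentheses, because xA|By means something different. Below is a minimal sketch of a guard for that, assuming the same tokenizer with "|" added as its own token. The function name removeParensKeepAlternation and the hasAlternation flag are made up for illustration and are not part of the answer above. It is deliberately conservative: any group that directly contains a "|" is kept.

```javascript
// Hypothetical variant; the name and the alternation check are assumptions.
function removeParensKeepAlternation(s) {
    // Same tokenizer as the original function, plus "|" as a separate token.
    // Escaped "\|" and "|" inside a character class are consumed by the
    // earlier alternatives, so they are never seen as alternation.
    var pieces = s.split(/(\\.|\[(?:\\.|[^\]\\])+]|\((?:\?[:!=])?|\)(?:[*?+]\??|\{\d+,?\d*}\??)?|\|)/g);
    var stack = []; // open groups: { index, hasAlternation }
    for (var i = 0; i < pieces.length; i++) {
        var piece = pieces[i];
        if (piece.charAt(0) === "(") {
            stack.push({ index: i, hasAlternation: false });
        } else if (piece === "|") {
            // Mark the innermost open group as containing alternatives.
            if (stack.length > 0) {
                stack[stack.length - 1].hasAlternation = true;
            }
        } else if (piece.charAt(0) === ")") {
            if (stack.length === 0) {
                continue; // unbalanced closing parenthesis; skip it
            }
            var group = stack.pop();
            var open = pieces[group.index];
            var plain = open === "(" || open === "(?:";
            // Unwrap only plain, unquantified groups without a direct "|".
            if (plain && piece === ")" && !group.hasAlternation) {
                pieces[i] = "";
                pieces[group.index] = "";
            }
        }
    }
    return pieces.join("");
}

console.log(removeParensKeepAlternation("((((A)B)C)D)")); // ABCD
console.log(removeParensKeepAlternation("x(A|B)y"));      // x(A|B)y
console.log(removeParensKeepAlternation("((A|B))"));      // (A|B)
```

Because the check is conservative, a group like (A|B) standing alone as the whole pattern is kept even though it could be dropped.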
It does not try to determine whether the parentheses contain only a single token, as in (A)?. That would require a longer tokenizing pattern.
1) Use a parser that understands parentheses.
2) Use a Perl recursive regex that can match nested parentheses (discouraged in this case, IMHO). I don't think Boost's regexes support the type of recursion needed.
3) Perhaps they are needed? Leave them alone.
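To make option 1 concrete, here is a rough sketch of what a small parser-based approach could look like in JavaScript. It is only an illustration under stated assumptions: the name simplifyRegexSource and the tree shape are invented here, capture groups are treated as unimportant, and only a subset of the syntax is handled (literals, escapes, character classes, |, plain groups, (?: (?= (?! and the usual quantifiers). It parses the pattern into a tree, flattens groups that are not needed, and prints the result.

```javascript
// Hypothetical sketch; every name here is illustrative.
function simplifyRegexSource(src) {
    var pos = 0;

    // alternation := sequence ("|" sequence)*   -> array of sequences
    function parseAlternation() {
        var branches = [parseSequence()];
        while (src.charAt(pos) === "|") {
            pos++;
            branches.push(parseSequence());
        }
        return branches;
    }

    // sequence := (atom quantifier?)*            -> array of items
    function parseSequence() {
        var items = [];
        while (pos < src.length && src.charAt(pos) !== "|" && src.charAt(pos) !== ")") {
            var item = parseAtom();
            var q = /^(?:[*+?]|\{\d+(?:,\d*)?\})\??/.exec(src.slice(pos));
            item.quantifier = q ? q[0] : "";
            pos += item.quantifier.length;
            items.push(item);
        }
        return items;
    }

    // atom := group | escape | character class | single character
    function parseAtom() {
        var rest = src.slice(pos);
        var open = /^\((?:\?[:=!])?/.exec(rest);
        if (open) {
            pos += open[0].length;
            var branches = parseAlternation();
            if (src.charAt(pos) !== ")") throw new SyntaxError("Unbalanced parenthesis");
            pos++;
            return { prefix: open[0], branches: branches };
        }
        var text = /^(?:\\[\s\S]|\[(?:\\[\s\S]|[^\]\\])*\]|[\s\S])/.exec(rest)[0];
        pos += text.length;
        return { text: text };
    }

    function isPlain(item) {
        return item.prefix === "(" || item.prefix === "(?:";
    }

    // Flatten groups whose parentheses do not change the meaning.
    function normalize(branches) {
        var out = [];
        branches.forEach(function (items) {
            var seq = [];
            items.forEach(function (item) {
                if (item.branches) item.branches = normalize(item.branches);
                var single = isPlain(item) && item.branches.length === 1;
                var inner = single ? item.branches[0] : null;
                if (single && item.quantifier === "") {
                    seq = seq.concat(inner); // (AB) -> AB
                } else if (single && inner.length === 1 &&
                           inner[0].text !== undefined && inner[0].quantifier === "") {
                    seq.push({ text: inner[0].text, quantifier: item.quantifier }); // (A)? -> A?
                } else {
                    seq.push(item);
                }
            });
            // A lone unquantified group of alternatives merges into its parent.
            if (seq.length === 1 && isPlain(seq[0]) && seq[0].quantifier === "") {
                out = out.concat(seq[0].branches);
            } else {
                out.push(seq);
            }
        });
        return out;
    }

    function print(branches) {
        return branches.map(function (items) {
            return items.map(function (item) {
                var core = item.branches ? item.prefix + print(item.branches) + ")" : item.text;
                return core + item.quantifier;
            }).join("");
        }).join("|");
    }

    var tree = parseAlternation();
    if (pos < src.length) throw new SyntaxError("Unbalanced parenthesis");
    return print(normalize(tree));
}

console.log(simplifyRegexSource("((((A)B)C)D)"));  // ABCD
console.log(simplifyRegexSource("((((A)?B)C)D)")); // A?BCD
console.log(simplifyRegexSource("x((A|B))y"));     // x(A|B)y
```

Working on a tree rather than on tokens is what lets this keep the parentheses around alternatives and drop them around a single quantified token. It does not handle backreferences, named groups, or look-behind, so treat it as a starting point only.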