Remove unmatched parentheses from a string

2023-02-19 08:25 问答作者：

I want to remove "un-partnered" parentheses from a string.

I.e., all ('s should be removed unless they're followed by a ) somewhere in the string. Likewise, all )'s not preceded by a ( somewhere in the string should be removed.

Ideally the algorithm would take into account nesting as well.

E.g.:

"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parent开发者_如何学Pythons # => "a"
")a(".remove_unmatched_parents # => "a"

Instead of a regex, consider a push-down automata, perhaps. (I'm not sure if Ruby regular expressions can handle this, I believe Perl's can).

A (very trivialized) process may be:

For each character in the input string:

If it is not a '(' or ')' then just append it to the output
If it is a '(' increase a seen_parens counter and add it
If it is a ')' and seen_parens is > 0, add it and decrease seen_parens. Otherwise skip it.

At the end of the process, if seen_parens is > 0 then remove that many parens, starting from the end. (This step can be merged into the above process with use of a stack or recursion.)

The entire process is O(n), even if a relatively high overhead

Happy coding.

The following uses oniguruma. Oniguruma is the regex engine built in if you are using ruby1.9. If you are using ruby1.8, see this: oniguruma.

Update

I had been so lazy to just copy and paste someone else's regex. It seemed to have problem.

So now, I wrote my own. I believe it should work now.

class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        self[/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x]
    end
end

(?<name>regex1) names the (sub)regex regex1 as name, and makes it possible to be called.
?g<name> will be a subregex that represents regex1. Note here that ?g<name> does not represent a particular string that matched regex1, but it represents regex1 itself. In fact, it is possible to embed ?g<name> within (?<name>...).

Update 2

This is simpler.

class String
    def remove_unmatched_parens
        self[/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x]
    end
end

Build a simple LR parser:

tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c 
      stack << token
    end
    tokenize = false
  when /\w/
    token << c if tokenize
  end
end

result = stack.join

puts result

running yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class because you should never open a standard class. Regexs are pretty brittle for parser and hard to support. I couldn't imagine coming back to the previous solutions 6 months for now and trying to remember what they were doing!

Here's my solution, based on @pst's algorithm:

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0

    while char = scanner.get_byte
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        output << char and paren_depth -= 1 if paren_depth > 0
      else
        output << char
      end
    end

    paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
    output
  end
end

Algorithm:

Traverse through the given string.
While doing that, keep track of "(" positions in a stack.
If any ")" found, remove the top element from the stack.
- If stack is empty, remove the ")" from the string.
In the end, we can have positions of unmatched braces, if any.

Java code: Present @ http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html

继续阅读：recursion regex ruby string text-processing

Remove unmatched parentheses from a string

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？