开发者

Remove unmatched parentheses from a string

I want to remove "un-partnered" parentheses from a string.

I.e., all ('s should be removed unless they're followed by a ) somewhere in the string. Likewise, all )'s not preceded by a ( somewhere in the string should be removed.

Ideally the algorithm would take into account nesting as well.

E.g.:

"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parent开发者_如何学Pythons # => "a"
")a(".remove_unmatched_parents # => "a"


Instead of a regex, consider a push-down automata, perhaps. (I'm not sure if Ruby regular expressions can handle this, I believe Perl's can).

A (very trivialized) process may be:

For each character in the input string:

  1. If it is not a '(' or ')' then just append it to the output
  2. If it is a '(' increase a seen_parens counter and add it
  3. If it is a ')' and seen_parens is > 0, add it and decrease seen_parens. Otherwise skip it.

At the end of the process, if seen_parens is > 0 then remove that many parens, starting from the end. (This step can be merged into the above process with use of a stack or recursion.)

The entire process is O(n), even if a relatively high overhead

Happy coding.


The following uses oniguruma. Oniguruma is the regex engine built in if you are using ruby1.9. If you are using ruby1.8, see this: oniguruma.

Update

I had been so lazy to just copy and paste someone else's regex. It seemed to have problem.

So now, I wrote my own. I believe it should work now.

class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        self[/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x]
    end
end
  • (?<name>regex1) names the (sub)regex regex1 as name, and makes it possible to be called.
  • ?g<name> will be a subregex that represents regex1. Note here that ?g<name> does not represent a particular string that matched regex1, but it represents regex1 itself. In fact, it is possible to embed ?g<name> within (?<name>...).

Update 2

This is simpler.

class String
    def remove_unmatched_parens
        self[/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x]
    end
end


Build a simple LR parser:

tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c 
      stack << token
    end
    tokenize = false
  when /\w/
    token << c if tokenize
  end
end

result = stack.join

puts result

running yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class because you should never open a standard class. Regexs are pretty brittle for parser and hard to support. I couldn't imagine coming back to the previous solutions 6 months for now and trying to remember what they were doing!


Here's my solution, based on @pst's algorithm:

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0

    while char = scanner.get_byte
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        output << char and paren_depth -= 1 if paren_depth > 0
      else
        output << char
      end
    end

    paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
    output
  end
end


Algorithm:

  1. Traverse through the given string.
  2. While doing that, keep track of "(" positions in a stack.
  3. If any ")" found, remove the top element from the stack.
    • If stack is empty, remove the ")" from the string.
  4. In the end, we can have positions of unmatched braces, if any.

Java code: Present @ http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜