开发者

Regex filter for STL messages

Given the follow STL error:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator< polynomial< variable_term< polynomial< variable_term< std::basic_string<char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term<std::basic_string< char, std::char_traits< char>, std::allocator<char> >, int> > > >, polynomial<variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::all开发者_开发知识库ocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > >::operation( std::pair< const std::basic_string< char, std::char_traits< char>, std::allocator< char> >, state_vector_term< polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > >, polynomial< variable_term< polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> >, polynomial< variable_term< std::basic_string< char, std::char_traits< char>, std::allocator< char> >, int> > > > > >&)’

how can I use a reg-ex expression to simplify it to something that looks like:

./poly_power/poly_class.cpp:496: error: no matching function for call to ‘state_operator<...>::operation( std::pair<...>&)’

ie. Convert everything within the outer pair of <> into .... I am aware of STLFilt, a perl script that does something similar, but I think it would be pedagogic to see how this could be done in pure regex.

Bonus

Paraitimze the expression so that it works on the nth level of <>. The 1st level would be example above, while the 2nd level would show something like state_operator<polynomial<...>, polynomial<...> >.


What you have here is an irregular grammar that is near-impossible to match with a regular expression.

Edit:

I have experimented around a bit, and, well, it is possible (provided that angle brackets occur nowhere else in the string, that they are balanced (one closing bracket for each opening one), and that you can define an upper limit to the nesting involved):

Match <...> (without nesting):

<[^<>]*+>

Match <...<...>...<...>...> (one level of nesting):

<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+[^<>]*+>

Match with up to two levels of nesting:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to three levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to four levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

Up to five levels:

<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+(?:<[^<>]*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+>[^<>]*+)*+[^<>]*+>

etc.

To get the bonus point, here's the last one broken up into a verbose regex, so you can see a bit easier how it is constructed:

<             # Match the first opening <
[^<>]*+       # Match any non-<>-characters possessively
  (?:         # Match the following zero or more times:
  <             # Match a <
  [^<>]*+       # etc. etc. etc.
    (?:
    <
    [^<>]*+
      (?:
      <
      [^<>]*+
        (?:
        <
        [^<>]*+
          (?:          # innermost level:
          <            # Match a <
          [^<>]*+      # Match any non-<> characters
          >            # Match a >
          [^<>]*+      # Match any non-<> characters
          )*+          # any number of times, possessively
        >            # then back one level: Match a >
        [^<>]*+      # etc. etc. etc.
        )*+
      >
      [^<>]*+
      )*+
    >
    [^<>]*+
    )*+
  >
  [^<>]*+
  )*+
[^<>]*+
>               # Match the final closing >

Isn't this perfectly horrible?

For your example, it seems you need seven levels of nesting. If you want a quick regex constructor, it could look like this (in Python):

def innerregex(nesting):
    if nesting == 0:
        return ""
    else:
        return "(?:<[^<>]*+" + innerregex(nesting-1) + ">[^<>]*+)*+"

def makeregex(nesting):
    return "<[^<>]*+" + innerregex(nesting) + ">"

makeregex(2) will return a regex that can match <<<><>>> correctly. makeregex(7) should work on the entire string, makeregex(6) will only take the innermost matches, and so on.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜