Python regular expression not matching end of line
I'm trying to match a C/C++ function definition using a fairly complex regular expression. I've found a case where it's not working and I'm trying to understand why. Here is the input string which does not match:
void Dump(const char * itemName, ofstream & os)
which clearly is a valid C++ method declaration. Here is the RE:
^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$
It tries to distinguish a method declaration from other C syntax that looks like one, i.e. words followed by parentheses.
Using the very useful Python regular expression debugger (http://www.pythonregex.com/) I've narrowed it down to the trailing "$": if I remove the trailing $ from the regular expression, it matches the method signature above; if I leave the $ in, it doesn't. There must be some idiosyncrasy of Python REs that is eluding me here. Thanks.
The +-| in your character class [^;=+-|] is a range specification. As a result, the character class contains (actually excludes, since you're using ^) much more than you intend: every character from + to |, which covers all digits and ASCII letters. To specify a literal - in a character class, list it first, like [^-;=+|].
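A minimal sketch of the effect, using only the standard re module. The pattern is the one from the question; the corrected variant moves the hyphen to the front of both character classes (the first class, [^=+-|#], contains the same range). The variable names are illustrative only.

```python
import re

signature = "void Dump(const char * itemName, ofstream & os)"

# Inside a class, "+-|" is the range U+002B..U+007C, which includes ASCII letters.
range_hits_letter = re.fullmatch(r"[+-|]", "c") is not None    # True
# With the hyphen listed first it is a literal, so "c" is no longer in the class.
literal_hits_letter = re.fullmatch(r"[-+|]", "c") is not None  # False

original = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$"
corrected = r"^[^-=+|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^-;=+|]*$"

original_match = re.search(original, signature)    # None: the tail class rejects "c"
corrected_match = re.search(corrected, signature)  # matches the whole signature

print(range_hits_letter, literal_hits_letter)
print(original_match)
print(corrected_match.group(0))
```

With the original pattern, the class after the opening parenthesis cannot consume the "c" of "const", so the $ can never be reached.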
The output of PythonRegex is somewhat misleading. The results of r.groups() and r.findall() are both the same: u'void Dump', which is the content of the first capturing group. If it showed the whole match, you'd see that when you remove the $ you're only matching
void Dump(
...not the whole function definition as you intended. The reason for that (as Greg explained) is the unescaped hyphen in your last character class, which creates a range. You need to make the hyphen literal by listing it first ([^-;=+|]) or last ([^;=+|-]), or by adding a backslash ([^;=+\-|]).
The only way I can see to get PythonRegex to show the whole match is by removing all capturing groups (or converting them to non-capturing).
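Outside PythonRegex, the whole match is available directly from the match object. The sketch below uses the question's pattern without the trailing $ and compares group(0) with groups() and findall(); the non-capturing variant is an illustration of the workaround described above, not something taken from the original post.

```python
import re

signature = "void Dump(const char * itemName, ofstream & os)"

# The question's pattern with the trailing $ removed.
capturing = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*"
# Same pattern with the group made non-capturing via (?:...).
non_capturing = r"^[^=+-|#]*?(?:[\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*"

m = re.search(capturing, signature)
whole = m.group(0)      # "void Dump(" : the entire match
groups = m.groups()     # ("void Dump",) : only the capturing group

# findall returns group contents when the pattern has a capturing group,
# and whole matches when it has none.
found = re.findall(capturing, signature)            # ["void Dump"]
found_whole = re.findall(non_capturing, signature)  # ["void Dump("]

print(whole, groups, found, found_whole)
```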