
Python regular expression not matching end of line

I'm trying to match a C/C++ function definition using a fairly complex regular expression. I've found a case where it's not working and I'm trying to understand why. Here is the input string which does not match:

   void Dump(const char * itemName, ofstream & os)

which clearly is a valid C++ method declaration. Here is the RE:

   ^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$

This basically tries to distinguish method declarations from other C syntax that also looks like words followed by parentheses (for example if, for or while statements).

Using the very useful Python regular expression debugger (http://www.pythonregex.com/) I've narrowed it down to the trailing "$": if I remove the trailing $ from the regular expression, it matches the method signature above; if I leave the $ in, it doesn't. There must be some idiosyncrasy of Python REs that is eluding me here. Thanks.
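A minimal way to reproduce this outside the web tool, assuming Python 3's re module and the input line stored without its display indentation (the variable names are just for illustration):

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# The regex from the question, copied verbatim as a raw string.
with_dollar = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^;=+-|]*$"
# The same pattern with only the trailing $ removed.
without_dollar = with_dollar[:-1]

print(re.search(with_dollar, line))  # prints: None

m = re.search(without_dollar, line)
print(m.group(1))  # prints: void Dump   (the first capturing group)
print(m.group(0))  # prints: void Dump(   (the text actually matched)
```

group(1) is what a debugger that only lists groups will display; group(0) is the full extent of the match.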


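The +-| in your character class [^;=+-|] is a range specification: it means every character from + (0x2B) up to | (0x7C). That range includes all digits, all ASCII letters and most punctuation, so the class contains (actually excludes, since you're using ^) much more than you intend. On your input, [^;=+-|]* therefore stops at the c of const, and the $ can never match. To specify a literal - in a character class, mention it first, like [^-;=+|].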
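A quick sketch (Python 3, standard re and string modules only) that lists what the accidental range covers and checks the corrected class against a single letter:

```python
import re
import string

# Every character covered by the accidental range "+-|" (code points 0x2B..0x7C).
in_range = "".join(chr(c) for c in range(ord("+"), ord("|") + 1))
print(in_range)

# The negated class from the question rejects all of those characters,
# including every ASCII letter and digit.
bad_class = re.compile(r"[^;=+-|]")
assert all(bad_class.match(ch) is None for ch in string.ascii_letters + string.digits)

# With the hyphen listed first it is a literal, so letters are allowed again.
good_class = re.compile(r"[^-;=+|]")
print(bool(good_class.match("i")))  # prints: True
print(bool(bad_class.match("i")))   # prints: False
```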


The output of PythonRegex is somewhat misleading. The results of r.groups() and r.findall() are both the same: u'void Dump', which is the content of the first capturing group. If it showed the whole match, you'd see that when you remove the $ you're only matching

   void Dump(

...not the whole function definition as you intended. The reason for that (as Greg explained) is the unintended +-| range in your last character class. You need to make the hyphen literal by listing it first ([^-;=+|]) or last ([^;=+|-]), or by escaping it with a backslash ([^;=+\-|]).
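A small check, assuming Python 3, that each of the three spellings of the last class lets the pattern consume the entire declaration. The prefix is left exactly as in the question; note that its leading class [^=+-|#] contains the same +-| range, which is harmless here only because the lazy *? can match zero characters.

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# Everything up to and including the "(" is unchanged from the question.
prefix = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\("

# Hyphen first, hyphen last, and backslash-escaped hyphen.
tails = [r"[^-;=+|]*$", r"[^;=+|-]*$", r"[^;=+\-|]*$"]

for tail in tails:
    m = re.search(prefix + tail, line)
    # Each variant now matches through the closing ")" at the end of the line.
    print(m.group(0) == line, m.group(1))  # prints: True void Dump
```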

The only way I can see to get PythonRegex to show the whole match is to remove all capturing groups (or convert them to non-capturing (?:...) groups).
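This limitation is specific to the web tool's display. In Python itself, m.group(0) always returns the whole match; only findall() switches between returning group contents and whole matches, depending on whether the pattern has capturing groups. A sketch, assuming Python 3 and the corrected last class:

```python
import re

line = "void Dump(const char * itemName, ofstream & os)"

# One capturing group around the return type and name.
capturing = r"^[^=+-|#]*?([\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^-;=+|]*$"
# Identical, except the group is non-capturing (?:...).
non_capturing = r"^[^=+-|#]*?(?:[\w<>]+\s+(?!if|for|switch|while|catch|return)\w+)\s*\([^-;=+|]*$"

m = re.search(capturing, line)
print(m.groups())   # prints: ('void Dump',)
print(m.group(0))   # the whole match, regardless of capturing groups

# findall() returns the group contents when the pattern has a group...
print(re.findall(capturing, line))      # prints: ['void Dump']
# ...but the whole match when there are no capturing groups.
print(re.findall(non_capturing, line))  # prints a list holding the full line
```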
