regular expression - function body extracting
in Python script, for every method definition in some C++ code of the form:
return_value ClassName::MethodName(args)
{MehodBody}
I need to extract three parts: the class name, the method name and the method body for further processing. Finding and 开发者_开发技巧extracting the ClassName and MethodName is easy, but is there any simple way to extract the body of the method? With all possible '{'
and '}'
inside it? Or are regexes unsuitable for such task?
>>> s = """return_value ClassName::MethodName(args)
{MehodBody {} } """
>>> re.findall(r'\b(\w+)::(\w+)\([^{]+\{(.+)}', s, re.S)
[('ClassName', 'MethodName', 'MehodBody {} ')]
I would recommend that you use the parser module rather than regexps since it will handle things like multiple line functions, different indentations and will abort on malformed input so that you can manage things better. "Avoid regexps if you can" is one of the rules I live by since they're often more trouble that they're worth.
Edit: Oh okay. I misread your question. I thought you wanted to parse Python code itself. I googled a little bit and found this but it's C only. Perhaps you can extend that? The grammar for C++ is there in the "C++ programming language book"
精彩评论