Search backward through a string using a regex (in Python)?
Context
I'm parsing some code and want to match the doxygen comments before a function. However, because I want to match for a specific function name, getting only the immediately previous comment is giving me problems.Current Approach
import re
function_re = re.compile(
r"\/\*\*(.+)\*\/\s*void\s+(\w+)\s*::\s*function_name\s*\(\s*\)\s*")
function_match = function_re.search(file_string)
if function_match:
开发者_C百科 function_doc_str = update_match.group(2)
Problem with Current Approach
The current approach matches doxygen from earlier functions, giving me a result that is the wrong doxygen comment.Question
Is there a way to search backward through a string using the Python Regex library? It seems like my problem is that the more restrictive (less frequently occurring part) is the function signature, "void function()"Possible better question
Is there a better (easier) approach that I'm missing?simplest way is to just use a group, you don't need to go backwards...
(commentRegex)functionRegex
Then just extract group 1. You will need to run in multi-line mode to get it working, i don't know python so i can't be more helpful.
It's also possible with lookahead assertions, but this way is simpler.
I think you should use a regex that only matches doxymentation that's immediately before the function. Maybe something like this (simplified example):
import re
test = """
/**
@doxygen comment
*/
void function()
{
}
"""
doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
functionRegex = r"(?P<function>\s\w+\s+(?P<functionName>\w+)\s*\()"
match = re.search(doxygenRegex + functionRegex, test)
print match.groupdict()
As long as this matches something, you can loop the regex matching - but starting the search at test[match.end():]
next time. Hope that makes sense to you...
BTW if you only want to extract the comment and nothing about the function, you can use lookahead - just replace functionRegex
with r"(?=\s\w+\s+\w+\s*\()"
.
This can be achived using a single reg-ex.
The key is to capture the comment just before the desired function.
The easy way to do this is to use non-greedy qualifier.
For example: /\*\*(.*?)\*/
with MULTILINE flag;
however, in Python, non-greedy and MULTILINE do not work together (at least on my environment).
So, you need a little trick like this:
/\*\*((?:[^\*]|\*(?!/))*)\*/
.
This is to match:
1: the comment begin /**
.
2: everything that is not *
OR *
that does not follows by /
3: the comment end */
.
From this idea the code you want is:
function_name = "function2"
regex_comment = "/\*\*((?:[^\*]|\*(?!/))*)\*/"
regex_static = "(?:(\w+)\s*::\s*)?"
regex_function = "(\w+)\s+"+regex_static+"(?:"+function_name+")\s*\([^\)]*\)"
regex = re.compile(regex_comment+"\s*"+regex_function, re.MULTILINE)
text = """
/**
@doxygen comment1
*/
void test::function1()
{
}
/**
@doxygen comment2
*/
void test::function2()
{
}
"""
match = regex.search(text)
if (match == None): print "None"
else: print match.group(1)
When run, you got:
@doxygen comment2
Variation:
If you want to capture /**
and */
too, use regex_comment = "(/\*\*(?:[^\*]|\*(?!/))*\*/)"
.
Hope this helps.
Note that C isn't a regular language, so it cannot be parsed by regular expressions. Have you considered leveraging doxygen itself to parse this file?
You can do look-behind assertions with (?<=...)
or (?<!...)
, but in general you can only match forwards.
The question is why are these comments not inside the function, so you can use doc.
But there is no easy way with regex.
here's a non regex approach, split on */
and find if the function you are looking for is at the next item. eg
test = """
/**
@doxygen comment
*/
void function()
{
}
"""
t=test.split("*/")
for n,comm in enumerate(t):
try:
if "void" in t[n+1]:
print t[n]
except IndexError: pass
精彩评论