开发者

Search backward through a string using a regex (in Python)?

Context

I'm parsing some code and want to match the doxygen comments before a function. However, because I want to match for a specific function name, getting only the immediately previous comment is giving me problems.

Current Approach

import re  
function_re = re.compile(
    r"\/\*\*(.+)\*\/\s*void\s+(\w+)\s*::\s*function_name\s*\(\s*\)\s*")  
function_match = function_re.search(file_string)
if function_match:  
开发者_C百科    function_doc_str = update_match.group(2)

Problem with Current Approach

The current approach matches doxygen from earlier functions, giving me a result that is the wrong doxygen comment.

Question

Is there a way to search backward through a string using the Python Regex library?

It seems like my problem is that the more restrictive (less frequently occurring part) is the function signature, "void function()"

Possible better question

Is there a better (easier) approach that I'm missing?


simplest way is to just use a group, you don't need to go backwards...

 (commentRegex)functionRegex

Then just extract group 1. You will need to run in multi-line mode to get it working, i don't know python so i can't be more helpful.

It's also possible with lookahead assertions, but this way is simpler.


I think you should use a regex that only matches doxymentation that's immediately before the function. Maybe something like this (simplified example):

import re

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

doxygenRegex = r"(?P<comment>/\*\*(?:[^/]|/(?!\*\*))*\*/)"
functionRegex = r"(?P<function>\s\w+\s+(?P<functionName>\w+)\s*\()"

match = re.search(doxygenRegex + functionRegex, test)
print match.groupdict()

As long as this matches something, you can loop the regex matching - but starting the search at test[match.end():] next time. Hope that makes sense to you...

BTW if you only want to extract the comment and nothing about the function, you can use lookahead - just replace functionRegex with r"(?=\s\w+\s+\w+\s*\()".


This can be achived using a single reg-ex.

The key is to capture the comment just before the desired function. The easy way to do this is to use non-greedy qualifier. For example: /\*\*(.*?)\*/ with MULTILINE flag; however, in Python, non-greedy and MULTILINE do not work together (at least on my environment). So, you need a little trick like this:

/\*\*((?:[^\*]|\*(?!/))*)\*/.

This is to match:

1: the comment begin /**.

2: everything that is not * OR * that does not follows by /

3: the comment end */.

From this idea the code you want is:

function_name  = "function2"
regex_comment  = "/\*\*((?:[^\*]|\*(?!/))*)\*/"
regex_static   = "(?:(\w+)\s*::\s*)?"
regex_function = "(\w+)\s+"+regex_static+"(?:"+function_name+")\s*\([^\)]*\)"
regex = re.compile(regex_comment+"\s*"+regex_function, re.MULTILINE)
text  = """
/**
    @doxygen comment1
*/
void test::function1()
{
}

/**
    @doxygen comment2
*/
void test::function2()
{
}
"""
match = regex.search(text)
if (match == None): print "None"
else:               print match.group(1)

When run, you got:


    @doxygen comment2

Variation: If you want to capture /** and */ too, use regex_comment = "(/\*\*(?:[^\*]|\*(?!/))*\*/)".

Hope this helps.


Note that C isn't a regular language, so it cannot be parsed by regular expressions. Have you considered leveraging doxygen itself to parse this file?


You can do look-behind assertions with (?<=...) or (?<!...), but in general you can only match forwards.


The question is why are these comments not inside the function, so you can use doc.

But there is no easy way with regex.


here's a non regex approach, split on */ and find if the function you are looking for is at the next item. eg

test = """

/**
    @doxygen comment
*/
void function()
{
}

"""

t=test.split("*/")
for n,comm in enumerate(t):
    try:
        if "void" in t[n+1]:
             print t[n]
    except IndexError: pass
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜