Using regular expressions to extract functions and function headers from source code

I'm trying to extract functions and function headers from some source code files. Here's an example of the type of code:

################################################################################
# test module
#
# Description : Test module
#
DATABASE test

###
# Global Vars
GLOBALS
    DEFINE G_test_string    STRING
END GLOBALS

###
# Modular Vars
DEFINE M_counter            INTEGER

###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

###################################
# Function:
#   This is a test function
#
# Parameters:
#   in - test
#
# Returns:
#   out - result
#
FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    DEFINE  F_out_var   INTEGER


    LET F_out_var = P_in_var

    RETURN F_out_var
END FUNCTION

FUNCTION test_init_array()
    DEFINE  F_array     ARRAY[ MAX_ARR_SIZE ] OF INTEGER
    DEFINE  F_element   INTEGER

    FOR F_element = 1 TO MAX_ARR_SIZE

        LET F_array[ F_element ] = F_element * F_element

    END FOR

END FUNCTION

Functions may or may not have a header above them. I'm trying to capture the function source, the function header, the function name and any parameters passed into the function, each in its own group. Here's the expression I came up with (I'm using .NET regex and have been testing with Regex Hero):

^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION) 

This seems to work OK for all but the first function (test_function_1) in the file. The first group for test_function_1 captures everything from the first line (the top of the source file) until the FUNCTION of test_function_1 begins. I realise this is because there are #s for other comments in the file, but I only want to capture the function header.


If I see it correctly, you have problems identifying lines starting with #. To achieve this, you could turn on the RegexOptions.Multiline flag and match the function header with

((?:^#.*\s)*)

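Edit: For this to work, you'd have to switch OFF RegexOptions.Singleline, because with it turned on . also matches newlines and ^#.* would run past the end of the comment line. Since . then no longer crosses line breaks, replace .*? with [\s\S]*? in your function body part so the body can still span several lines.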
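To make the combination concrete, here is a minimal sketch of the header group joined to the function part of the original expression. It is written in Python rather than .NET so it can be run on its own: re.MULTILINE stands in for RegexOptions.Multiline, and leaving re.DOTALL off corresponds to Singleline being off. The ^ in front of FUNCTION, the names SAMPLE, PATTERN and extract_functions, and the shortened sample text are assumptions made for illustration, not part of the original answer.

```python
import re

# Shortened version of the sample source from the question (assumption:
# trimmed for brevity, structure unchanged).
SAMPLE = """\
###
# Constants
CONSTANT MAX_ARR_SIZE = 100

##################################
# Alternative header
##################################
FUNCTION test_function_1()
    DEFINE  F_x     INTEGER

    LET F_x = 1

    RETURN F_x
END FUNCTION

###################################
# Function:
#   This is a test function
#
FUNCTION test_function_2( P_in_var )
    DEFINE  P_in_var    INTEGER

    RETURN P_in_var
END FUNCTION

FUNCTION test_init_array()
    DEFINE  F_element   INTEGER
END FUNCTION
"""

PATTERN = re.compile(
    # Group 1: zero or more consecutive lines that start with '#'.
    r"((?:^#.*\s)*)"
    # Group 2: the whole function; group 3: name; group 4: parameters.
    # The leading ^ is an added assumption so that only a FUNCTION keyword
    # at the start of a line can open a match.
    r"(^FUNCTION\s+(.*?)[(](.*?)[)][\s\S]*?END FUNCTION)",
    re.MULTILINE,  # '^' matches at every line start; '.' stops at newlines
)


def extract_functions(text):
    """Return one dict per function found in text (hypothetical helper)."""
    return [
        {
            "header": m.group(1),
            "source": m.group(2),
            "name": m.group(3).strip(),
            "params": m.group(4).strip(),
        }
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for fn in extract_functions(SAMPLE):
        print(fn["name"], "| params:", repr(fn["params"]))
        print(fn["header"], end="")
```

Because the header group only accepts an unbroken run of # lines directly above FUNCTION, the "# Constants" comment is no longer swept into the header of test_function_1, and a function without a header simply yields an empty first group. A blank line between a comment block and FUNCTION would also leave the header empty, which may or may not be what you want.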
