Using regular expressions to extract functions and function headers from source code
I'm trying to extract functions and function headers from some source code files. Here's an example of the type of code:
################################################################################
# test module
#
# Description : Test module
#
DATABASE test
###
# Global Vars
GLOBALS
DEFINE G_test_string STRING
END GLOBALS
###
# Modular Vars
DEFINE M_counter INTEGER
###
# Constants
CONSTANT MAX_ARR_SIZE = 100
##################################
# Alternative header
##################################
FUNCTION test_function_1()
DEFINE F_x INTEGER
LET F_x = 1
RETURN F_x
END FUNCTION
###################################
# Function:
# This is a test function
#
# Parameters:
# in - test
#
# Returns:
# out - result
#
FUNCTION test_function_2( P_in_var )
DEFINE P_in_var INTEGER
DEFINE F_out_var INTEGER
LET F_out_var = P_in_var
RETURN F_out_var
END FUNCTION
FUNCTION test_init_array()
DEFINE F_array ARRAY[ MAX_ARR_SIZE ] OF INTEGER
DEFINE F_element INTEGER
FOR F_element = 1 TO MAX_ARR_SIZE
LET F_array[ F_element ] = F_element * F_element
END FOR
END FUNCTION
Functions may or may not have a header above them. I'm trying to capture the function source, function header, function name, and any parameters passed into the function in groups. Here's the expression I came up with (I'm using .NET regex and have been testing with Regex Hero):
^([#]{0,1}.*?)(FUNCTION\s+(.*?)[(](.*?)[)].*?END FUNCTION)
This seems to work OK for all but the first function (test_function_1) in the file. The initial group for test_function_1 captures everything from the first line (the top of the source file) until the FUNCTION of test_function_1 begins. I realise this is because there are #s for other comments in the file, but I only want to capture the function header.
If I understand correctly, your problem is identifying the lines that start with #.
To do that, you could turn on the RegexOptions.Multiline flag, so that ^ matches at the start of every line, and match the function header with
((?:^#.*\s)*)
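Edit:
For this to work, you'd have to switch OFF RegexOptions.Singleline, so that . no longer matches newlines and the header part stops at the end of each line. You'd also have to replace .*? with [\s\S]*? in your function body part, so that the body can still span several lines.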
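Here is a rough sketch of the combined pattern. It is written for Python's re module rather than .NET, so treat it as an illustration and not a drop-in: re.MULTILINE plays the role of RegexOptions.Multiline, and leaving out re.DOTALL corresponds to switching off RegexOptions.Singleline. The sample text is a shortened version of the code in the question, and the dictionary keys (header, name, params, source) are just names picked for this example.

```python
import re

# Shortened sample based on the code in the question.
SOURCE = """\
# Constants
CONSTANT MAX_ARR_SIZE = 100
##################################
# Alternative header
##################################
FUNCTION test_function_1()
DEFINE F_x INTEGER
LET F_x = 1
RETURN F_x
END FUNCTION
###################################
# Function:
# This is a test function
#
FUNCTION test_function_2( P_in_var )
DEFINE P_in_var INTEGER
RETURN P_in_var
END FUNCTION
FUNCTION test_init_array()
DEFINE F_element INTEGER
END FUNCTION
"""

PATTERN = re.compile(
    r"((?:^#.*\s)*)"                 # group 1: consecutive lines starting with '#'
    r"(FUNCTION\s+(.*?)[(](.*?)[)]"  # group 2: whole function, 3: name, 4: parameters
    r"[\s\S]*?END FUNCTION)",        # body may span lines, so [\s\S] instead of '.'
    re.MULTILINE,                    # '^' matches at every line start; '.' stops at newlines
)

functions = [
    {
        "header": m.group(1),
        "name": m.group(3).strip(),
        "params": m.group(4).strip(),
        "source": m.group(2),
    }
    for m in PATTERN.finditer(SOURCE)
]

for f in functions:
    print(f["name"], "| params:", repr(f["params"]), "| header lines:", f["header"].count("\n"))
```

With this sample, the header captured for test_function_1 is only the three comment lines directly above it. The # Constants comment is left out because a non-comment line sits in between, and test_init_array gets an empty header.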