How do I get each occurrence separately when a regular expression match can contain another match of the same pattern?
I would like to get the first occurrence of a pattern on its own, not one long match with further occurrences embedded in it.
For example, the regular expression is:
\bTest\b\s*\(\s*\".*\"\s*,\s*\".*\"\s*\)
The sample text is:
x == Test("123" , "ABC") || x == Test ("123" , "DEF")
Result:
Test("123" , "ABC") || x == Test ("123" , "DEF")
Using any regular expression tool (Expresso, for example), I get the whole text as the result, since it satisfies the regular expression. Is there a way to get the result in two parts, as shown below?
Test("123" , "ABC")
and
Test ("123" , "DEF")
Are you trying to parse code with regex? This is always going to be a fairly brittle solution, and you should consider using an actual parser.
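That said, to solve your immediate problem, you want to use non-greedy matching: the *? quantifier instead of just *. A greedy .* runs as far as the last closing quote that still lets the rest of the pattern match, while .*? stops at the first one.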
Like so:
\bTest\b\s*\(\s*\".*?\"\s*,\s*\".*?\"\s*\)
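To see the difference on the sample text from the question, here is a minimal sketch in Python (the question does not name a language, so Python's `re` module is just a stand-in; the variable names are made up for illustration). The third pattern is not from the answer above: it swaps `.*?` for the negated class `[^"]*`, which is a common alternative when the quoted strings are assumed to contain no embedded quotes.

```python
import re

text = 'x == Test("123" , "ABC") || x == Test ("123" , "DEF")'

# Greedy pattern from the question: .* grabs as much as it can.
greedy = r'\bTest\b\s*\(\s*".*"\s*,\s*".*"\s*\)'
# Non-greedy pattern from the answer: .*? grabs as little as it can.
lazy = r'\bTest\b\s*\(\s*".*?"\s*,\s*".*?"\s*\)'
# Alternative (assumption: no quote characters inside the strings).
negated = r'\bTest\b\s*\(\s*"[^"]*"\s*,\s*"[^"]*"\s*\)'

greedy_matches = re.findall(greedy, text)
lazy_matches = re.findall(lazy, text)
negated_matches = re.findall(negated, text)

print(greedy_matches)   # one match spanning both calls
print(lazy_matches)     # ['Test("123" , "ABC")', 'Test ("123" , "DEF")']
print(negated_matches)  # same two matches as the non-greedy pattern
```

Neither variant copes with nested calls such as Test("123" , Test("456" , "GHI")); that case needs parenthesis counting or a real parser.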
A poor man's C function parser, in Perl.
## ===============================================
## C_FunctionParser_v3.pl @ 3/21/09
## -------------------------------
## C/C++ Style Function Parser
## Idea - To parse out C/C++ style functions
## that have parenthetical closures (some don't).
## - sln
## ===============================================
my $VERSION = 3.0;
$|=1;
use strict;
use warnings;
# Prototypes
sub Find_Function(\$\@);
# File-scoped variables
my ($FxParse, $FName, $Preamble);
# Set function name, () gets all functions
SetFunctionName('Test'); # Test case, function 'Test'
## --------
# Source file
my $Source = join '', <DATA>;
# Extended, possibly non-compliant,
# function name - pattern examples:
# (no capture groups in function names strings or regex!)
# - - -
# SetFunctionName( qr/_T/ );
# SetFunctionName( qr/\(\s*void\s*\)\s*function/ );
# SetFunctionName( "\\(\\s*void\\s*\\)\\s*function" );
# Parse some functions
my @Funct = ();
Find_Function( $Source, @Funct );
# Print functions found
# (segments can be modified and/or collated)
if ( !@Funct ) {
    print "Function name pattern: '$FName' not found!\n";
} else {
    print "\nFound ".@Funct." matches.\nFunction pattern: '$FName' \n";
}
for my $ref (@Funct) {
    # Format; @: Line number - function
    printf "\n\@: %6d - %s\n", $$ref[3], substr($Source, $$ref[0], $$ref[2] - $$ref[0]);
}
exit;
## End
# ---------
# Set the parser's function regex pattern
#
sub SetFunctionName
{
    if (!@_) {
        $FName = "_*[a-zA-Z][\\w]*"; # Matches all compliant function names (default)
    } else {
        $FName = shift; # No capture groups in function names please
    }
    $Preamble = "\\s*\\(";
    # Compile function parser regular expression
    # Regex condensed:
    # $FxParse = qr!//(?:[^\\]|\\\n?)*?\n|/\*.*?\*/|\\.|'["()]'|(")|($FName$Preamble)|(\()|(\))!s;
    #                                    |         |   |       |1 1|2               2|3  3|4  4
    # Note - Non-Captured, matching items, are meant to consume!
    # -----------------------------------------------------------
    # Regex /xpanded (with commentary):
    $FxParse = # Regex Precedence (items MUST be in this order):
    qr!                      # -----------------------------------------------
        //                   # comment - //
        (?:                  # grouping
            [^\\]            # any non-continuation character ^\
          |                  # or
            \\\n?            # any continuation character followed by 0-1 newline \n
        )*?                  # to be done 0-many times, stopping at the first end of comment
        \n                   # end of comment - //
      | /\*.*?\*/            # or, comment - /* + anything + */
      | \\.                  # or, escaped char - backslash + ANY character
      | '["()]'              # or, single quote char - quote then one of ", (, or ), then quote
      | (")                  # or, capture $1 - double quote as a flag
      | ($FName$Preamble)    # or, capture $2 - $FName + $Preamble
      | (\()                 # or, capture $3 - ( as a flag
      | (\))                 # or, capture $4 - ) as a flag
    !xs;
}
# Procedure that finds C/C++ style functions
# (the engine)
# Notes:
# - This is not a syntax checker !!!
# - Nested functions index and closure are cached. The search is single pass.
# - Parenthetical closures are determined via cached counter.
# - This precedence avoids all ambiguous parenthetical open/close conditions:
# 1. Dual comment styles.
# 2. Escapes.
# 3. Single quoted characters.
# 4. Double quotes, flip-flopped to determine closure.
# - Improper closures are reported, with the last one reliably being the likely culprit
# (this would be a syntax error, i.e. the code won't compile, but it is reported as a closure error).
#
sub Find_Function(\$\@)
{
    my ($src, $Funct) = @_;
    my @Ndx = ();
    my @Closure = ();
    my ($Lines, $offset, $closure, $dquotes) = (1,0,0,0);
    while ($$src =~ /$FxParse/xg)
    {
        if (defined $1) # double quote "
        {
            $dquotes = !$dquotes;
        }
        next if ($dquotes);
        if (defined $2) # 'function name'
        {
            # ------------------------------------
            # Placeholder for exclusions......
            # ------------------------------------
            # Cache the current function index and current closure
            push @Ndx, scalar(@$Funct);
            push @Closure, $closure;
            my ($funcpos, $parampos) = ( $-[0], pos($$src) );
            # Get newlines since last function
            $Lines += substr ($$src, $offset, $funcpos - $offset) =~ tr/\n//;
            # print $Lines,"\n";
            # Save positions: function( parms )
            push @$Funct , [$funcpos, $parampos, 0, $Lines];
            # Assign new offset
            $offset = $funcpos;
            # Closure is now 1 because of preamble '('
            $closure = 1;
        }
        elsif (defined $3) # '('
        {
            ++$closure;
        }
        elsif (defined $4) # ')'
        {
            --$closure;
            if ($closure <= 0)
            {
                $closure = 0;
                if (@Ndx)
                {
                    # Pop index and closure, store position
                    $$Funct[pop @Ndx][2] = pos($$src);
                    $closure = pop @Closure;
                }
            }
        }
    }
    # To test an error, either take off the closure of a function in its source,
    # or force it this way (pseudo error, make sure you have data in @$Funct):
    # push @Ndx, 1;
    # It's an error if index stack has elements.
    # The last one reported is the likely culprit.
    if (@Ndx)
    {
        ## BAD, RETURN ...
        ## All elements in stack have to be fixed up
        while ( @Ndx ) {
            my $func_index = shift @Ndx;
            my $ref = $$Funct[$func_index];
            $$ref[2] = $$ref[1];
            print STDERR "** Bad return, index = $func_index\n";
            print "** Error! Unclosed function [$func_index], line ".
                $$ref[3].": '".substr ($$src, $$ref[0], $$ref[2] - $$ref[0] )."'\n";
        }
        return 0;
    }
    return 1;
}
__DATA__
x == Test("123" , "ABC") || x == Test ("123" , "DEF")
Test("123" , Test ("123" , "GHI"))?
Test("123" , "ABC(JKL)") || x == Test ("123" , "MNO")
Output (line # - function):
Found 6 matches.
Function pattern: 'Test'
@: 1 - Test("123" , "ABC")
@: 1 - Test ("123" , "DEF")
@: 2 - Test("123" , Test ("123" , "GHI"))
@: 2 - Test ("123" , "GHI")
@: 3 - Test("123" , "ABC(JKL)")
@: 3 - Test ("123" , "MNO")
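For readers who do not use Perl, the core idea (skip over string literals, then track open and close parentheses so nested calls are closed in the right order) can be sketched in a few lines of Python. This is a simplified illustration, not a port of the script above: the function name `find_calls` and its parameters are made up here, it uses a single stack instead of the script's cached closure counter, and it ignores comments and single-quoted character literals.

```python
import re

def find_calls(src, name="Test"):
    """Return (line_number, call_text) pairs for calls to `name`, nested ones included."""
    token = re.compile(
        r'"(?:\\.|[^"\\])*"'                               # whole string literal: consumed, ignored
        + r"|(?P<call>\b" + re.escape(name) + r"\b\s*\()"  # name plus its opening parenthesis
        + r"|(?P<lp>\()"                                   # any other opening parenthesis
        + r"|(?P<rp>\))",                                  # closing parenthesis
        re.S,
    )
    spans = []  # [start, end] per call, in order of appearance
    stack = []  # index into spans for a call's '(', None for a plain '('
    for m in token.finditer(src):
        if m.group("call"):
            spans.append([m.start(), None])
            stack.append(len(spans) - 1)
        elif m.group("lp"):
            stack.append(None)
        elif m.group("rp") and stack:
            idx = stack.pop()
            if idx is not None:
                spans[idx][1] = m.end()
    # Unclosed calls (end is None) are dropped rather than reported.
    return [(src.count("\n", 0, s) + 1, src[s:e]) for s, e in spans if e is not None]

sample = (
    'x == Test("123" , "ABC") || x == Test ("123" , "DEF")\n'
    'Test("123" , Test ("123" , "GHI"))?\n'
    'Test("123" , "ABC(JKL)") || x == Test ("123" , "MNO")\n'
)
calls = find_calls(sample)
for line, text in calls:
    print(f"{line}: {text}")
```

On the three sample lines this finds the same six calls as the Perl output above, including the nested call on line 2 and the parenthesis inside the string "ABC(JKL)" on line 3.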