How to get multiple occurrences with a regular expression when a match can contain the same pattern within itself?

I would like to get the first occurrence of a regular expression match, but not the ones embedded in it.

For example, the regular expression is:

\bTest\b\s*\(\s*\".*\"\s*,\s*\".*\"\s*\) 

The sample text is:

x == Test("123" ,  "ABC") || x == Test ("123" , "DEF")

Result:

Test("123" ,  "ABC") || x == Test ("123" , "DEF")

Using any regular expression tool (Expresso, for example), I get the whole text as the result, since it satisfies the regular expression. Is there a way to get the result in two parts, as shown below?

Test("123" ,  "ABC") 

and

Test ("123" , "DEF")


Are you trying to parse code with regex? This is always going to be a fairly brittle solution, and you should consider using an actual parser.

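That said, to solve your immediate problem, you want non-greedy (lazy) matching: use the *? quantifier instead of plain *. A greedy .* takes as much text as it can, so it runs through to the last pair of quotes on the line; .*? stops at the first point where the rest of the pattern can match.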

Like so:

\bTest\b\s*\(\s*\".*?\"\s*,\s*\".*?\"\s*\)
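To see the difference on the sample text from the question, here is a minimal Perl sketch. The variable names are only illustrative, and the two patterns are the ones shown above with the unnecessary backslashes before the double quotes dropped.

```perl
use strict;
use warnings;

# Sample text from the question.
my $text = 'x == Test("123" ,  "ABC") || x == Test ("123" , "DEF")';

# Greedy version (the question's pattern) and lazy version (the fix above).
my $greedy = qr/\bTest\b\s*\(\s*".*"\s*,\s*".*"\s*\)/;
my $lazy   = qr/\bTest\b\s*\(\s*".*?"\s*,\s*".*?"\s*\)/;

# With /g in list context and no capture groups, each element is one whole match.
my @greedy_hits = $text =~ /$greedy/g;
my @lazy_hits   = $text =~ /$lazy/g;

print "greedy: $_\n" for @greedy_hits;   # one match spanning both calls
print "lazy:   $_\n" for @lazy_hits;     # two matches, one per call
```

The lazy quantifier only changes which match is tried first. It can still run past a closing quote if that is the only way for the overall pattern to succeed, so a negated class such as [^"]* in place of .*? is a stricter alternative when the arguments never contain quotes.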


A poor man's C function parser, in Perl.

## ===============================================
## C_FunctionParser_v3.pl  @  3/21/09
## -------------------------------
## C/C++ Style Function Parser
##  Idea - To parse out C/C++ style functions
##  that have parenthetical closures (some don't).
##  - sln  
## ===============================================
my $VERSION = 3.0;
$|=1;

use strict;
use warnings;

# Prototypes
  sub Find_Function(\$\@);

# File-scoped variables
  my ($FxParse, $FName, $Preamble);

# Set function name, () gets all functions

  SetFunctionName('Test');    # Test case, function 'Test'

## --------

# Source file
   my $Source = join '', <DATA>;

# Extended, possibly non-compliant,
# function name - pattern examples:
# (no capture groups in function names strings or regex!)
#  - - -
#  SetFunctionName( qr/_T/ );
#  SetFunctionName( qr/\(\s*void\s*\)\s*function/ );
#  SetFunctionName( "\\(\\s*void\\s*\\)\\s*function" );

# Parse some functions
    my @Funct = ();
    Find_Function( $Source, @Funct );

# Print functions found
# (segments can be modified and/or collated)

    if ( !@Funct ) {
        print "Function name pattern: '$FName' not found!\n";
    } else {
        print "\nFound ".@Funct." matches.\nFunction pattern: '$FName' \n";
    }
    for my $ref (@Funct) {
        # Format;  @: Line number - function
        printf "\n\@: %6d - %s\n", $$ref[3], substr($Source, $$ref[0], $$ref[2] - $$ref[0]);
   }

exit;

## End 

# ---------
# Set the parser's function regex pattern
#
sub SetFunctionName
{
    if (!@_) {
        $FName = "_*[a-zA-Z][\\w]*"; # Matches all compliant function names (default)
    } else {
        $FName = shift;  # No capture groups in function names please
    }
    $Preamble   = "\\s*\\(";

    # Compile function parser regular expression
      # Regex condensed:
      # $FxParse = qr!//(?:[^\\]|\\\n?)*?\n|/\*.*?\*/|\\.|'["()]'|(")|($FName$Preamble)|(\()|(\))!s;
      #                                    |         |   |       |1 1|2               2|3  3|4  4
      # Note - Non-Captured, matching items, are meant to consume!
      # -----------------------------------------------------------
      # Regex /xpanded (with commentary):
      $FxParse =                      # Regex Precedence (items MUST be in this order):
        qr!                           # -----------------------------------------------
             //                       # comment - //
                (?:                   #    grouping
                    [^\\]             #       any non-continuation character ^\
                  |                   #         or
                    \\\n?             #       any continuation character followed by 0-1 newline \n
                )*?                   #    to be done 0-many times, stopping at the first end of comment
             \n                       #  end of comment - //
          |  /\*.*?\*/                # or, comment - /*  + anything + */
          |  \\.                      # or, escaped char - backslash + ANY character
          |  '["()]'                  # or, single quote char - quote then one of ", (, or ), then quote
          |  (")                      # or, capture $1 - double quote as a flag
          |  ($FName$Preamble)        # or, capture $2 - $FName + $Preamble
          |  (\()                     # or, capture $3 - ( as a flag
          |  (\))                     # or, capture $4 - ) as a flag
      !xs;
}

# Procedure that finds C/C++ style functions
# (the engine)
# Notes:
#   - This is not a syntax checker !!!
#   - Nested functions index and closure are cached. The search is single pass.
#   - Parenthetical closures are determined via cached counter.
#   - This precedence avoids all ambiguous parenthetical open/close conditions:
#       1. Dual comment styles.
#       2. Escapes.
#       3. Single quoted characters.
#       4. Double quotes, flip-flopped to determine closure.
#   - Improper closures are reported, with the last one reliably being the likely culprit
#     (this would be a syntax error, i.e. the code won't compile, but it is reported as a closure error).
#
sub Find_Function(\$\@)
{
    my ($src, $Funct) = @_;
    my @Ndx     = ();
    my @Closure = ();
    my ($Lines, $offset, $closure, $dquotes) = (1,0,0,0);

    while ($$src =~ /$FxParse/xg)
    {
        if (defined $1)  # double quote "
        {
            $dquotes = !$dquotes;
        }
        next if ($dquotes);

        if (defined $2)  # 'function name'
        {
            # ------------------------------------
            # Placeholder for exclusions......
            # ------------------------------------

            # Cache the current function index and current closure
              push  @Ndx, scalar(@$Funct);
              push  @Closure, $closure;

              my ($funcpos, $parampos) = ( $-[0], pos($$src) );

            # Get newlines since last function
              $Lines += substr ($$src, $offset, $funcpos - $offset) =~ tr/\n//;
              # print $Lines,"\n";

            # Save positions:   function(   parms     )
              push  @$Funct  ,  [$funcpos, $parampos, 0, $Lines];

            # Assign new offset
              $offset = $funcpos;
            # Closure is now 1 because of preamble '('
              $closure = 1;
        }
        elsif (defined $3)  # '('
        {
            ++$closure;
        }
        elsif (defined $4)  # ')'
        {
            --$closure;
            if ($closure <= 0)
            {
                $closure = 0;
                if (@Ndx)
                {
                    # Pop index and closure, store position
                      $$Funct[pop @Ndx][2] = pos($$src);
                      $closure = pop @Closure;
                }
            }
        }
    }

    # To test an error, either take off the closure of a function in its source,
    # or force it this way (pseudo error, make sure you have data in @$Funct):
    # push @Ndx, 1;

    # It's an error if index stack has elements.
    # The last one reported is the likely culprit.
    if (@Ndx)
    {
        ## BAD, RETURN ...
        ## All elements in stack have to be fixed up
        while ( @Ndx ) {
            my $func_index = shift @Ndx;
            my $ref = $$Funct[$func_index];
            $$ref[2] = $$ref[1];
            print STDERR "** Bad return, index = $func_index\n";
            print "** Error! Unclosed function [$func_index], line ".
                 $$ref[3].": '".substr ($$src, $$ref[0], $$ref[2] - $$ref[0] )."'\n";
        }
        return 0;
    }
    return 1
}

__DATA__
x == Test("123" ,  "ABC") || x == Test ("123" , "DEF")
Test("123" , Test ("123" , "GHI"))? 
Test("123" , "ABC(JKL)") || x == Test ("123" , "MNO")

Output (line # - function):

Found 6 matches.
Function pattern: 'Test'

@:      1 - Test("123" ,  "ABC")

@:      1 - Test ("123" , "DEF")

@:      2 - Test("123" , Test ("123" , "GHI"))

@:      2 - Test ("123" , "GHI")

@:      3 - Test("123" , "ABC(JKL)")

@:      3 - Test ("123" , "MNO")
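
The closure-counting idea can also be shown in isolation. The following is a much smaller sketch, not the parser above: find_calls is a hypothetical helper that treats the function name as a literal string, rescans from each opening parenthesis instead of working in a single pass, and ignores comments and single-quoted characters. It only tracks double quotes, backslash escapes and parenthesis depth.

```perl
use strict;
use warnings;

# Hypothetical helper: return every call to $name found in $src, nested calls
# included, by counting parentheses outside double-quoted strings.
sub find_calls {
    my ($src, $name) = @_;
    my @calls;

    # Each match ends just after the opening '(' of a call, so the next
    # search resumes inside the argument list and also finds nested calls.
    while ($src =~ /\b\Q$name\E\s*\(/g) {
        my $start = $-[0];                 # offset of the function name
        my $i     = pos($src);             # offset just after the '('
        my ($depth, $in_string) = (1, 0);

        while ($depth > 0 && $i < length($src)) {
            my $ch = substr($src, $i++, 1);
            if    ($ch eq '\\') { $i++ }                      # skip the escaped character
            elsif ($ch eq '"')  { $in_string = !$in_string }  # flip-flop the string state
            elsif ($in_string)  { next }                      # parentheses in strings do not count
            elsif ($ch eq '(')  { $depth++ }
            elsif ($ch eq ')')  { $depth-- }
        }

        # Keep the call only if its closing parenthesis was found.
        push @calls, substr($src, $start, $i - $start) if $depth == 0;
    }
    return @calls;
}

# The same three sample lines as in the __DATA__ section above.
my $source = join "\n",
    'x == Test("123" ,  "ABC") || x == Test ("123" , "DEF")',
    'Test("123" , Test ("123" , "GHI"))?',
    'Test("123" , "ABC(JKL)") || x == Test ("123" , "MNO")';

my @calls = find_calls($source, 'Test');
print "$_\n" for @calls;
```

On the sample lines this should list the same six calls as the output above, including the nested call on the second line and the call whose string argument contains parentheses on the third.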