
PHP Regexp (PCRE) - Find a set of all substrings 2

For example, given this source string:

__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..

How can I find every match of aaXX*YY*ZZ (where * stands for any run of characters)? The expected matches are:

__ aaXX cc YY eeXX_ ZZ kkYYmmXX_ZZnnXXooYYuuXX_ZZvv..

__ aaXX cc YY eeXX_ZZkkYYmmXX_ ZZ nnXXooYYuuXX_ZZvv..

__ aaXX cc YY eeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ ZZ vv..

__ aaXX ccYYeeXX_ZZkk YY mmXX_ ZZ nnXXooYYuuXX_ZZvv..

__ aaXX ccYYeeXX_ZZkk YY mmXX_ZZnnXXooYYuuXX_ ZZ vv..

__ aaXX ccYYeeXX_ZZkkYYmmXX_ZZnnXXoo YY uuXX_ ZZ vv..

The problem is that PHP's preg functions don't support variable-length quantifiers (?, +, *) inside a (?<=exp) lookbehind assertion; only fixed-length patterns such as {N} are allowed.

So I need a solution that doesn't use a variable-length lookbehind assertion.

Thank you!


This script works:

<?php // test.php 20110311_1200
    $data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';
    $all_matches = array();
    $yy_match = true; // Get past first for test condition.
    for ($yy_cnt = 1; $yy_match; ++$yy_cnt) {
        $yy_match = false; // Assume failure for this yy_cnt.
        $zz_match = true; // Get past first for test condition.
        for ($zz_cnt = 1; $zz_match; ++$zz_cnt) {
            $zz_match = false; // Assume failure for this zz_cnt.
            // Assemble new regex with new $yy_cnt and $zz_cnt.
            $re = "/ # Match all combinations of XX..YY..ZZ.
                (aaXX)                   # $1: Prefix X.
                (?:                      # Group to find YY[yy_cnt].
                  (?:(?!YY).)*           # Zero or more non-YY.
                  (YY)                   # $2: next YY.
                ){{$yy_cnt}}             # yy_cnt.
                (?:                      # Group to find ZZ[zz_cnt].
                  (?:(?!ZZ).)*           # Zero or more non-ZZ.
                  (ZZ)                   # $3 next ZZ.
                ){{$zz_cnt}}             # $zz_cnt.
                /x";
            if (preg_match($re, $data, $matches, PREG_OFFSET_CAPTURE)) {
                $zz_match = true;
                $yy_match = true;
                $all_matches[] = $matches;
                printf("Match found. \$yy_cnt = %d, \$zz_cnt = %d\n",
                    $yy_cnt, $zz_cnt);
            }
        }
    }
    print_r($all_matches);
?>


You need to loop. First look for __aaXX followed by the next YY, then __aaXX followed by the second YY etc. In regex land that means you first look for __aaXX(.*?YY){1}, then __aaXX(.*?YY){2} (can you see a loop variable in there?) and so on until the pattern fails. Same for the second part when you are looking for the ZZs.
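
A minimal sketch of the first half of that loop (the YY part only), assuming the sample string from the question. The variable names $n and $yy_ends are made up for illustration, and the pattern drops the leading underscores since they don't affect the result:

```php
<?php
// Sketch: grow the {n} quantifier until the pattern stops matching.
$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';
$yy_ends = array(); // hypothetical name: offset just past the n-th YY after aaXX

for ($n = 1; ; ++$n) {
    // .*? is lazy, so each repetition stops at the nearest following YY.
    $re = '/aaXX(?:.*?YY){' . $n . '}/';
    if (!preg_match($re, $data, $m, PREG_OFFSET_CAPTURE)) {
        break; // there is no n-th YY, so stop looping
    }
    // $m[0] is array(matched text, start offset).
    $yy_ends[$n] = $m[0][1] + strlen($m[0][0]);
}
print_r($yy_ends);
```

For the sample string this records three entries, one per YY. The ZZ part would be a second loop of the same shape nested inside this one.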


How about this pattern: # aaXX(.*) YY (.*) ZZ .*#?

From your highlighting it's not entirely clear what your result should look like... I added spaces because you have them in the highlighting, but it's not clear if you'll have them in your source...

Edit

I guess I'm not understanding what you want to get, but another thing to look at is preg_match_all, if your YY ZZ part repeats... Something like #_aaXX((.*?)YY(.*?)ZZ)+#.
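
If the goal is every aaXX / YY / ZZ combination, one way to use preg_match_all is to skip the single big pattern, collect the offsets of each token, and then pair them up. This is only a sketch under assumptions: token_offsets is a hypothetical helper, not a PHP built-in, and only the first aaXX in the string is considered.

```php
<?php
$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';

// Hypothetical helper: offsets of every occurrence of $token at or after $from.
function token_offsets($token, $data, $from) {
    preg_match_all('/' . preg_quote($token, '/') . '/', $data, $m,
        PREG_OFFSET_CAPTURE, $from);
    // Each entry of $m[0] is array(matched text, offset in $data).
    return array_map(function ($hit) { return $hit[1]; }, $m[0]);
}

$combos = array();
$start = strpos($data, 'aaXX'); // assumption: only the first aaXX matters
if ($start !== false) {
    $after_prefix = $start + strlen('aaXX');
    foreach (token_offsets('YY', $data, $after_prefix) as $yy) {
        // Only ZZ tokens that come after this YY can complete a match.
        foreach (token_offsets('ZZ', $data, $yy + strlen('YY')) as $zz) {
            $combos[] = array('aaXX' => $start, 'YY' => $yy, 'ZZ' => $zz);
        }
    }
}
print_r($combos);
```

On the sample string this yields six combinations, matching the six highlighted lines in the question. It needs no lookbehind and no rebuilt pattern per iteration, at the cost of returning offsets rather than a single match array.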
