PHP Regexp (PCRE) - Find a set of all substrings 2
For example, given this source string:
__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..
How can I find all matches of the form aaXX*YY*ZZ, i.e. every combination highlighted below?
__ aaXX cc YY eeXX_ ZZ kkYYmmXX_ZZnnXXooYYuuXX_ZZvv..
__ aaXX cc YY eeXX_ZZkkYYmmXX_ ZZ nnXXooYYuuXX_ZZvv..
__ aaXX cc YY eeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ ZZ vv..
__ aaXX ccYYeeXX_ZZkk YY mmXX_ ZZ nnXXooYYuuXX_ZZvv..
__ aaXX ccYYeeXX_ZZkk YY mmXX_ZZnnXXooYYuuXX_ ZZ vv..
__ aaXX ccYYeeXX_ZZkkYYmmXX_ZZnnXXoo YY uuXX_ ZZ vv..
The problem is that PHP's preg functions don't support variable-length quantifiers (?, +, *) inside a (?<=exp) lookbehind assertion; only a fixed length such as {N} is allowed.
So I need a solution that does not rely on a variable-length lookbehind assertion.
Thank you!
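To make the limitation concrete, here is a minimal sketch. It assumes a stock PHP build with the bundled PCRE library; the subject string 'aab' and the variable names are made up for illustration.

```php
<?php
$subject = 'aab';

// Unbounded quantifier inside the lookbehind: PCRE refuses to compile this,
// so preg_match() returns false (the warning is silenced with @ here).
$variable = @preg_match('/(?<=a+)b/', $subject);

// Fixed-length lookbehind: compiles and finds the "b" preceded by "aa".
$fixed = preg_match('/(?<=a{2})b/', $subject, $m, PREG_OFFSET_CAPTURE);

var_dump($variable);
var_dump($fixed);
```

Checking the return value with === false distinguishes a pattern that failed to compile from one that simply did not match (which returns 0).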
This script works:
<?php // test.php 20110311_1200
$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';
$all_matches = array();
$yy_match = true; // Get past first for test condition.
for ($yy_cnt = 1; $yy_match; ++$yy_cnt) {
    $yy_match = false; // Assume failure for this yy_cnt.
    $zz_match = true;  // Get past first for test condition.
    for ($zz_cnt = 1; $zz_match; ++$zz_cnt) {
        $zz_match = false; // Assume failure for this zz_cnt.
        // Assemble new regex with new $yy_cnt and $zz_cnt.
        $re = "/ # Match all combinations of XX..YY..ZZ.
            (aaXX)            # $1: Prefix X.
            (?:               # Group to find YY[yy_cnt].
              (?:(?!YY).)*    # Zero or more non-YY.
              (YY)            # $2: next YY.
            ){{$yy_cnt}}      # yy_cnt.
            (?:               # Group to find ZZ[zz_cnt].
              (?:(?!ZZ).)*    # Zero or more non-ZZ.
              (ZZ)            # $3 next ZZ.
            ){{$zz_cnt}}      # $zz_cnt.
            /x";
        if (preg_match($re, $data, $matches, PREG_OFFSET_CAPTURE)) {
            $zz_match = true;
            $yy_match = true;
            $all_matches[] = $matches;
            printf("Match found. \$yy_cnt = %d, \$zz_cnt = %d\n",
                $yy_cnt, $zz_cnt);
        }
    }
}
print_r($all_matches);
?>
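Because the script passes PREG_OFFSET_CAPTURE, every entry stored in $all_matches is a [text, offset] pair rather than a plain string. The sketch below shows how one such result might be read. It uses the same regex shape with the counts fixed at 2 and 1 purely for illustration, and the names $whole, $start, $yyOffset and $zzOffset are my own.

```php
<?php
$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';

// Same shape as the generated regex, with yy_cnt = 2 and zz_cnt = 1
// hard-coded for this example.
$re = '/(aaXX)(?:(?:(?!YY).)*(YY)){2}(?:(?:(?!ZZ).)*(ZZ)){1}/';

if (preg_match($re, $data, $m, PREG_OFFSET_CAPTURE)) {
    // Each capture is [matched text, byte offset into $data].
    [$whole, $start] = $m[0];
    // A repeated capture group keeps only its last iteration,
    // so $m[2] is the second YY here.
    $yyOffset = $m[2][1];
    $zzOffset = $m[3][1];
    printf("%s (start %d, YY at %d, ZZ at %d)\n",
        $whole, $start, $yyOffset, $zzOffset);
}
```

The YY and ZZ offsets are what tell two results apart when their overall matched text is identical.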
You need to loop. First look for __aaXX followed by the next YY, then __aaXX followed by the second YY, etc. In regex land that means you first look for __aaXX(.*?YY){1}, then __aaXX(.*?YY){2} (can you see a loop variable in there?) and so on until the pattern fails. Same for the second part when you are looking for the ZZs.
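A rough sketch of that loop, assuming the lazy-quantifier patterns above are simply extended with a matching ZZ part. The function name findAllCombinations and the [yy, zz, text] result shape are hypothetical, not something from the question.

```php
<?php
// Hypothetical helper: tries every (yy, zz) repetition count until the
// pattern stops matching, and records each successful combination.
function findAllCombinations(string $data): array
{
    $found = [];
    for ($yy = 1; ; ++$yy) {
        $matchedAny = false;
        for ($zz = 1; ; ++$zz) {
            // The loop variables become the {n} repetition counts.
            $re = sprintf('/__aaXX(?:.*?YY){%d}(?:.*?ZZ){%d}/', $yy, $zz);
            if (preg_match($re, $data, $m) !== 1) {
                break; // No zz-th ZZ after the yy-th YY.
            }
            $found[] = [$yy, $zz, $m[0]];
            $matchedAny = true;
        }
        if (!$matchedAny) {
            break; // Not even one ZZ follows the yy-th YY: done.
        }
    }
    return $found;
}

$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';
$results = findAllCombinations($data);
foreach ($results as [$yy, $zz, $text]) {
    printf("YY #%d, ZZ #%d: %s\n", $yy, $zz, $text);
}
```

The counts are kept alongside the text because different YY choices can end on the same ZZ and so produce the same overall substring.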
How about this pattern: # aaXX(.*) YY (.*) ZZ .*#?
From your highlighting it's not entirely clear what your result should look like... I added spaces because you have them in the highlighting, but it's not clear if you'll have them in your source...
Edit
I guess I'm not understanding what you want to get, but another thing to look at is preg_match_all, if your YY ZZ part repeats. Something like #_aaXX((.*?)YY(.*?)ZZ)+#.
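For what it's worth, a quick check of that last pattern against the sample string from the question suggests it yields a single match, since preg_match_all only returns non-overlapping matches and a repeated group keeps just its final iteration. A minimal sketch (the variable names are mine):

```php
<?php
$data = '__aaXXccYYeeXX_ZZkkYYmmXX_ZZnnXXooYYuuXX_ZZvv..';

// The pattern suggested above, applied to the question's sample string.
$count = preg_match_all('#_aaXX((.*?)YY(.*?)ZZ)+#', $data, $m);

printf("matches: %d\n", $count);
printf("whole match: %s\n", $m[0][0]);
// Group 1 holds only the last repetition of the (...)+ group.
printf("last repetition: %s\n", $m[1][0]);
```

So on its own this would not enumerate the six combinations asked for; some outer loop over repetition counts still seems necessary.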