preg_match_all matching the whole string if no empty lines, and matching individually if there are
Hey, so in PHP I want preg_match_all to return:
Array ( [0] => text1
[1] => text2
)
if the input is:
(text1)
(text2)
and to return:
Array ( [0] =>
this is some text
and (bla bla)
)
in case the input is:
(this is some text
and (bla bla)
)
In other words, if there is a new line between () and (), they should be matched individually, and if there are no new lines, the whole string should be treated as one string.
So in the first example this works:
preg_match_all('/\((.*)\)/', $match, $matches);
and in the second example this works:
preg_match_all('/\((.*)\)/s', $match, $matches);
by adding the s modifier, but I'm not able to write a regular expression that matches both cases the way I want.
Any help is highly appreciated.
It sounds like you want to allow newlines inside the parentheses, but not two or more newlines in a row. A typical regex for that might be
'~\(((?:.|\n(?!\r?\n))*)\)~'
But that doesn't allow for different kinds of newline (like \r\n, or just \r), nor does it recognize the "empty" line if it contains horizontal whitespace (like spaces or tabs). It also doesn't make sure the parentheses are properly balanced; I don't know how important that is to you. Try this regex:
'~\(((?:[^\r\n()]|(?:\r\n|[\r\n])(?![ \t]*(?:\r\n|[\r\n]))|(?R))*+)\)~'
The first alternative, [^\r\n()], matches anything that's not a newline (or part of one) or a parenthesis.
If that fails, (?:\r\n|[\r\n]) tries to match one of the three kinds of newline, and the negative lookahead (?![ \t]*(?:\r\n|[\r\n])) makes sure the newline isn't followed by another newline, either immediately or with spaces or tabs between them.
If the third alternative is reached, the next character should be either an open-paren, in which case (?R) tries to apply the whole regex recursively; or a close-paren, in which case the final \) finishes off the match (or pops up to the next higher level of recursion).
Of course, this doesn't account for the possibility of escaped parentheses, Unicode whitespace and line separators, or any number of other refinements, but I'm really just demonstrating how to enforce the "no more than one newline" rule, and explaining why it's more difficult to do than you might have expected.
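As a rough check of the two patterns above, the following sketch runs them against a few inputs. The variable names and the sample strings (including the whitespace-only line) are assumptions made for illustration.

```php
<?php
// Simple pattern: allows a newline unless another newline follows immediately.
$simple = '~\(((?:.|\n(?!\r?\n))*)\)~';
// Recursive pattern: handles \r\n and \r, whitespace-only lines, and nesting.
$recursive = '~\(((?:[^\r\n()]|(?:\r\n|[\r\n])(?![ \t]*(?:\r\n|[\r\n]))|(?R))*+)\)~';

// Two groups separated by an empty line (assumed sample input).
$separate = "(text1)\n\n(text2)";
preg_match_all($recursive, $separate, $m);
$separateContents = $m[1]; // contents of each matched group

// One group spanning several lines, with a nested pair inside.
$nested = "(this is some text\nand (bla bla)\n)";
preg_match_all($recursive, $nested, $m);
$nestedContents = $m[1];

// An "empty" line that contains only a space.
$spaced = "(a\n \nb)";
$simpleCount = preg_match_all($simple, $spaced, $m);       // simple pattern still matches
$recursiveCount = preg_match_all($recursive, $spaced, $m); // recursive pattern rejects it

var_dump($separateContents, $nestedContents, $simpleCount, $recursiveCount);
```

With these inputs, text1 and text2 should come back as separate entries and the nested input as a single entry. The whitespace-only line should be accepted by the simple pattern but rejected by the recursive one.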
Try using preg_split to split the input string on whitespace that has a ) before it and a ( after it:
$arr = preg_split('/(?<=\))\s+(?=\()/',$input);
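A minimal sketch of that approach, assuming $input holds the sample strings from the question. Note that preg_split leaves the outer parentheses on each piece; the substr call that strips them is an addition of this sketch, not part of the suggestion above.

```php
<?php
$pattern = '/(?<=\))\s+(?=\()/';

// Two groups separated by an empty line (assumed sample input).
$input = "(text1)\n\n(text2)";
$pieces = preg_split($pattern, $input);

// Each piece still carries its outer parentheses; drop the first and last character.
$contents = array_map(function ($piece) {
    return substr($piece, 1, -1);
}, $pieces);

// No whitespace here sits between ")" and "(", so nothing is split.
$nested = "(this is some text\nand (bla bla)\n)";
$nestedPieces = preg_split($pattern, $nested);

var_dump($pieces, $contents, $nestedPieces);
```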
By default, the * and + quantifiers are greedy and try to use the longest possible match. Using the U modifier, you can set PCRE to be ungreedy:
preg_match_all('/\((.*)\)/Us', $match, $matches);
should work. You can also make a specific quantifier ungreedy like this:
preg_match_all('/\((.*?)\)/s', $match, $matches);
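To see what the greedy and ungreedy forms do, here is a small comparison. The sample strings are assumptions based on the question.

```php
<?php
$input = "(text1)\n\n(text2)";

// Greedy: .* runs to the last closing parenthesis, so there is one match.
preg_match_all('/\((.*)\)/s', $input, $greedy);

// Ungreedy via the U modifier: quantifiers stop as early as possible.
preg_match_all('/\((.*)\)/Us', $input, $ungreedyFlag);

// Ungreedy via *?: same effect, but only for this one quantifier.
preg_match_all('/\((.*?)\)/s', $input, $lazy);

// Caveat: with nested parentheses, the lazy form stops at the first ")".
$nested = "(this is some text\nand (bla bla)\n)";
preg_match_all('/\((.*?)\)/s', $nested, $lazyNested);

var_dump($greedy[1], $ungreedyFlag[1], $lazy[1], $lazyNested[1]);
```

For the nested input, the lazy form cuts the match short at the first closing parenthesis, so it may not cover the second case in the question.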
See http://php.net/manual/en/regexp.reference.repetition.php and