Regular Expressions: get what is outside of the brackets
I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside o开发者_JS百科f the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!
You can use preg_split
for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split
splits the string based on a pattern. The pattern here is [
followed by anything followed by ]
. The regex to match anything is .*
. Also [
and ]
are regex meta char used for char class. Since we want to match them literally we need to escape them to get \[.*\]
. .*
is by default greedy and will try to match as much as possible. In this case it will match abc] middle [xyz
. To avoid this we make it non greedy by appending it with a ?
to give \[.*?\]
. Since our def of anything here actually means anything other than ]
we can also use \[[^]]*?\]
EDIT:
If you want to extract words that are both inside and outside the []
, you can use:
$arr = preg_split('/\[|\]/',$input);
which split the string on a [
or a ]
$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
ideone
As every one says that you should use preg_split, but only one person replied with an expression that meets your needs, and i think that is a little complex - not complex, a little to verbose but he has updated his answer to counter that.
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
and you stated you wanted whats inside and outside the braces, sio an update would be:
/[\[.*?\]]/
This gives you:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
but as you can see that its capturing white spaces as well, so lets go a step further and get rid of those:
/[\s]*[\[.*?\]][\s]*/
This will give you a desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This i think is the expression your looking for.
Here is a LIVE Demonstration of the above Regex
精彩评论