Matching x words in brackets regex
I am trying to remove a bracketed group from a string if the group contains 4 or more words. I have been scratching my head and cannot get anywhere with it.
preg_replace('#\([word]{4,}\)#', '', $str); # pseudo code
Sample string:
Robert Alner Fund Standard Open NH Flat Race (Supported by The Andrew Stewart Charitable Foundation)
To match (more than x words in brackets) and remove:
(Supported by The Andrew Stewart Charitable Foundation)
I have two sources of data, and am using:
similar_text($str1, $str2, $percent)
to compare them, and the longish strings in brackets are unique to one source.
Well, you're close...
preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str);
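Basically, the inner sub-pattern (\b\w+\b[^\w)]*)
matches a word boundary (a position where a word character meets a non-word character or the start or end of the string), followed by at least one word character (a-z, A-Z, 0-9 or _), followed by another word boundary, and finally followed by zero or more characters that are neither word characters nor ). The {4,} quantifier then requires at least four such words between the literal parentheses.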
...
Testing with:
$tests = array(
'test1 (this is three)',
'test2 (this is four words)',
'test3 (this is four words) and (this is three)',
'test4 (this is five words inside)',
);
foreach ($tests as $str) {
echo $str . " - " . preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str) . "\n";
}
Gives:
test1 (this is three) - test1 (this is three)
test2 (this is four words) - test2
test3 (this is four words) and (this is three) - test3 and (this is three)
test4 (this is five words inside) - test4
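As a rough check against the sample string from the question, the same pattern can be wrapped in a small helper. The function name removeLongBrackets is made up for this sketch, and the whitespace clean-up is an extra step that the answer above does not perform.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the original answer).
function removeLongBrackets(string $str): string
{
    // Drop any (...) group that holds four or more words.
    $without = preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str) ?? $str;

    // Extra step, not in the original answer: collapse the doubled
    // spaces left behind by the removal and trim the ends.
    $collapsed = preg_replace('/\s{2,}/', ' ', $without) ?? $without;

    return trim($collapsed);
}

$sample = 'Robert Alner Fund Standard Open NH Flat Race '
        . '(Supported by The Andrew Stewart Charitable Foundation)';

echo removeLongBrackets($sample), "\n";
// Robert Alner Fund Standard Open NH Flat Race
```

Without the clean-up step, the result keeps the space that preceded the removed group.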
You don't need preg_replace() for this. Just count the spaces with substr_count(), then use str_replace().
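One way this could look, as a minimal sketch only: the answer does not say how to locate the bracketed text, so the strpos() and substr() calls below are an assumption, as is the helper name stripLongBracket. The sketch handles only the first (...) group and assumes words are separated by single spaces.

```php
<?php
// Hypothetical helper: handles only the first "(...)" group in the string
// and assumes words are separated by single spaces.
function stripLongBracket(string $str, int $minWords = 4): string
{
    $open = strpos($str, '(');
    if ($open === false) {
        return $str;
    }
    $close = strpos($str, ')', $open);
    if ($close === false) {
        return $str;
    }

    // The bracketed chunk, including both parentheses.
    $chunk = substr($str, $open, $close - $open + 1);

    // N words separated by single spaces contain N - 1 spaces.
    if (substr_count($chunk, ' ') >= $minWords - 1) {
        $str = str_replace($chunk, '', $str);
    }

    // Trim the space that preceded a removed group at the end of the string.
    return rtrim($str);
}

echo stripLongBracket('test2 (this is four words)'), "\n"; // test2
echo stripLongBracket('test1 (this is three)'), "\n";      // test1 (this is three)
```

Strings with several bracketed groups, or with tabs or repeated spaces between words, would need a loop or a regex instead.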
The syntax […] has a special meaning. […] denotes a so-called character class, which matches one of the listed characters. So [word] matches one of the characters w, o, r, d.
Now if you want to match words, you should first define what a word is. If a word is a sequence of characters other than whitespace (\S represents any non-whitespace character), you could do this:
/\S+(\s+\S+){3,}/
This matches any sequence of four or more words (sequences of non-whitespace characters) that are separated by whitespace characters (\s).
And four or more words in brackets:
/\(\S+(\s+\S+){3,}\)/
Note that \S matches anything but whitespace characters, and that includes the surrounding brackets. So you might want to change \S to [^\s)]:
/\([^\s)]+(\s+[^\s)]+){3,}\)/
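A short sketch of how the last pattern might be used, with the closing parenthesis escaped so the pattern compiles. The helper name removeBracketedWords and the trailing rtrim() are assumptions added for illustration.

```php
<?php
// Hypothetical helper built on the whitespace-based definition of a word.
function removeBracketedWords(string $str): string
{
    // A "word" here is any run of characters that are neither
    // whitespace nor ")", so punctuation inside a word is allowed.
    $pattern = '/\([^\s)]+(\s+[^\s)]+){3,}\)/';

    return rtrim(preg_replace($pattern, '', $str) ?? $str);
}

$tests = array(
    'test1 (this is three)',
    'test2 (this is four words)',
    "test3 (it's four-ish words here)",
);
foreach ($tests as $str) {
    echo $str . ' - ' . removeBracketedWords($str) . "\n";
}
```

With this definition, "it's" and "four-ish" each count as a single word, whereas a \w-based pattern would split them at the apostrophe and the hyphen.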
I'm no expert, but this might work. Here's a pattern string:
/\(((\w*?\s){3,}[\w]+?.*?)\)/i
And here's a replacement string in PHP to take everything except the leading and trailing escaped parentheses.
$1
Here's the preg_replace function.
preg_replace('/\(((\w*?\s){3,}[\w]+?.*?)\)/i', '$1', $string);
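Note that this replacement behaves differently from the other answers: it keeps the bracketed text and drops only the parentheses. A small sketch to illustrate, where the wrapper name unwrapLongBrackets is made up for the example:

```php
<?php
// Hypothetical wrapper around the pattern above; the name is illustrative only.
function unwrapLongBrackets(string $str): string
{
    // '$1' is quoted so that preg_replace() receives it as a backreference
    // to the first capture group (the text between the parentheses).
    return preg_replace('/\(((\w*?\s){3,}[\w]+?.*?)\)/i', '$1', $str) ?? $str;
}

echo unwrapLongBrackets('test2 (this is four words)'), "\n"; // test2 this is four words
echo unwrapLongBrackets('test1 (this is three)'), "\n";      // test1 (this is three)
```

To remove the whole group instead, as the question asks, the replacement would be an empty string rather than '$1'.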