Matching x words in brackets regex

I am trying to remove brackets from a string if it contains 4 or more words. I have been scratching my head and cannot get anywhere with it.

preg_replace('#\([word]{4,}\)#', '', $str); # pseudo code

Sample string:

Robert Alner Fund Standard Open NH Flat Race (Supported by The Andrew Stewart Charitable Foundation)

To match (more than x words in brackets) and remove:

(Supported by The Andrew Stewart Charitable Foundation)

I have two sources of data, and am using:

similar_text($str1, $str2, $percent)

to compare them; the longish strings in brackets are unique to one source.


Well, you're close...

preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str);

Basically, the inner sub-pattern (\b\w+\b[^\w)]*) matches a word boundary (a position between a word character and a non-word character), followed by at least one word character (a letter, digit, or underscore), followed by another word boundary, and finally 0 or more characters that are neither word characters nor ). The {4,} quantifier then requires at least four such words between the brackets.

Testing with:

$tests = array(
    'test1 (this is three)',
    'test2 (this is four words)',
    'test3 (this is four words) and (this is three)',
    'test4 (this is five words inside)',
);

foreach ($tests as $str) {
    echo $str . " - " . preg_replace('#\((\b\w+\b[^\w)]*){4,}\)#', '', $str) . "\n";
}

Gives:

test1 (this is three) - test1 (this is three)
test2 (this is four words) - test2
test3 (this is four words) and (this is three) - test3  and (this is three)
test4 (this is five words inside) - test4
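
If the minimum word count needs to vary, the same pattern can be wrapped in a small helper. This is only a sketch: the function name removeLongBrackets and the leading \s* (which also swallows the whitespace in front of the bracket, so no double space is left behind) are my own additions, not part of the pattern above.

```php
<?php
// Hypothetical helper: removes every "(...)" group that contains at least
// $minWords words, using the \b\w+\b[^\w)]* sub-pattern from this answer.
function removeLongBrackets(string $str, int $minWords = 4): string
{
    // Assumption: the extra \s* also strips whitespace before the bracket.
    $pattern = '#\s*\((?:\b\w+\b[^\w)]*){' . $minWords . ',}\)#';
    return preg_replace($pattern, '', $str);
}

echo removeLongBrackets('test3 (this is four words) and (this is three)'), "\n";
// prints: test3 and (this is three)
```

Stripping the leading whitespace may matter when the result is fed to similar_text(), since stray double spaces would otherwise count as differences.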


You don't need preg_replace() for this. Just count the spaces with substr_count(), then use str_replace().
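
A rough sketch of that idea, under some assumptions of mine: the helper name is made up, only the first bracketed group is examined, and words inside it are assumed to be separated by single spaces (so N words contain N - 1 spaces).

```php
<?php
// Hypothetical helper: removes the first "(...)" group when it holds at
// least $minWords single-space-separated words. No regex involved.
function removeLongBracketBySpaces(string $str, int $minWords = 4): string
{
    $open = strpos($str, '(');
    if ($open === false) {
        return $str; // no opening bracket at all
    }
    $close = strpos($str, ')', $open);
    if ($close === false) {
        return $str; // unbalanced bracket, leave the string alone
    }
    // The group including both brackets, e.g. "(this is four words)".
    $group = substr($str, $open, $close - $open + 1);
    if (substr_count($group, ' ') >= $minWords - 1) {
        return str_replace($group, '', $str);
    }
    return $str;
}
```

Unlike the regex versions, this leaves a long group untouched if a shorter group comes before it in the string.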


The syntax […] has a special meaning: […] denotes a so-called character class, which matches one of the listed characters. So [word] matches a single character that is one of w, o, r, or d.

Now if you want to match words, you should first define what a word is. If a word is a sequence of characters except whitespace characters (\S represents all non-whitespace characters), you could do this:

/\S+(\s+\S+){3,}/

This matches any sequence of four or more words (sequence of non-whitespace characters) that are separated by whitespace characters (\s).

And four or more words in brackets:

/\(\S+(\s+\S+){3,}\)/

Note that \S matches anything except whitespace characters, which includes the brackets themselves. So you might want to change \S to [^\s)]:

/\([^\s)]+(\s+[^\s)]+){3,}\)/
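
For illustration, here is how that last pattern (with the closing bracket escaped as \)) might be used with preg_replace(). The function name is hypothetical. Because a word here is any run of characters other than whitespace and ), tokens such as "it's" or "four-word" each count as one word.

```php
<?php
// Hypothetical helper: removes bracketed groups of four or more
// whitespace-separated words, where a word is any run of characters
// other than whitespace and ")".
function removeBracketedPhrases(string $str): string
{
    return preg_replace('/\([^\s)]+(\s+[^\s)]+){3,}\)/', '', $str);
}

var_dump(removeBracketedPhrases("(it's a four-word phrase)")); // string(0) ""
var_dump(removeBracketedPhrases('test1 (this is three)'));     // left unchanged
```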


I'm no expert, but this might work. Here's a pattern string:

/\(((\w*?\s){3,}[\w]+?.*?)\)/i

And here's a replacement string in PHP to take everything except the leading and trailing escaped parentheses.

$1

Here's the preg_replace function.

preg_replace('/\(((\w*?\s){3,}[\w]+?.*?)\)/i', '$1', $string);
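
A quick way to check what this does, as a sketch with a made-up wrapper name: unlike the other answers, it keeps the bracketed text and drops only the parentheses. The replacement is written as the quoted string '$1'.

```php
<?php
// Hypothetical wrapper: for groups of four or more words, keeps the text
// and removes only the surrounding parentheses.
function unwrapLongBrackets(string $string): string
{
    return preg_replace('/\(((\w*?\s){3,}[\w]+?.*?)\)/i', '$1', $string);
}

echo unwrapLongBrackets('test2 (this is four words)'), "\n";
// prints: test2 this is four words
echo unwrapLongBrackets('test1 (this is three)'), "\n";
// prints: test1 (this is three)
```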