PHP Preg-Replace more than one underscore
How do I, using preg_replace, replace more than one underscore with just one underscore?
The + operator (quantifier) matches one or more instances of the preceding element (a character, character class, capture group, or back-reference).
$string = preg_replace('/_+/', '_', $string);
This would replace one or more underscores with a single underscore.
Strictly speaking, the title of the question asks to replace only runs of two or more underscores:
$string = preg_replace('/__+/', '_', $string);
Or writing the quantifier with braces:
$string = preg_replace('/_{2,}/', '_', $string);
Or, perhaps, with a capture group and a back-reference:
$string = preg_replace('/(_)\1+/', '\1', $string);
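To compare the variants above side by side, here is a minimal sketch. The helper name collapseWith() and the sample input are assumptions made for illustration, not part of the original answer. Because group 1 can only ever capture an underscore, a literal '_' replacement is equivalent to '\1' for the last pattern.

```php
<?php
// Hypothetical helper: applies one of the patterns from the answer above.
function collapseWith(string $pattern, string $input): string
{
    return preg_replace($pattern, '_', $input);
}

$input = 'a_b__c___d'; // illustrative input

// Single quotes keep the backslash in \1 intact for the regex engine.
$patterns = ['/_+/', '/__+/', '/_{2,}/', '/(_)\1+/'];

foreach ($patterns as $pattern) {
    // Every pattern prints the same result: a_b_c_d
    echo $pattern, ' => ', collapseWith($pattern, $input), PHP_EOL;
}
```

The patterns differ only in whether a lone underscore counts as a match; the resulting string is the same.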
preg_replace('/[_]+/', '_', $your_string);
Actually, using /__+/ or /_{2,}/ would be better than /_+/, since a single underscore does not need to be replaced. Skipping those needless replacements should also improve the speed of the preg variant.
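The difference in work done can be made visible with the fifth argument of preg_replace(), which receives the number of replacements performed. This is only a sketch: countReplacements() and the sample string are assumed names and data for illustration.

```php
<?php
// Hypothetical helper: returns how many replacements preg_replace() made.
function countReplacements(string $pattern, string $subject): int
{
    // -1 means "no limit"; $count is filled in by reference.
    preg_replace($pattern, '_', $subject, -1, $count);
    return $count;
}

$subject = 'a_b_c__d'; // illustrative: two single underscores, one double

echo countReplacements('/_+/', $subject), PHP_EOL;    // 3
echo countReplacements('/_{2,}/', $subject), PHP_EOL; // 1
```

Both patterns yield the same string, but /_+/ also "replaces" each lone underscore with itself.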
Running tests, I found this:
while (strpos($str, '__') !== false) {
    $str = str_replace('__', '_', $str);
}
to be consistently faster than this:
$str = preg_replace('/[_]+/', '_', $str);
I generated the test strings of varying lengths with this:
$chars = array_merge(array_fill(0, 50, '_'), range('a', 'z'));
$str = '';
for ($i = 0; $i < $len; $i++) { // $len varied from 10 to 1000000
    $str .= $chars[array_rand($chars)];
}
file_put_contents('test_str.txt', $str);
and tested with these scripts (run separately, but on identical strings for each value of $len):
$str = file_get_contents('test_str.txt');
$start = microtime(true);
$str = preg_replace('/[_]+/', '_', $str);
echo microtime(true) - $start;
and:
$str = file_get_contents('test_str.txt');
$start = microtime(true);
while (strpos($str, '__') !== false) {
    $str = str_replace('__', '_', $str);
}
echo microtime(true) - $start;
For shorter strings, the str_replace() method was as much as 25% faster than the preg_replace() method. The longer the string, the smaller the difference, but str_replace() was always faster.
I know some would prefer one method over the other for reasons other than speed, and I'd be glad to read comments regarding the results, testing method, etc.
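For anyone who wants to repeat the comparison in a single script, here is a rough sketch. The function names, the fixed sample string, and the iteration count are assumptions for illustration; the tests described above used random strings read from a file and separate scripts, so timings from this sketch are not directly comparable.

```php
<?php
// Hypothetical wrappers around the two approaches compared above.
function collapseWithPreg(string $str): string
{
    return preg_replace('/[_]+/', '_', $str);
}

function collapseWithStrReplace(string $str): string
{
    while (strpos($str, '__') !== false) {
        $str = str_replace('__', '_', $str);
    }
    return $str;
}

// Runs $fn on $str $iterations times and returns the elapsed seconds.
function timeIt(callable $fn, string $str, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($str);
    }
    return microtime(true) - $start;
}

$str = str_repeat('ab__cd____ef_', 1000); // illustrative input

// Both approaches should agree before the timings mean anything.
var_dump(collapseWithPreg($str) === collapseWithStrReplace($str));

printf("preg_replace: %.6f s\n", timeIt('collapseWithPreg', $str, 1000));
printf("str_replace:  %.6f s\n", timeIt('collapseWithStrReplace', $str, 1000));
```

Results will vary with PHP version, string length, and how many long underscore runs the input contains.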
For anyone attracted to @GZipp's answer for benchmark/micro-optimization reasons, I think the following post-test loop should execute slightly better than the pre-test while() loop, because the strpos() call has been removed.
str_replace() has a by-reference count parameter that can be used to break the loop without an extra, iterated function call. Granted, it will always attempt at least one replacement, and it won't stop until it has traversed the string once with no replacements.
Code:
$str = 'one_two__three___four____bye';
do {
    $str = str_replace('__', '_', $str, $count);
} while ($count);
var_export($str);
// 'one_two_three_four_bye'
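To see why the loop is needed at all, and how many passes it costs on a long run, consider this sketch. The helper collapseCountingPasses() and the sample input are assumptions added for illustration.

```php
<?php
// Hypothetical helper: collapses underscore runs and reports, via $passes,
// how many str_replace() calls the do-while loop made.
function collapseCountingPasses(string $str, &$passes = 0): string
{
    $passes = 0;
    do {
        $str = str_replace('__', '_', $str, $count);
        $passes++;
    } while ($count);
    return $str;
}

// One pass only halves a run of eight underscores:
echo str_replace('__', '_', 'a________b'), PHP_EOL; // a____b

echo collapseCountingPasses('a________b', $passes), PHP_EOL; // a_b
echo $passes, PHP_EOL; // 4 (three passes that replace, one that finds nothing)
```

Each pass roughly halves the longest run, so the number of passes grows with the logarithm of that run's length.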
As for preg_replace(), here are a couple of good options:
echo preg_replace('/_{2,}/', '_', $str);
echo preg_replace('/_\K_+/', '', $str); // \K forgets the first, remembers the rest
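A quick way to see what \K changes is to inspect the reported matches with preg_match_all(). This is a sketch with an assumed sample string: \K resets the start of the reported match, so the first underscore is consumed but left out of the text that gets replaced.

```php
<?php
$sample = 'one_two__three___four'; // illustrative input

// Collect the full-match text reported for each pattern.
preg_match_all('/_{2,}/', $sample, $whole);
preg_match_all('/_\K_+/', $sample, $tail);

var_export($whole[0]); // the runs '__' and '___'
echo PHP_EOL;
var_export($tail[0]);  // '_' and '__': each run minus its first underscore
echo PHP_EOL;

// Deleting only the surplus underscores gives the same final string.
echo preg_replace('/_\K_+/', '', $sample), PHP_EOL; // one_two_three_four
```

The lone underscore in 'one_two' is matched by neither pattern.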
I don't recommend using + on its own, because it makes needless replacements (_ to _):
echo preg_replace('/_+/', '_', $str);
There is definitely no benefit to using a character class, as in /[_]+/ or /[_]{2,}/.
The benefit of using preg_replace() is that the string is never traversed more than once. This makes it a very direct and appropriate tool.
With preg_replace(), the + quantifier is what you need:
$text = "______";
$text = preg_replace('/[_]+/','_',$text);
You can also use the T-Regx library, which adds the delimiters automatically.
pattern('_+')->replace($your_string)->with('_');