Optimize "tagging" regex
I currently use this piece of code to reduce a given text to a valid "tagging" format (only lowercase, a-z and minus allowed) by removing/replacing invalid characters
$zip_filename = strtolower($original);
$zip_filename = preg_replace("/[^a-zA-Z\-]/g", '-', $zip_filename); //replace invalid chars
$zip_filename = preg_replace("/-+/g", '-', $zip_filename); // reduce consecutive minus to only one
$zip_filename = preg_replace("/^-/g", '', $zip_filename); // removing leading minus
$zip_filename = preg_replace("/-$/g", '', $zip_filename); // remo开发者_如何学JAVAve trailing minus
Any hints on how to put at least the regex into a single one?
Thanks for any advice!
$zip_filename = trim(preg_replace("/[^a-z]+/", '-', $zip_filename),'-');
Explanation:
A-Z
is useless since it should be lower case- Adding
+
after right bracket will replace one or more consecutive invalid chars - Using
trim
with second parameter - character to trim form beginning and end will speed up the code - Removing
\-
frompreg_replace
will also take car of hyphens between invalid chars / multiple consecutive hyphens, replacing them to single one.
You'll only make the code more difficult to understand/maintain by combining these four operations into one.
You also don't need the complexity and performance hit of regular expression-based operations to achieve what you need.
The reduction of double to single minus signs may be more readily achieved with a looped call to str_replace:
while (substr_count($zip_filename, '--')) {
$zip_filename = str_replace('--', '-', $zip_filename);
}
Wrapping this up in a well-named class method will abstract away any apparent complexity and aid the readability of the code.
The last two operations can be handled by the trim() function:
$zip_filename = trim($zip_filename, '-');
You can then replace your regular expression-based operations with something less cpu aggressive and arguably easier for others to understand:
//replace invalid chars
$zip_filename = preg_replace("/[^a-zA-Z\-]/g", '-', strtolower($original));
// reduce consecutive minus to only one
while (substr_count($zip_filename, '--')) {
$zip_filename = str_replace('--', '-', $zip_filename);
}
// remove leading and trailing minus
$zip_filename = trim($zip_filename, '-');
This should simplify it considerably...
$zip_filename = trim(strtolower($original));
$zip_filename = preg_replace("/\s\s+|--+|[^a-zA-Z-]/g", '-', $zip_filename);
The trim
will take care of the spaces before and after the string. Also note the \s\s+
and --+
. These are more efficient at finding duplicates. They'll only match those characters if there are 2 or more in succession, therefore avoiding unnecessary replacement operations.
But technically it'd still be possible to have leading or trailing dashes. And for that you'd still need this...
$zip_filename = preg_replace("/^-|-$/g", '', $zip_filename);
(This last operation couldn't really be combined with the other since you're using a different replacement string.)
精彩评论