urlencode: how to remove certain characters like commas?
Imagine a raw URL which I want to convert to lowercase, with all spaces replaced by dashes (-) and all commas removed. Currently I have this:
$pageurle = str_replace(' ', '-', $pagename);
$pageurle = strtolower($pageurle);
$pageurle = urlencode($pageurle);
which works but does not remove commas. When I add this:
$pageurle = str_replace(',', '', $pagename);
then the commas are removed, but all dashes become +. How do I solve this?
In general, I would be happy to have a list of characters such as -, @, & or --, or anything else, that I can remove manually from my nice URLs.
The problem here is that you are referencing $pagename twice. You should be referencing $pageurle if you wish to do further replacements; otherwise your first replacement is overwritten. The dashes aren't being replaced with +. Rather, the spaces from the original $pagename are, because urlencode() encodes a space as +.
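As a minimal sketch of that fix, here is the code from the question with the second replacement applied to $pageurle instead of $pagename. The input string is just an illustrative assumption.

```php
<?php
// Hypothetical input; the variable names follow the question.
$pagename = 'Hello World, Again';

$pageurle = str_replace(' ', '-', $pagename);  // spaces -> dashes
$pageurle = str_replace(',', '', $pageurle);   // note: $pageurle, not $pagename
$pageurle = strtolower($pageurle);
$pageurle = urlencode($pageurle);

echo $pageurle, PHP_EOL; // hello-world-again
```

Because no spaces are left by the time urlencode() runs, no + characters appear in the result.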
Note that str_replace() can also take arrays. So you should be able to put the strings you want to replace in one array and their replacements in another, and have a single str_replace() call do it all in one go. See http://php.net/manual/en/function.str-replace.php
$search=array(' ', '--');
$replace=array('-', 'somethingelse');
$pageurle=urlencode(str_replace($search, $replace, $pagename));
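For the "list of characters to remove" part of the question, something along these lines might work. This is only a sketch: the $remove list and the sample title are assumptions, and it swaps in preg_replace() to collapse the whitespace left behind, so that removing a character between two spaces does not produce a double dash.

```php
<?php
// Hypothetical list of characters to strip; extend as needed.
$remove   = array(',', '@', '&');
$pagename = 'Tom & Jerry, Episode 1';

// With an array as $search and a single string as $replace,
// every listed character is replaced by that string (here: nothing).
$pageurle = str_replace($remove, '', $pagename);

// Collapse each run of whitespace into a single dash.
$pageurle = preg_replace('/\s+/', '-', trim($pageurle));

$pageurle = urlencode(strtolower($pageurle));

echo $pageurle, PHP_EOL; // tom-jerry-episode-1
```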
function slugify($text)
{
    // we don't want "amp" and similar in our urls
    $text = htmlspecialchars_decode($text, ENT_QUOTES);
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);
    // trim
    $text = trim($text, '-');
    // transliterate
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // lowercase
    $text = strtolower($text);
    // remove unwanted characters
    $text = preg_replace('~[^-\w]+~', '', $text);
    if (empty($text))
    {
        return 'n-a';
    }
    return $text;
}
Sometimes iconv doesn't work as expected. If that's the case, setting a locale should fix it:
setlocale(LC_ALL, 'en_US.utf8');
You really shouldn't be trying to make a list of all of the forbidden characters, though -- especially if this is going into a URL:
$pageurle = iconv('UTF-8', 'ASCII//TRANSLIT', $pagename);
$pageurle = preg_replace("/[^a-zA-Z0-9\/_| -]/", '', $pageurle);
$pageurle = strtolower(trim($pageurle, '-'));
$pageurle = preg_replace("/[\/_| -]+/", '-', $pageurle);
The above should thoroughly clean your string and make it URL friendly, while transliterating foreign characters (e.g. converting "Ñ" to "N") instead of dropping them.
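If you want to reuse this, the same steps can be wrapped in a function. The following is a rough sketch rather than a drop-in solution: make_slug() is a made-up name, it assumes the iconv extension is available, and it moves the trim to the end so that dashes produced from leading or trailing spaces are stripped as well.

```php
<?php
// make_slug() is a hypothetical helper name, not a built-in PHP function.
function make_slug($pagename)
{
    // Transliterate to ASCII (results for non-ASCII input depend on the locale).
    $slug = iconv('UTF-8', 'ASCII//TRANSLIT', $pagename);
    // Drop everything that is not a letter, a digit or a separator.
    $slug = preg_replace('/[^a-zA-Z0-9\/_| -]/', '', $slug);
    // Turn each run of separators into a single dash.
    $slug = preg_replace('/[\/_| -]+/', '-', $slug);
    // Strip leading/trailing dashes and lowercase the result.
    return strtolower(trim($slug, '-'));
}

echo make_slug(' Hello, World & Friends! '), PHP_EOL; // hello-world-friends
```

The whitelist approach means you never have to maintain a list of forbidden characters: anything not explicitly allowed is removed.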