urlencode: how to remove certain characters like commas?

Imagine a raw URL that I want to convert to lowercase, with all spaces replaced by dashes and all commas removed. Currently I have this:

$pageurle = str_replace(' ', '-', $pagename);
$pageurle = strtolower($pageurle);
$pageurle = urlencode($pageurle);

which works but does not remove commas. When I add this:

$pageurle = str_replace(',', '', $pagename);

then the commas are removed, but all the dashes become +. How do I solve this?

In general, I would be happy to have a list of characters, like - @ & or --, that I can remove manually from my nice URLs.


The problem here is that you are referencing $pagename twice. For further replacements you should reference $pageurle; otherwise the result of your first replacement is overwritten. The dashes aren't being replaced with +. Rather, the spaces from the original $pagename are still there, and urlencode() encodes each space as +.
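As a minimal sketch of that fix, the question's steps can be chained so that every call works on $pageurle. The sample title is a made-up value used only for illustration:

```php
<?php
// Hypothetical sample title; $pagename and $pageurle are the names from the question.
$pagename = 'Hello World, Again';

$pageurle = str_replace(' ', '-', $pagename);  // spaces become dashes
$pageurle = str_replace(',', '', $pageurle);   // operate on $pageurle, not $pagename
$pageurle = strtolower($pageurle);
$pageurle = urlencode($pageurle);              // dashes pass through urlencode() unchanged

echo $pageurle, "\n"; // hello-world-again
```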

Note that str_replace() also accepts arrays. You can put the strings you want to replace in one array and their replacements in another, and a single call to str_replace() does it all in one go. See http://php.net/manual/en/function.str-replace.php

$search=array(' ', '--');
$replace=array('-', 'somethingelse');

$pageurle=urlencode(str_replace($search, $replace, $pagename));
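One thing to keep in mind with the array form is that the pairs are applied in order, each one on the result of the previous one. The sketch below uses a made-up title and an assumed list of unwanted characters; adjust both to taste:

```php
<?php
// Hypothetical sample title and character list, for illustration only.
$pagename = 'Tom & Jerry, the Movie';

// Order matters: strip unwanted characters first, then turn spaces into
// dashes, then merge the double dash left where "&" used to be.
$search  = array(',', '&', '@', ' ', '--');
$replace = array('',  '',  '',  '-', '-');

$pageurle = urlencode(strtolower(str_replace($search, $replace, $pagename)));

echo $pageurle, "\n"; // tom-jerry-the-movie
```

The last pair only merges pairs of dashes in a single pass, so a run of three or more dashes would not collapse completely. A regular expression is the more robust tool for that.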


function slugify($text)
{
  // we don't want "amp" and similar in our urls
  $text = htmlspecialchars_decode($text, ENT_QUOTES);

  // replace non letter or digits by -
  $text = preg_replace('~[^\\pL\d]+~u', '-', $text);

  // trim
  $text = trim($text, '-');

  // transliterate
  $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

  // lowercase
  $text = strtolower($text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  if (empty($text))
  {
    return 'n-a';
  }

  return $text;
}

Sometimes iconv() doesn't work as expected. If that is the case, setting a locale should fix it:

setlocale(LC_ALL, 'en_US.utf8'); 
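Because iconv() returns false when the conversion fails, it may also be worth checking the result before passing it on. A rough sketch, where to_ascii() is a hypothetical helper name and falling back to the original text is just one possible choice:

```php
<?php
// to_ascii() is a hypothetical helper, not part of the answer above.
function to_ascii($text)
{
    // setlocale() returns false if the locale is not installed;
    // in that case the current locale simply stays in effect.
    setlocale(LC_ALL, 'en_US.utf8');

    // @ suppresses the notice iconv() raises for input it cannot convert.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // On failure iconv() returns false; keep the original text then.
    return ($converted === false) ? $text : $converted;
}

echo to_ascii('Hello World'), "\n";
```

What a transliterated non-ASCII character looks like depends on the iconv implementation and the locale, so test with your own data.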


You really shouldn't be trying to make a list of all the forbidden characters, though, especially if this is going into a URL. Instead, keep only the characters you allow:

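$pageurle = iconv('UTF-8', 'ASCII//TRANSLIT', $pagename);
// keep only letters, digits and the separator characters
$pageurle = preg_replace("/[^a-zA-Z0-9\/_| -]/", '', $pageurle);
// collapse each run of separators into a single dash
$pageurle = preg_replace("/[\/_| -]+/", '-', $pageurle);
// trim last, so leading or trailing spaces do not leave stray dashes
$pageurle = strtolower(trim($pageurle, '-'));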

The above should thoroughly clean your string and make it URL-friendly, while transliterating foreign characters to their closest ASCII equivalents (e.g. converting "Ñ" to "N") rather than dropping them.
