URL parsing and simplification in PHP

I'm parsing the links found on web pages, and I'm looking for a way to convert URLs like this:

http://www.site.com/./eng/.././disclaimer/index.htm

to the equivalent, cleaner form

http://www.site.com/disclaimer/index.htm

mainly to avoid duplicates.

Thank you.


You can collapse the "." and ".." segments of the path with a small function like this:

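// Collapse ".", ".." and empty segments in a URL path.
function simplify($path) {
   $r = array();
   foreach (explode('/', $path) as $p) {
      if ($p === '..')
         array_pop($r);               // step up one level (no-op at the root)
      else if ($p !== '.' && strlen($p))
         $r[] = $p;                   // keep real segments only
   }
   $r = implode('/', $r);
   // Restore the leading slash; check the length first so an empty path is safe.
   if (strlen($path) && $path[0] === '/') $r = "/$r";
   return $r;
}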

And this is how you use it:

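$u = parse_url($dirtyUrl);
// parse_url() omits 'path' for URLs like http://www.site.com, so default to '/'
$u['path'] = simplify(isset($u['path']) ? $u['path'] : '/');
$clean_url = "{$u['scheme']}://{$u['host']}{$u['path']}";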
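
Note that the usage snippet rebuilds only the scheme, host and path, so a port, query string or fragment would be lost. Here is a minimal sketch that keeps them. normalize_url() is a made-up name for illustration, and the path handling is inlined so the block runs on its own:

```php
<?php
// normalize_url() is a hypothetical helper, not part of the answer above.
function normalize_url($dirtyUrl) {
    $u = parse_url($dirtyUrl);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return false; // malformed or relative URL: nothing sensible to rebuild
    }

    // Collapse "." and ".." segments (same approach as simplify()).
    $segments = array();
    $path = isset($u['path']) ? $u['path'] : '/';
    foreach (explode('/', $path) as $p) {
        if ($p === '..') {
            array_pop($segments);
        } elseif ($p !== '.' && $p !== '') {
            $segments[] = $p;
        }
    }

    // Reassemble, adding each optional component only if it was present.
    $url = $u['scheme'] . '://';
    if (isset($u['user'])) {
        $url .= $u['user'] . (isset($u['pass']) ? ':' . $u['pass'] : '') . '@';
    }
    $url .= $u['host'];
    if (isset($u['port']))     $url .= ':' . $u['port'];
    $url .= '/' . implode('/', $segments);
    if (isset($u['query']))    $url .= '?' . $u['query'];
    if (isset($u['fragment'])) $url .= '#' . $u['fragment'];
    return $url;
}

echo normalize_url('http://www.site.com:8080/./eng/.././disclaimer/index.htm?lang=en'), "\n";
// prints http://www.site.com:8080/disclaimer/index.htm?lang=en
```

Like simplify(), this drops a trailing slash from the path, which may or may not matter for your duplicates check.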


What exactly makes you think those two URLs are equivalent?

If you can answer that question in detail, use a regexp or a parser to apply the rules that you know indicate the pages are equivalent.
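
To illustrate what encoding such rules might look like, here is a rough sketch. It assumes three rules: scheme and host compare case-insensitively, the default port is redundant, and the fragment never reaches the server. Whether these really hold for the pages you crawl is something you have to decide. canonical_key() is a hypothetical name.

```php
<?php
// Hypothetical rule set; adjust it to what actually holds for your sites.
function canonical_key($url) {
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return $url; // cannot parse: treat the raw string as its own key
    }
    $scheme = strtolower($u['scheme']); // rule 1: case-insensitive scheme
    $host   = strtolower($u['host']);   // rule 1: case-insensitive host

    // rule 2: drop the port when it is the default for the scheme
    $defaultPorts = array('http' => 80, 'https' => 443);
    $port = '';
    if (isset($u['port'])
        && (!isset($defaultPorts[$scheme]) || $defaultPorts[$scheme] !== $u['port'])) {
        $port = ':' . $u['port'];
    }

    $path  = isset($u['path']) ? $u['path'] : '/';
    $query = isset($u['query']) ? '?' . $u['query'] : '';
    // rule 3: the fragment is deliberately left out
    return $scheme . '://' . $host . $port . $path . $query;
}

// Using the key as an array index makes duplicates collapse automatically.
$seen = array();
$links = array('HTTP://WWW.Site.com:80/index.htm#top', 'http://www.site.com/index.htm');
foreach ($links as $link) {
    $seen[canonical_key($link)] = true;
}
echo count($seen), "\n"; // prints 1
```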
