URL parsing and simplification in PHP
I'm parsing the links found on webpages, and I'm looking for a way to convert URLs like this:
http://www.site.com/./eng/.././disclaimer/index.htm
to the equivalent, simpler form
http://www.site.com/disclaimer/index.htm
mainly to avoid duplicates.
Thank you.
Something like this:
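function simplify($path) {
    $r = array();
    // Walk the segments: ".." drops the previous one, "." and empty ones are skipped.
    foreach (explode('/', $path) as $p) {
        if ($p == '..')
            array_pop($r);
        else if ($p != '.' && strlen($p))
            $r[] = $p;
    }
    $r = implode('/', $r);
    // Keep the leading slash of an absolute path (substr() is safe on an empty string).
    if (substr($path, 0, 1) == '/') $r = "/$r";
    return $r;
}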
And this is how you use it:
$u = parse_url($dirtyUrl);
// parse_url() omits 'path' for URLs such as http://www.site.com
$u['path'] = simplify(isset($u['path']) ? $u['path'] : '/');
$clean_url = "{$u['scheme']}://{$u['host']}{$u['path']}";
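Note that the usage above rebuilds the URL from scheme, host and path only, so a port or query string in the original link would be lost. A minimal sketch of a variant that keeps them, and also preserves a trailing slash, might look like the following. The names remove_dot_segments() and clean_url() are hypothetical, and the sketch assumes absolute http(s) URLs; it deliberately drops the fragment and any user:password part, which may or may not suit your crawler.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in a URL path.
function remove_dot_segments(string $path): string {
    $out = [];
    $segments = explode('/', $path);
    foreach ($segments as $seg) {
        if ($seg === '..') {
            array_pop($out);            // go up one level (no-op at the root)
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;              // ordinary segment
        }
    }
    $result = '/' . implode('/', $out);
    // Keep a trailing slash when the input ended in "/", "/." or "/..".
    $last = end($segments);
    if ($out && ($last === '' || $last === '.' || $last === '..')) {
        $result .= '/';
    }
    return $result;
}

// Hypothetical helper: rebuild an absolute URL from parse_url() output.
// Assumption: fragment and userinfo are not needed and are dropped.
function clean_url(string $url): ?string {
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return null;                    // not an absolute URL we can handle
    }
    $clean = $u['scheme'] . '://' . $u['host'];
    if (isset($u['port'])) {
        $clean .= ':' . $u['port'];
    }
    $clean .= remove_dot_segments($u['path'] ?? '/');
    if (isset($u['query'])) {
        $clean .= '?' . $u['query'];
    }
    return $clean;
}
```

Calling clean_url() on the URL from the question should give back the simplified form, and it returns null for relative links, which you would need to resolve against the page URL first.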
Exactly what makes you think those two URLs are equivalent?
If you can answer this question in detail, use a regexp or a parser to apply the rules that you know indicate the pages are equivalent.
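To make the idea of explicit rules concrete, here is a rough sketch of a comparison key built only from rules that are generally treated as safe for http(s) URLs: scheme and host are case-insensitive, the default port can be omitted, and the fragment is never sent to the server. The function name url_key() is hypothetical, and whether these rules are enough for your set of sites is an assumption you would have to check; the path and query are left untouched here because servers may treat them as case-sensitive.

```php
<?php
// Hypothetical helper: build a key for duplicate detection from a few
// equivalence rules that are assumed to hold for http(s) URLs.
function url_key(string $url): ?string {
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return null;                        // only absolute URLs are handled
    }
    $scheme = strtolower($u['scheme']);     // rule: scheme is case-insensitive
    $host   = strtolower($u['host']);       // rule: host is case-insensitive
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = $u['port'] ?? null;             // parse_url() returns the port as int
    if ($port !== null && ($defaultPorts[$scheme] ?? null) === $port) {
        $port = null;                       // rule: default port is redundant
    }
    $key = $scheme . '://' . $host . ($port !== null ? ':' . $port : '');
    $key .= $u['path'] ?? '/';              // rule: empty path means "/"
    if (isset($u['query'])) {
        $key .= '?' . $u['query'];
    }
    return $key;                            // rule: fragment is ignored
}
```

Two links can then be treated as duplicates when their keys are identical. Any further rule, such as collapsing dot segments or ignoring index.htm, should be added only if you know it holds for the servers you crawl.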