PHP: comparing URIs which differ in percent-encoding
In PHP, I want to compare two relative URLs for equality. The catch: URLs may differ in percent-encoding, e.g.
/dir/file+file
vs./dir/file%20file
/dir/file(file)
vs./dir/file%28file%29
/dir/file%5bfile
vs./dir/file%5Bfile
According to R开发者_如何学JAVAFC 3986, servers should treat these URIs identically. But if I use ==
to compare, I'll end up with a mismatch.
So I'm looking for a PHP function which will accepts two strings and returns TRUE
if they represent the same URI (dicounting encoded/decoded variants of the same char, upper-case/lower-case hex digits in encoded chars, and +
vs. %20
for spaces), and FALSE
if they're different.
I know in advance that only ASCII chars are in these strings-- no unicode.
function uriMatches($uri1, $uri2)
{
return urldecode($uri1) == urldecode($uri2);
}
echo uriMatches('/dir/file+file', '/dir/file%20file'); // TRUE
echo uriMatches('/dir/file(file)', '/dir/file%28file%29'); // TRUE
echo uriMatches('/dir/file%5bfile', '/dir/file%5Bfile'); // TRUE
urldecode
EDIT: Please look at @webbiedave's response. His is much better (I wasn't even aware that there was a function in PHP to do that.. learn something new everyday)
You will have to parse the strings to look for something matching %##
to find the occurences of those percent encoding. Then taking the number from those, you should be able to pass it so the chr() function to get the character of those percent encodings. Rebuild the strings and then you should be able to match them.
Not sure that's the most efficient method, but considering URLs are not usually that long, it shouldn't be too much of a performance hit.
I know this problem here seems to be solved by webbiedave, but I had my own problems with it.
First problem: Encoded characters are case-insensitive. So %C3 and %c3 are both the exact same character, although they are different as a URI. So both URIs point to the same location.
Second problem: folder%20(2) and folder%20%282%29 are both validly urlencoded URIs, which point to the same location, although they are different URIs.
Third problem: If I get rid of the url encoded characters I have two locations having the same URI like bla%2Fblubb and bla/blubb.
So what to do then? In order to compare two URIs, I need to normalize both of them in a way that I split them in all components, urldecode all paths and query-parts for once, rawurlencode them and glue them back together and then I could compare them.
And this could be the function to normalize it:
function normalizeURI($uri) {
$components = parse_url($uri);
$normalized = "";
if ($components['scheme']) {
$normalized .= $components['scheme'] . ":";
}
if ($components['host']) {
$normalized .= "//";
if ($components['user']) { //this should never happen in URIs, but still probably it's anything can happen thursday
$normalized .= rawurlencode(urldecode($components['user']));
if ($components['pass']) {
$normalized .= ":".rawurlencode(urldecode($components['pass']));
}
$normalized .= "@";
}
$normalized .= $components['host'];
if ($components['port']) {
$normalized .= ":".$components['port'];
}
}
if ($components['path']) {
if ($normalized) {
$normalized .= "/";
}
$path = explode("/", $components['path']);
$path = array_map("urldecode", $path);
$path = array_map("rawurlencode", $path);
$normalized .= implode("/", $path);
}
if ($components['query']) {
$query = explode("&", $components['query']);
foreach ($query as $i => $c) {
$c = explode("=", $c);
$c = array_map("urldecode", $c);
$c = array_map("rawurlencode", $c);
$c = implode("=", $c);
$query[$i] = $c;
}
$normalized .= "?".implode("&", $query);
}
return $normalized;
}
Now you can alter webbiedave's function to this:
function uriMatches($uri1, $uri2) {
return normalizeURI($uri1) === normalizeURI($uri2);
}
That should do. And yes, it is quite more complicated than even I wanted it to be.
精彩评论