PHP - help fix a bug caused by two number dividing exactly
I have been working with a piece of code for youtube style URL's but I have found a bug and I am hoping someone can show me the most efficient way to fix it.
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<--开发者_C百科 alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out += strpos($index, $in[$t]) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
}
$out = strrev($out); // reverse
}
return $out;
}
The bug is only when encoding a single number 238328 as it is my base to the power of three. As a result it divides exactly and because of use of 'floor' it goes unnoticed and the script tries to add the 62nd character which does not exist and only produces a three character code rather than four... thus 'aa' is the result rather than 'aaab'.
Here is the problem part of the code:
for ($t = floor(log10($in) / log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = floor($in / $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
And to make it even easier here is the call to get the error
echo alphaID(238328);
credits: Orginally written by Kevin Vanzonneveld: kevin dot vanzonneveld dot net, modified by Simon Franz: blog dot snaky dot org and optimised by Stackoverflows very own mattbasta
Here you go:
function preciseDivision($x,$y)
{
// Correct floor's failures by adding a bit of overhead
$epsilon = 0.00000001;
return floor(($x/$y) + $epsilon);
}
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
$in = strrev($in);
$out = 0;
$len = strlen($in) - 1;
for ($t = 0; $t <= $len; $t++) {
$bcpow = bcpow($base, $len - $t);
$out += strpos($index, $in[$t]) * $bcpow;
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
for ($t = preciseDivision(log10($in),log10($base)); $t >= 0; $t--) {
$bcp = bcpow($base, $t);
$a = preciseDivision($in, $bcp);
$out .= $index[$a];
$in -= $a * $bcp;
}
$out = strrev($out); // reverse
}
return $out;
}
The problem here was not floor, but floating point precision. The division resulted 2.99999999, and the floor(2.999999) is equal to 2, not 3. This happens because limited size of floating point variables.
That's why it did not work.
I wrote a function preciseDivision, which automatically adds a very small value to the division, to get through this.
And I still believe that there should exist cleaner solutions to this url hashing problem. I'll see what I can do.
As per my answer to your other question, try replacing log10($in) / log10($base)
with log($in, $base)
.
This avoids the inaccuracies associated with dividing the results of the two logarithms as floating point numbers and gives you the correct result.
Adding another answer as the first one is also working, though this one is cleaner.
I got rid of BC Math functions. If you're going to work with really large integers, this may not work. Otherwise, this is much cleaner solution:
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
// A conversion from base $base to base 10
$out = 0; // End number
$shift = 1; // Starting shift
$len = strlen($in); // Length of string
for ($t = 0; $t < $len; $t++)
{
$out += strpos($index, $in[$t]) * $shift; // $out is a number form alphabet * base^shift
$shift *= $base; // increase shift
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
// A simple conversion from base 10 to base $base
while ($in > 0)
{
$remainder = $in % $base;
$in = intval(($in-$remainder)/$base);
$out .= $index[$remainder];
}
}
return $out;
}
The code is cleaner, and should be faster as well. Now it is much easier to see that this is only conversion from base 10 to base $base (62?) and vica-versa. It does not involve floating point division, thus it does not have the bug mentioned above.
If you need multiplying large integers, and so on, this can be implemented this way as well with some bright thinking.
Added BC Maths, as you said you need large integers
function alphaID($in, $to_num = false, $pad_up = false, $passKey = null)
{
static $passcache;
if(empty($passcache))
$passcache = array();
$index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$i = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
if (!empty($passKey)) {
// Although this function's purpose is to just make the
// ID short - and not so much secure,
// with this patch by Simon Franz (http://blog.snaky.org/)
// you can optionally supply a password to make it harder
// to calculate the corresponding numeric ID
if(isset($passcache[$passKey]))
$index = $passcache[$passKey];
else {
if(strlen($passhash = hash('sha256',$passKey)) < strlen($index))
$passhash = hash('sha512',$passKey);
$p = str_split($passhash);
array_multisort($p, SORT_DESC, $i);
$index = implode($i);
$passcache = $index;
}
}
$base = strlen($index);
if ($to_num) {
// Digital number <<-- alphabet letter code
// A conversion from base $base to base 10
$out = '0'; // End number
$shift = 1; // Starting shift
$len = strlen($in); // Length of string
for ($t = 0; $t < $len; $t++)
{
$out = bcadd($out, bcmul(strpos($index, $in[$t]),$shift)); // $out is a number from alphabet * base^shift
$shift = bcmul($shift, $base); // increase shift
}
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$out -= pow($base, $pad_up);
}
}
} else {
// Digital number -->> alphabet letter code
if (is_numeric($pad_up)) {
$pad_up--;
if ($pad_up > 0) {
$in += pow($base, $pad_up);
}
}
$out = "";
// A simple conversion from base 10 to base $base
while ($in > '0') // We're treating integer as a string, so BC math works
{
$remainder = bcmod($in,$base);
$in = bcdiv($in, $base);
$out .= $index[$remainder];
}
}
return $out;
}
精彩评论