
Compress/URL-encode a series of 100 base-4 numbers in JavaScript

First thing: this must be done entirely in JavaScript (jQuery/MooTools optional).

I have a series of 100 numbers, each set to 0, 1, 2, or 3; these represent settings on the page. I would like to encode these to the shortest string possible to create a permalink to the page.

I am thinking the best way would be to store them in binary couplets, convert those couplets to a string, and then URL-encode the string.

However, the best I have found so far is parseInt(binary_var, 2), which converts a binary string to a base-10 number. To get the string short enough I'll need a better system.

If I could convert to 64-bit encoding I could store all the data in just 4 chars, I think. I know URLs support Unicode now, and I believe I can use escape and unescape to encode/decode 64-bit chars, so the main thing I am looking for is a way to encode/decode binary data to 64-bit characters.

Of course I am not 100% sure this is the best way, or that it will even work, so if I am completely off track feel free to point me in the right direction.

Thanks!


You can encode such arrays of numbers into a string, 3 per character, like this:

function encodeBase4(base4) {
  var i, rv = [], n = ~~((base4.length + 2) / 3) * 3;

  for (i = 0; i < n; i += 3) {
    rv.push(
      32 +
      ((base4[i] || 0) & 3) +
      ((base4[i + 1] || 0) & 3) * 4 +
      ((base4[i + 2] || 0) & 3) * 16
    );
  }

  return String.fromCharCode.apply(null, rv);
}

You can then convert the other direction like this:

function decodeBase4(str) {
  var i, rv = [], n = str.length;

  for (i = 0; i < n; ++i) {
    var b = str.charCodeAt(i) - 32;
    rv.push(b & 3);
    rv.push(~~(b / 4) & 3);
    rv.push(~~(b / 16) & 3);
  }

  return rv;
}

Here's the jsfiddle which seems to work on its simple test case. (Note that you end up with a list that's a multiple of 3 in length; you'd have to know how many real values there are and just ignore the zeros at the end.)

Now these result strings will be "dirty" and require URL encoding if you're putting them in URLs. If you packed only 2 numbers per character, you could make the resulting strings be all alphabetic, and thus you'd avoid the encoding penalty; however they'd be longer, of course.
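As a middle ground, three values per character can also be kept URL-safe by indexing into a 64-character alphabet instead of adding 32 to the packed value. The following is only a sketch: the alphabet URL_SAFE and the function names encodeBase4UrlSafe and decodeBase4UrlSafe are made up for illustration and are not part of the answer above.

```javascript
// Hypothetical alphabet: 64 characters that never need percent-encoding.
var URL_SAFE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

function encodeBase4UrlSafe(base4) {
  var out = "";
  for (var i = 0; i < base4.length; i += 3) {
    // Pack three 2-bit values into one 6-bit index (missing values count as 0).
    var index = ((base4[i] || 0) & 3) |
                (((base4[i + 1] || 0) & 3) << 2) |
                (((base4[i + 2] || 0) & 3) << 4);
    out += URL_SAFE.charAt(index);
  }
  return out;
}

function decodeBase4UrlSafe(str, count) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var index = URL_SAFE.indexOf(str.charAt(i));
    if (index < 0) throw new Error("Unexpected character: " + str.charAt(i));
    out.push(index & 3, (index >> 2) & 3, (index >> 4) & 3);
  }
  // Drop the zero padding added when the input length is not a multiple of 3.
  return out.slice(0, count);
}
```

With 100 settings this yields 34 characters, and passing 100 as count on decode discards the two padding zeros.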


100 pieces of information with 2 bits each require 200 bits in total. With base 64 encoding you would require ceil(200/log2(64)) = 34 characters.

A URI path segment allows 79 characters that don’t need to be percent-encoded. If you add the path segment separator / you have 80 characters and thus require ceil(200/log2(80)) = 32 characters. That’s the optimum you can achieve using the path alone.
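The two character counts above can be checked with a few lines of JavaScript. The helper name charsNeeded is made up for this sketch.

```javascript
// Characters needed to carry `bits` bits using an alphabet of `alphabetSize` symbols.
function charsNeeded(bits, alphabetSize) {
  return Math.ceil(bits / Math.log2(alphabetSize));
}

var bits = 100 * 2; // 100 settings, 2 bits each

console.log(charsNeeded(bits, 64)); // 34
console.log(charsNeeded(bits, 80)); // 32
```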

You could use more than these characters, even Unicode characters. But those would need to be percent-encoded, as URIs are only allowed to contain US-ASCII. A URI path like /ä (ä = U+00E4) is actually /%C3%A4, and only the browser displays it as /ä.
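The cost of a non-ASCII character is easy to see with the standard encodeURIComponent function: a single ä becomes six ASCII characters on the wire.

```javascript
var encoded = encodeURIComponent("\u00E4"); // "\u00E4" is ä

console.log(encoded);                     // "%C3%A4"
console.log(encoded.length);              // 6
console.log(decodeURIComponent(encoded)); // "ä"
```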


Here’s an example (functions taken from arbitrary base conversion in javascript):

function getValueOfDigit(digit, alphabet)
{
   var pos = alphabet.indexOf(digit);
   return pos;
}

function convert(src, srcAlphabet, dstAlphabet)
{
   var srcBase = srcAlphabet.length;
   var dstBase = dstAlphabet.length;

   var wet     = src;
   var val     = 0;
   var mlt     = 1;

   while (wet.length > 0)
   {
     var digit  = wet.charAt(wet.length - 1);
     val       += mlt * getValueOfDigit(digit, srcAlphabet);
     wet        = wet.substring(0, wet.length - 1);
     mlt       *= srcBase;
   }

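   // Note: val is a Number, so the result is only exact below 2^53
   // (about 26 base-4 digits).
   wet          = val;
   var ret      = "";

   while (wet >= dstBase)
   {
     var digitVal = wet % dstBase;
     ret          = dstAlphabet.charAt(digitVal) + ret;
     wet          = Math.floor(wet / dstBase);  // integer division keeps wet a whole number
   }

   ret          = dstAlphabet.charAt(wet) + ret;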
   
   return ret;
}

var base4Alphabet  = "0123",
    base79Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=:@",
    base80Alphabet = base79Alphabet+"/";
// Pass the digit string itself; getValueOfDigit only handles a single digit.
alert(convert("010203210", base4Alphabet, base80Alphabet));  // "C+U"
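A string of 100 base-4 digits represents a value of up to 200 bits, which is far more than a JavaScript Number can hold exactly. Below is a sketch of the same base conversion using BigInt, assuming an environment that supports it (ES2020 or later). The name convertBig is made up for this sketch, and the alphabets are repeated so the block stands on its own.

```javascript
var base4Alphabet  = "0123",
    base80Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=:@/";

// Assumption: BigInt is available. Converts a digit string between two alphabets.
function convertBig(src, srcAlphabet, dstAlphabet) {
  var srcBase = BigInt(srcAlphabet.length);
  var dstBase = BigInt(dstAlphabet.length);

  // Read the source digits, most significant first, into one big integer.
  var val = 0n;
  for (var i = 0; i < src.length; i++) {
    var d = srcAlphabet.indexOf(src.charAt(i));
    if (d < 0) throw new Error("Unexpected digit: " + src.charAt(i));
    val = val * srcBase + BigInt(d);
  }

  // Emit destination digits, least significant first.
  var ret = "";
  do {
    ret = dstAlphabet.charAt(Number(val % dstBase)) + ret;
    val = val / dstBase; // BigInt division truncates
  } while (val > 0n);

  return ret;
}

var settings = "3".repeat(100);                               // 100 base-4 digits
var packed   = convertBig(settings, base4Alphabet, base80Alphabet);
// Leading zeros are lost in the numeric round trip, so pad back to 100 digits.
var restored = convertBig(packed, base80Alphabet, base4Alphabet).padStart(100, "0");
```

Here packed is at most 32 characters long. Because leading zero digits carry no numeric value, the decoder has to know the expected digit count (100 in this case) and pad accordingly.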