开发者

Large number array compression

I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for compression.

I was thinking of implementing a base 62 number.toString(62) and parseInt(compressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered.

The basic specs are: - Compress large number arrays into strings for JSONP transfer 开发者_高级运维(So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip compression either.

Any ideas would be greatly appreciated.

Thanks

Guido Tapia


Another way of doing this might be to encode to binary types such as signed/unsigned ints, and manually decode as at http://snippets.dzone.com/posts/show/685 which would require server side code to create the binary data.

You could then huffman compression or something similar like RLE (see http://rosettacode.org/wiki/Run-length_encoding#JavaScript for an implementation, though it may have some issues in IE without modifying) to compress the data further.

EDIT: Alternatively, you could convert the numbers themselves to a base (radix) in the unencoded URI character range (see http://en.wikipedia.org/wiki/Percent-encoding) which should work well if many of the numbers are larger than 2 digits. I converted the code at http://code.activestate.com/recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from python to do this.

It currently doesn't handle floats, but it could be done fairly easily:

function get_map(s) {
    d = {}
    for (var i=0; i<s.length; i++) {
        d[s.charAt(i)] = i}
    d.length = s.length
    d._s = s
    return d}

var separate_with = '~';
var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P
var base10 = get_map('0123456789')

// UNCOMMENT ME for length/speed testing in a wider base!
// You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P
/*var encodable = ''
for (var i=1; i<128; i++) {
    encodable += String.fromCharCode(i)
}
encodable = get_map(encodable)*/

function baseconvert(number, fromdigits, todigits) {
    var number = String(number)

    if (number.charAt(0) == '-') {
        number = number.slice(1, number.length)
        neg=1}
    else {
        neg=0}

    // make an integer out of the number
    var x = 0
    for (var i=0; i<number.length; i++) {
        var digit = number.charAt(i)
        x = x*fromdigits.length + fromdigits[digit]
    }

    // create the result in base 'todigits.length'
    res = ""
    while (x>0) {
        remainder = x % todigits.length
        res = todigits._s.charAt(remainder) + res
        x = parseInt(x/todigits.length)
    }

    if (neg) res = "-"+res
    return res
}

function encodeNums(L) {
    var r = []
    for (var i=0; i<L.length; i++) {
         r.push(baseconvert(L[i], base10, encodable))
    }
    return r.join(separate_with)
}

function decodeNums(s) {
    var r = []
    var s = s.split(separate_with)
    for (var i=0; i<s.length; i++) {
         r.push(parseInt(baseconvert(s[i], encodable, base10)))
    }
    return r
}

var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543]
alert(encodeNums(test))
alert(decodeNums(encodeNums(test)))


Options

  • Use a js library (see answer from Josh) but watch for script timeouts
  • Use some kind of activex or silverlight uploader that also does zip
  • Use a java plug-in
  • Use a flash based compressing uploader


I'm now toying with the idea of encoding the length of the number into the number itself. I still have not perfected this algorithm but will post it once done. But roughly this is what I am currently trying to achieve:

Boundaries:

  • Loss of precision is allowed (+- 3)
  • Max number will be 3200
  • Min is 0 (so no negatives)
  • No decimal - floats

So now given my max allowed number I know that the length of the encoded digit in base 62 will have a max length of 2. So any encoded number is either 1 or 2 characters in length. Awesome. So now I'm going to make the number odd or even depending if its 1 or 2 characters (remember I can handle loss of precission). This removes the need for separators.

Now I'm seeing about 70%-80% compression with this, its very buggy at the moment but I'm excited about it, so the post to encourage discussion around this methodology.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜