
Lightweight data format

As is well known, JSON is a lighter data format than XML and is usually preferable. But when you transfer big arrays of objects that all share the same structure, JSON is bloated too. For example:

[
    {
        "name": "John",
        "surname": "Smith",
        "info": { "age": 25, "comments": "" }
    },
    {
        "name": "Sam",
        "surname": "Black",
        "info": { "age": 27, "comments": "" }
    },
    {
        "name": "Tom",
        "surname": "Lewis",
        "info": { "age": 21, "comments": "" }
    }
]

Declaring name, surname, age and comments three times over is redundant if I know for certain that every object in the array has the same structure.

Is there a data format that can minify such array data while remaining flexible enough?


Admittedly, this is a hackish solution, but we've used it and it works. You can flatten everything into arrays. For example, the above would be represented as:

[
    ["John","Smith",[25,""]],
    ["Sam","Black",[27,""]],
    ["Tom","Lewis",[21,""]]
]

The downside is that you have to write some custom logic for serializing/deserializing. However, this does yield additional savings for a text-based solution. And Ray is right: if you really want maximal savings, binary is the way to go.
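The "custom logic" can be fairly small if both sides share a description of the field order. Below is a minimal sketch, assuming a hypothetical `SCHEMA` list (a `(key, sub_schema)` tuple marks a nested object) and hypothetical helpers `pack`/`unpack`; it is one way to do it, not a standard.

```python
import json

# Hypothetical schema shared by sender and receiver: it fixes the field
# order, so property names never have to travel over the wire.
# A (key, sub_schema) tuple marks a nested object.
SCHEMA = ["name", "surname", ("info", ["age", "comments"])]


def pack(obj, schema=SCHEMA):
    """Flatten a keyed object into a positional list following `schema`."""
    row = []
    for field in schema:
        if isinstance(field, tuple):
            key, sub_schema = field
            row.append(pack(obj[key], sub_schema))
        else:
            row.append(obj[field])
    return row


def unpack(row, schema=SCHEMA):
    """Rebuild the keyed object from a positional list."""
    obj = {}
    for field, value in zip(schema, row):
        if isinstance(field, tuple):
            key, sub_schema = field
            obj[key] = unpack(value, sub_schema)
        else:
            obj[field] = value
    return obj


people = [
    {"name": "John", "surname": "Smith", "info": {"age": 25, "comments": ""}},
    {"name": "Sam", "surname": "Black", "info": {"age": 27, "comments": ""}},
    {"name": "Tom", "surname": "Lewis", "info": {"age": 21, "comments": ""}},
]

compact = (",", ":")  # drop optional whitespace so the sizes are comparable
wire = json.dumps([pack(p) for p in people], separators=compact)
print(wire)
# [["John","Smith",[25,""]],["Sam","Black",[27,""]],["Tom","Lewis",[21,""]]]
print(len(wire), "vs", len(json.dumps(people, separators=compact)))
```

Keeping the schema as data rather than hard-coding each field means adding a field only touches `SCHEMA`, but both ends must be upgraded together, since positions carry the meaning.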


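Well, if you want a text format, YAML tries to keep markup to a minimum: it does away with most of the braces and quotes. Also keep in mind that text compresses pretty well, so the repeated keys may cost less than you'd expect once the data is compressed.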
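To see how much compression narrows the gap, here is a rough experiment using only the standard library. The data is synthetic (the question's three people cycled to 1,000 records with varying ages), and gzip is assumed as the transport compression; both are illustrative choices, not measurements of any real payload.

```python
import gzip
import json

COMPACT = (",", ":")  # no optional whitespace in the JSON output


def sizes(records):
    """Return (raw bytes, gzipped bytes) for the compact JSON encoding."""
    raw = json.dumps(records, separators=COMPACT).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic data: three people cycled to 1,000 records, ages varied a bit.
names = [("John", "Smith"), ("Sam", "Black"), ("Tom", "Lewis")]
keyed = []
for i in range(1000):
    name, surname = names[i % 3]
    keyed.append({"name": name, "surname": surname,
                  "info": {"age": 20 + i % 30, "comments": ""}})

# Same data with the property names stripped (array-of-arrays form).
positional = [
    [p["name"], p["surname"], [p["info"]["age"], p["info"]["comments"]]]
    for p in keyed
]

keyed_raw, keyed_gz = sizes(keyed)
pos_raw, pos_gz = sizes(positional)
print(f"keyed:      {keyed_raw} bytes raw, {keyed_gz} gzipped")
print(f"positional: {pos_raw} bytes raw, {pos_gz} gzipped")
```

Real records with more varied values will not compress as dramatically as this artificially repetitive sample, but the absolute difference caused by the repeated key names is much smaller after compression than before.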

But if you want to remove redundancies in property names, you have to go with a binary format. Look into MessagePack, Protocol Buffers, or Avro. I don't know of any text-based formats that remove this kind of redundancy.
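The reason schema-based binary formats such as Protocol Buffers and Avro avoid the redundancy is that both sides agree on the layout in advance, so only values are written. The sketch below is NOT any of those formats; it is a hand-rolled layout built on Python's `struct` module, with hypothetical helpers `encode`/`decode`, assumed only to illustrate the idea: strings are length-prefixed UTF-8 (uint16) and age is a single unsigned byte.

```python
import struct

# Hypothetical wire layout (not a real format's spec):
#   uint32 record count, then per record:
#   name, surname as uint16-length-prefixed UTF-8; age as uint8 (0-255);
#   comments as uint16-length-prefixed UTF-8. All little-endian.


def _put_str(buf: bytearray, s: str) -> None:
    data = s.encode("utf-8")
    buf += struct.pack("<H", len(data))
    buf += data


def _get_str(data: bytes, pos: int):
    (n,) = struct.unpack_from("<H", data, pos)
    pos += 2
    return data[pos:pos + n].decode("utf-8"), pos + n


def encode(people) -> bytes:
    buf = bytearray(struct.pack("<I", len(people)))  # record count
    for p in people:
        _put_str(buf, p["name"])
        _put_str(buf, p["surname"])
        buf += struct.pack("<B", p["info"]["age"])
        _put_str(buf, p["info"]["comments"])
    return bytes(buf)


def decode(data: bytes):
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    people = []
    for _ in range(count):
        name, pos = _get_str(data, pos)
        surname, pos = _get_str(data, pos)
        (age,) = struct.unpack_from("<B", data, pos)
        pos += 1
        comments, pos = _get_str(data, pos)
        people.append({"name": name, "surname": surname,
                       "info": {"age": age, "comments": comments}})
    return people


people = [
    {"name": "John", "surname": "Smith", "info": {"age": 25, "comments": ""}},
    {"name": "Sam", "surname": "Black", "info": {"age": 27, "comments": ""}},
    {"name": "Tom", "surname": "Lewis", "info": {"age": 21, "comments": ""}},
]
blob = encode(people)
print(len(blob))  # 50
```

The three records come to 50 bytes, with no property names in the payload at all. A real format adds what this sketch lacks: versioning, optional fields, and a defined way to evolve the schema.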

LATE ADDITION:

Oh my, after using Hadoop to process dozens of gigabytes at a time for the past year, how could I have forgotten CSV? Geez. The first row can hold the schema, you don't really need quotes, and the separator is up to you. Something like this:

name|surname|infoage|infocomments
John|Smith|25|
Sam|Black|27|Hi this is a comment
Tom|Lewis|21|This comment has an \| escaped pipe
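Python's standard `csv` module can produce exactly this pipe-separated, backslash-escaped style when quoting is switched off (`QUOTE_NONE`) and an `escapechar` is set. A minimal sketch, assuming the answer's flattened column names; `to_psv`/`from_psv` are hypothetical helper names. Note that many other CSV tools expect RFC 4180-style quoting instead of backslash escapes.

```python
import csv
import io

# Flat column names following the answer's convention of concatenating
# the nested key path (info.age -> infoage).
COLUMNS = ["name", "surname", "infoage", "infocomments"]
DIALECT = dict(delimiter="|", quoting=csv.QUOTE_NONE, escapechar="\\")


def to_psv(people):
    """Write records as pipe-separated text; the first row is the schema."""
    buf = io.StringIO()
    # With QUOTE_NONE, a literal '|' inside a field is written as '\|'.
    writer = csv.writer(buf, lineterminator="\n", **DIALECT)
    writer.writerow(COLUMNS)
    for p in people:
        writer.writerow([p["name"], p["surname"],
                         p["info"]["age"], p["info"]["comments"]])
    return buf.getvalue()


def from_psv(text):
    """Parse the pipe-separated text back into nested records."""
    reader = csv.reader(io.StringIO(text), **DIALECT)
    header = next(reader)
    people = []
    for row in reader:
        rec = dict(zip(header, row))
        people.append({
            "name": rec["name"],
            "surname": rec["surname"],
            # CSV carries no types, so the schema knowledge lives in code.
            "info": {"age": int(rec["infoage"]),
                     "comments": rec["infocomments"]},
        })
    return people


people = [
    {"name": "John", "surname": "Smith", "info": {"age": 25, "comments": ""}},
    {"name": "Sam", "surname": "Black",
     "info": {"age": 27, "comments": "Hi this is a comment"}},
    {"name": "Tom", "surname": "Lewis",
     "info": {"age": 21, "comments": "This comment has an | escaped pipe"}},
]
text = to_psv(people)
print(text)
```

Printing `text` reproduces the four lines shown above, and `from_psv(text)` restores the nested objects, including the unescaped pipe.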

For small documents CSV might even be smaller than some binary formats, but binary formats are better for storing real (floating-point) numbers.
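A quick way to see the point about real numbers: a double-precision value always takes 8 bytes in binary (IEEE 754, here via Python's `struct`), while its shortest exact text form can be much longer. This is only an illustration with one arbitrary value.

```python
import struct

x = 0.1 + 0.2
text = repr(x)                 # shortest text that round-trips exactly
binary = struct.pack("<d", x)  # IEEE 754 double, always 8 bytes
print(text, len(text))         # 0.30000000000000004 19
print(len(binary))             # 8
```

Both forms round-trip exactly, but the text form costs 19 bytes here versus 8.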

Also, CSV is really only good for a collection of items that all share the same flat structure. For complex object hierarchies, go with binary, YAML, or @incaren's array-based solution.
