Most space efficient way to store millions of simple data?

My data looks like this:

00000000001 : '12341234...12341234'

Basically a unique id value associated with a big string of numbers (less than 100 chars).

I want to store tens of millions, maybe even hundreds of millions, of these pieces of data: just IDs pointing to big number strings. I am wondering what the most space-efficient way to store them is, and I also want to keep lookup time quick. I want my application to be given a number like 550,000 and be able to quickly retrieve the big string of numbers associated with it.

I have looked at open source DBs as an option (MySQL) and I also considered something like JSON or XML. Are there other options? What would be best?

The reason I am uncertain is because the data is so simple. I am afraid of using certain databases because some are relational or object oriented, but I don't have a need for those features (there might be overhead here). I am also afraid my data is too simple and repetitive for something like JSON too because I feel like much of the file space will be consumed by repeating "id" : and "bignumber" : over and over.

Any suggestions?


It looks like both id and value are integer values, so storing them as binary data (as opposed to strings) would save a lot of space. This rules out JSON or XML, which are text-based.

I think you want to use a key-value store, such as BerkeleyDB. They allow fast lookup by key (but nothing else).
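As a rough illustration of the key-value approach, here is a minimal sketch using Python's standard-library `dbm` module as a stand-in for BerkeleyDB (it is not BerkeleyDB itself, and the backend it picks depends on the platform). The helper names `encode_key`, `encode_value`, `kv_put` and `kv_get` are made up for this example, and it assumes the values are non-negative integers whose leading zeros do not matter.

```python
import dbm
import os
import struct
import tempfile


def encode_key(record_id):
    # 8-byte big-endian unsigned key; plenty for hundreds of millions of ids.
    return struct.pack(">Q", record_id)


def encode_value(digits):
    # Store the decimal string as a raw binary integer (assumes no
    # significant leading zeros; those would be lost here).
    n = int(digits)
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


def decode_value(raw):
    return str(int.from_bytes(raw, "big"))


def kv_put(db, record_id, digits):
    db[encode_key(record_id)] = encode_value(digits)


def kv_get(db, record_id):
    raw = db.get(encode_key(record_id))
    return None if raw is None else decode_value(raw)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "c" opens the database for read/write, creating it if needed.
        with dbm.open(os.path.join(tmp, "numbers"), "c") as db:
            kv_put(db, 550000, "1234" * 20)
            print(kv_get(db, 550000))
```

The same encode/decode helpers would carry over to a real BerkeleyDB binding; only the open call would change.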

Using something like SQLite would also have very little overhead and allow for convenient access methods.
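For the SQLite route, a minimal sketch with Python's built-in `sqlite3` module might look like the following. The table name `numbers` and the helpers `open_store`, `sql_put` and `sql_get` are invented for illustration, and the value is stored as a binary BLOB on the assumption that it is a non-negative integer with no significant leading zeros.

```python
import sqlite3


def open_store(path=":memory:"):
    # INTEGER PRIMARY KEY makes the id the rowid, so there is no separate index.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS numbers ("
        "id INTEGER PRIMARY KEY, value BLOB NOT NULL)"
    )
    return conn


def sql_put(conn, record_id, digits):
    # Convert the decimal string to raw bytes before storing it.
    n = int(digits)
    blob = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    conn.execute(
        "INSERT OR REPLACE INTO numbers (id, value) VALUES (?, ?)",
        (record_id, blob),
    )


def sql_get(conn, record_id):
    row = conn.execute(
        "SELECT value FROM numbers WHERE id = ?", (record_id,)
    ).fetchone()
    return None if row is None else str(int.from_bytes(row[0], "big"))


if __name__ == "__main__":
    conn = open_store()
    sql_put(conn, 550000, "1234" * 20)
    conn.commit()
    print(sql_get(conn, 550000))
    conn.close()
```

Passing a file path instead of `":memory:"` keeps the data on disk, and lookups then read only the pages they need rather than the whole file.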

It is also important that you can access the data without reading it completely into memory first. Database engines manage that for you; with JSON or a hand-rolled format it can be a lot of work.

If you do not need network access (but want to work on local files), an embedded database system like BerkeleyDB or SQLite seems to be the best fit. Not having a server also greatly reduces the setup overhead.


I think the most efficient way to store this data would be to omit the "id" and just store your big numbers in fixed format. You would need about 42 bytes to store numbers with 100 digits or less and you could easily lookup the number you're after by multiplying "id" by 42 and going straight to the offset where your number is stored.
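The 42 comes from the fact that a number below 10^100 needs about 333 bits, which rounds up to 42 bytes. A minimal sketch of that layout follows, assuming zero-based ids (record 0 sits at offset 0), values that are non-negative integers without significant leading zeros, and made-up helper names `write_records` and `read_record`.

```python
import os
import tempfile

RECORD_SIZE = 42  # bytes; enough for any non-negative integer below 10**100


def write_records(path, values):
    # Record i is written at byte offset i * RECORD_SIZE; no ids are stored.
    with open(path, "wb") as f:
        for digits in values:
            f.write(int(digits).to_bytes(RECORD_SIZE, "big"))


def read_record(path, record_id):
    # Seek straight to the record instead of scanning the file.
    if record_id < 0:
        raise KeyError(record_id)
    with open(path, "rb") as f:
        f.seek(record_id * RECORD_SIZE)
        raw = f.read(RECORD_SIZE)
    if len(raw) != RECORD_SIZE:
        raise KeyError(record_id)  # past the end of the file
    return str(int.from_bytes(raw, "big"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "numbers.bin")
        write_records(path, ["7", "1234" * 25, "9" * 100])
        print(read_record(path, 1))
        print(os.path.getsize(path))  # 3 records * RECORD_SIZE bytes
```

At 42 bytes per record, 100 million records come to roughly 4.2 GB, and every lookup is a single seek plus one small read. If ids start at 1, subtract 1 before computing the offset.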


MySQL or similar will handle a lot of details for you. SQLite might be good too as you don't need that many features.

An integer field and a text field would work, but you can pack more data into a binary blob, packing and unpacking as necessary. I'd probably encode them two digits to a byte, though you could do better if you want to deal with bit shifts and such.
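One possible reading of "two digits to a byte" is packed decimal, with one digit per 4-bit nibble. The sketch below takes that reading; the function names `pack_digits` and `unpack_digits` and the choice of `0xF` as padding for odd-length strings are assumptions made for this example, not something the answer specifies.

```python
def pack_digits(digits):
    # Each decimal digit takes one nibble, so two digits share a byte.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of ASCII digits")
    nibbles = [int(ch) for ch in digits]
    if len(nibbles) % 2:
        nibbles.append(0xF)  # padding nibble marks an odd-length string
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_digits(packed):
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble != 0xF:  # skip the padding nibble
                out.append(str(nibble))
    return "".join(out)


if __name__ == "__main__":
    blob = pack_digits("0012345")
    print(len(blob), unpack_digits(blob))
```

This halves the size of the text form (at most 50 bytes for up to 100 digits) and, unlike converting to a binary integer, keeps leading zeros. The pure binary form is smaller still, at 42 bytes for the same range.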

As @gordy suggests, if all your values have lots of digits, you might do better with a fixed row size for everything as it'll be faster for lookups. Use variable width if size is more important.

If your data is going to be read only, you might try compressing it with MySQL's archive table type.

http://dev.mysql.com/doc/refman/5.1/en/archive-storage-engine.html


Any old database should work fine, from BDB (or more modern variants such as Redis or Tokyo Cabinet) to standard SQL DBs like MySQL or Postgres. My own favorite among the latter is H2, a simple but reasonably performant and nicely embeddable SQL DB.

If you only need basic storage, there are more choices: XML/JSON (often compressed with gzip) is fine. But if you do need lookups by id, a database makes more sense.
