
80 3-column rows, or one 81-column row?

I have some data to store.

I must store this data for every user.

The data is more or less:

key: 0
value: 107

key: 1
value: 213

There are about 80 key/value sets per user

So my question is,

Do I have one row that is basically

user_id, key1, key2... key80

or do I have 80 rows of

user_id, key, value


Data I must store:

You can see the actual data I must store here: http://219.89.41.169/ReachApi/?gt=The%20hailwood

I can't just copy/paste it, as it is approximately 8 MB of data, so the page may take a while to load.

But as you can see, I am given the data in that format, hence why I must store it this way.


I'd normally cringe at the thought of 81 columns in a table. However, IF:

  • The number of keys/fields is unlikely to change, AND
  • All of the fields relate directly to the user, and apply to every user, and/or
  • You're likely to need to query for more than one key at a time for a given user,

then it makes more sense to have that many columns than to store each user/key combo as its own row. You get type safety, the ability to constrain and index your data better, and you get to query a user's stats without needing a dozen joins (or getting back a bunch of rows that you have to then assemble into a data structure).
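
For illustration, a minimal sketch of the wide-table variant, assuming MySQL-style DDL; the table name and the stat_00 ... stat_79 columns are placeholders for whatever the 80 keys actually mean:

CREATE TABLE user_stats (
    user_id INT NOT NULL PRIMARY KEY,
    stat_00 INT NOT NULL DEFAULT 0,
    stat_01 INT NOT NULL DEFAULT 0,
    -- ... one typed, constrainable, indexable column per key ...
    stat_79 INT NOT NULL DEFAULT 0
);

-- Several of a user's stats come back in a single row, with no joins:
SELECT stat_00, stat_01, stat_53
FROM user_stats
WHERE user_id = 3225;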

If the number of fields is constantly in flux, or the fields don't apply to everyone (i.e., some users would have different numbers of fields), or you never want to know more than user 3225's value for field 53, then a user/key/value table makes more sense. But it'll be a pain in the ass to keep everything correct, updates will take forever (because indexes will need to be redone, and you will need indexes), and queries will get hideous if you ever need more than one or maybe two fields at a time.
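
To make that last point concrete, here is roughly what fetching just three keys looks like against a user/key/value layout (a hypothetical user_stats_kv table; every additional key costs another self join, and `key` is quoted because it is a reserved word in MySQL):

SELECT s0.value AS key0, s1.value AS key1, s53.value AS key53
FROM user_stats_kv s0
JOIN user_stats_kv s1  ON s1.user_id  = s0.user_id AND s1.`key`  = 1
JOIN user_stats_kv s53 ON s53.user_id = s0.user_id AND s53.`key` = 53
WHERE s0.user_id = 3225
  AND s0.`key` = 0;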


You should be using your second approach - one row per key/value pair:

user_id | key | value

This way you're more flexible when you need more or fewer keys per user.
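
A minimal sketch of that table, assuming integer values as in the sample data; the names are placeholders, and the composite primary key ensures each user/key pair exists only once:

CREATE TABLE user_stats_kv (
    user_id INT NOT NULL,
    `key`   INT NOT NULL,
    value   INT NOT NULL,
    PRIMARY KEY (user_id, `key`)
);

-- An 81st key later is just another row; no ALTER TABLE needed:
INSERT INTO user_stats_kv (user_id, `key`, value)
VALUES (3225, 80, 42);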


I took a look at the data. Why are you thinking about storing the data in a database at all? I suppose you want to read the data, run some algorithm on it, and compute a result, correct? Or is it actually a requirement to store the data permanently, and is that all you are trying to do?

Your lists all seem to have the following format:

[KillsByEnemyTypeClass] => Array
                    (
                        [0] => stdClass Object
                            (
                                [Key] => 0
                                [Value] => 0
                            )
                        ...


[MedalCountsByType] => Array
                    (
                        [0] => stdClass Object
                            (
                                [Key] => 0
                                [Value] => 0
                            )

As the data format implies, the lists are consecutive arrays, not actually key-value pairs. Because the keys are all consecutive, you can store the values in one large array in your programming language of choice.

This is your data format:

#include <string>

struct Data {
    std::string reason;
    int status;

    struct AiStatistics {
        int aDeathsByDamageType[82];   // one counter per damage type
        int nHopperId;
        int aKillsByDamageType[82];    // same fixed size as the deaths array
        int nMapId;
        double fMedalChestCompletionPercentage;
        int aMedalCountsByType[128];   // one counter per medal type
        int nTotalMedals;
        int nVariantClass;
        ...
    } aaistatistics[9];                // nine AiStatistics blocks in the sample data
};

It looks as if the arrays must have a static size, because the 82 different damage types will probably mean something to you. If there are suddenly 83 damage types, the program that generates this data will have changed and you will have to adapt your algorithms, too. That means the data and your program are not independent and the advantages of using a database are questionable.

Update

Thanks for clarifying your requirements. I now understand why you have to cache the data for other clients.

But: is the data you linked to all the data you have to store? That is, do you only cache the output from the web API, and if the output changes you overwrite the previously stored data? Or is there a temporal dimension, and you want to store the sequence of outputs of the API? In either case, a thin C++ API around the binary data could be much more efficient than a database.

If it's just for caching the data, I would still propose modeling the database after the object model above. The AiStatistics table has one column per member variable, i.e., one column for the complete DeathsByDamageType array. Store the whole array as one value. Clients cannot look up a single value using this model, but have to receive the complete array. Unless you have a concrete use case for anything else, I'd stick with this; your database will be much simpler.
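
A minimal sketch of that caching model, assuming each fixed-size array is serialized into a single BLOB column (the serialization format is up to the client; all names here are placeholders derived from the struct above):

CREATE TABLE ai_statistics (
    id                    INT NOT NULL PRIMARY KEY,
    data_id               INT NOT NULL,     -- references the Data row
    hopper_id             INT NOT NULL,
    total_medals          INT NOT NULL,
    -- each array member becomes one opaque value:
    deaths_by_damage_type BLOB NOT NULL,    -- 82 packed ints
    kills_by_damage_type  BLOB NOT NULL,    -- 82 packed ints
    medal_counts_by_type  BLOB NOT NULL     -- 128 packed ints
);

-- Clients fetch whole arrays and unpack them in application code:
SELECT deaths_by_damage_type FROM ai_statistics WHERE data_id = 1;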

If this is really, really, really not enough for your purposes, your tables would probably be:

table Data { id, reason, status }
table AiStatistics { id, data_id, ..., all integer members like nTotalMedals etc }
table DeathByDamageType { aistat_type, index, value }
table ... for every other array member of AiStatistics

By default, storing DeathByDamageType in this way is really space-inefficient: the table is at least three times larger than the array values, because for every value you have to store the AiStatistics reference id and the array index separately.

If you do it this way, at least exploit the sparsity in the arrays and don't store values in DeathByDamageType that are 0. You could not do that with the fixed-size array column.
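
A sketch of the sparse variant, using the column names from the table outline above (the concrete numbers are made up for illustration, and `index` is quoted because it is a reserved word in MySQL): only non-zero slots get a row, and a missing row reads back as zero:

-- Only written when the value is non-zero:
INSERT INTO DeathByDamageType (aistat_type, `index`, value)
VALUES (7, 53, 12);

-- Reading one slot back; a missing row means 0:
SELECT COALESCE(
    (SELECT value FROM DeathByDamageType
     WHERE aistat_type = 7 AND `index` = 53),
    0) AS value;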


I would stay away from 81 columns in a database table, especially for things like this, where you are likely to add more "keys" in the future (in which case you would have to add more columns to the table).

"key-value" tables can lead to performance problems, and make it clunky to write queries, so it might be advantageous to see if you can somehow group the keys into related parts, which you could stuff into smaller, more concrete tables.


Pro 80 columns

  • one insert on user creation, and only update statements for modifications
  • comparing the values of two or more keys can be done with simple SELECT statements, without self joins (see the sketch after this list)
  • O/R mapping just works out of the box
  • the database can handle type checking
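
For example, the second point above is a single query against one table in the wide layout (column names as in a hypothetical 80-column user_stats table):

-- users whose key-0 value exceeds their key-1 value:
SELECT user_id
FROM user_stats
WHERE stat_00 > stat_01;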

Pro 80 rows

  • adding keys is just an insert; no ALTER TABLE required (which would imply a long downtime during updates if you have a huge number of users)
  • plugins can easily add their own configuration settings (assuming some kind of prefix in the key names to avoid conflicts)

Comments about 80 rows

  • If you use this approach, I suggest using "key_id" instead of "key" and having a second table for keys, so that key names can be validated. That table can have a third column for documentation.
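
A sketch of that suggestion (all names are illustrative): the value table references key_id, and the keys table both validates key names and carries the documentation column:

CREATE TABLE stat_keys (
    key_id      INT NOT NULL PRIMARY KEY,
    name        VARCHAR(64)  NOT NULL UNIQUE,  -- e.g. 'pluginx.some_setting'
    description VARCHAR(255)                   -- third column: documentation
);

CREATE TABLE user_stat_values (
    user_id INT NOT NULL,
    key_id  INT NOT NULL REFERENCES stat_keys (key_id),
    value   INT NOT NULL,
    PRIMARY KEY (user_id, key_id)
);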