开发者

When is it OK to store derived data in a database?

I currently have a GAME table with two fiel开发者_Python百科ds

user_id, win

win = 1 for win, 0 for loss

Suppose I want to display winning percentage. This is fairly trivial with a count operation. However suppose I want to display thousands of users on the same page with the winning percentage for each. I have some scalability concerns about this scenario. Is it too much of a hack to create a separate cache table with the following fields

user_id, win_percentage

This would be updated every time a new game was posted. Now the winning percentage can be determined very quickly instead of using thousands of count operations. What's the best way to handle this problem?


Data warehouse folks say that it's always appropriate to store derived data in the database. As long as it isn't updated.

The question is one of updates.

First. Your scalability concerns don't amount to much. "suppose I want to display thousands of users on the same page with the winning percentage for each" doesn't matter much. This can be computed very, very quickly.

This would be updated every time a new game was posted.

That's the problem with storing derived data. The cost of the update may actually outweigh the cost of computation. You don't know without actual usage statistics.

Thus.

Don't store derived data until you can prove (with actual measurements) that it's more efficient to store it.


When the derived data is expensive to calculate and is relatively static ( it doesn't change very frequently or at all ) you should consider Warehousing it in a different Database ( doesn't have to be the same type of database or a database at all, it could be something like memcached ) on a different machine so that it doesn't impact the performance of your transactional database.

If it isn't a performance issue ( as in not expensive to calculate ), then don't bother with the added complexity, caching is very hard to get right.

You have measured and determined it is a problem and don't just think it might be a problem right?

Remember and I paraphrase:

premature optimization without profiling is the root of all evil!

A change in data structure might be a better solution.

user_id, wins, loses, percentage

updating a single record per player will not cost any more, and probably less depending on the Database than a row for every game result and the resulting calculations.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜