Analytics Dashboard Strategy
We are currently developing an API and we want to provide an analytics dashboard for our clients to view metrics about their calls per month/day/hour.
The current strategy we thought is to save each call to a client separate table (eg. calls_{client_id}) for historic reasons and have a summary table (eg. calls_summary) containing the number of calls for a given hour of a day for each client.
Then, each day a cron job will create a xml file with the summary of last day's calls for each client and the dashboard will use them instead of the database. So, the only analytics task which will use the database will be the cr开发者_JS百科on job.
For infrastructure we are thinking MySQL replication and the slave as the analytics database.
Is that strategy useful and valid for real web statistics? Can you propose any tuning on that or even a totally different one?
save each call to a client separate table (eg. calls_{client_id}) for historic reasons
No. Don't break the rules of normalization unless you've got good reason. It won't improve performance and could actually be very detrimental. It'll certainly make your code more complex and therefore less reliable.
It might be worth archiving off old records on period-by-period basis, but unless you know that you'rew going to run into performance problems I'd advise against this.
By all means pre-consolidate the data into another table (provided you are getting a reduction ni number of rows of at least 95%). But don't bother converting it to XML unless and until you need the data in that format.
As for how you pre-consolidate....either use period based consolidations (e.g. roll up by date) or use flagging to record which records have already been consolidated.
The less frequently you run the consolidation the more the impact on performance. But run it too frequently and you'll have problems with contention / locking.
Without knowing a lot about the structure and volume of data or the constraints in terms of budget, availability and timeliness it's hard to provide an optimal solution. But if were me, I'd probably go with 3 mysqld tiers - one providing the transactional write facility, one replicating this data and generating the consolidated data, and one to provide read access to the consolidated data (master <-> master <-> slave)
Performance wise, it is a bad idea to create a separate table for each client. The classic approach for that would be the folowing:
client: id, name, address, ...
call: id, client_id, created_at, duration, ...
calls_summary: id, client_id, date_start, date_end, nb_calls
Now if you want to retrieve all the calls of a client, you go like this:
SELECT * FROM client
LEFT JOIN call ON call.client_id = client.id
WHERE client.id = 42
Or :
SELECT * FROM call where client_id = 42
I don't see any reason for using xml, your cron could be just updating the calls_summary table.
精彩评论