Optimizing SQL Queries for Analytics
I have implemented an analytics system which is now performing very poorly. To explain the problem, I need to describe the table structure and the queries.
I have two InnoDB tables:
Table 1 (hourly_stats): contains records about hourly stats (stats_id, file_id, time).
Table 2 (full_stats): contains over 8 million rows.
The structure of Table 2 is:
full_stats (
    stats_id INT,
    file_id INT,
    stats_week INT,
    stats_month INT,
    stats_year INT,
    stats_time DATETIME
)
What I am trying to do is calculate the total views from hourly_stats for a given period of time, grouped by file_id, and then add or update the corresponding records in the full_stats table. On average it takes 1-2 minutes to process one row. I am trying to optimize the queries for better performance.
Here is what I am doing:
There is a 60% chance that a file_id already exists in full_stats for a given week, month and year, and a 40% chance that it doesn't.
So in the first query I try to update the record using the following query:
UPDATE full_stats
SET total_views=XXX
WHERE stats_week=XX AND stats_month=X
AND stats_year=YYYY
After that I check whether the number of affected rows is zero, and if so I insert the record. Once the insert or update is done, the records in hourly_stats for that file_id and period of time are removed.
Can you give me any suggestions on how to optimize the queries and reduce the lock rate?
An index can cause poor write performance when it has to be rewritten or updated after every insert/update. This is more likely with regular (non-unique) indexes.
However, in your case it sounds like you'd need a unique index anyway. With one, you might not have this problem (or at least not as much).
Make sure that your table uses the InnoDB engine and has a unique index on (stats_year, stats_month, stats_week).
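As a rough sketch of that step in MySQL syntax: the index name uq_full_stats_period is made up for illustration, and the commented-out variant that leads with file_id is an assumption (the question describes one row per file and period, in which case file_id would presumably need to be part of the key). Adding a unique index fails if the table already contains duplicate key combinations, so those would have to be cleaned up first.

```sql
-- Check the storage engine; the Engine column should report InnoDB.
SHOW TABLE STATUS LIKE 'full_stats';

-- Unique index on the columns named above.
-- uq_full_stats_period is a hypothetical name.
ALTER TABLE full_stats
  ADD UNIQUE INDEX uq_full_stats_period (stats_year, stats_month, stats_week);

-- Assumed variant if rows are kept per file and period:
-- ALTER TABLE full_stats
--   ADD UNIQUE INDEX uq_full_stats_file_period
--     (file_id, stats_year, stats_month, stats_week);
```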
Then, instead of doing an update first, then checking for affected rows and inserting if necessary, use INSERT...ON DUPLICATE KEY UPDATE. This way, in 40% of the cases you spare yourself the preceding update statement.
Note though, that the unique index is crucial for this statement!
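A minimal sketch of such a statement, under several assumptions that are not stated in the question: the unique index covers (file_id, stats_year, stats_month, stats_week), stats_id is AUTO_INCREMENT, stats_time is nullable or has a default, and total_views is a numeric column. The literal values are placeholders.

```sql
-- Insert the row, or overwrite total_views if a row with the same
-- unique key already exists. Without a unique (or primary) key that
-- matches, this statement always inserts a new row.
INSERT INTO full_stats
  (file_id, stats_week, stats_month, stats_year, total_views)
VALUES
  (42, 17, 4, 2011, 350)   -- placeholder values
ON DUPLICATE KEY UPDATE
  total_views = VALUES(total_views);
```

This overwrites total_views, like the UPDATE in the question. To add to the existing count instead, the last line could read total_views = total_views + VALUES(total_views). Newer MySQL 8.0 releases deprecate the VALUES() function in this position in favor of a row alias, though it still works.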