How Do I aggregate Data By Day and Still Respect Timezone?
We are currently using a summary table that aggregates information for our users on an hourly basis in UTC time. The problem we are having is that this table is becoming too large and slowing our system down im开发者_运维知识库mensely. We have done all the tuning techniques recommended for PostgreSQL and we are still experiencing slowness.
Our idea was to start aggregating by day rather than by hour, but the problem is that we allow our customers to change the timezone, which recalculates the data for that day.
Does anyone know of a way to store the daily summary but still respect the numbers and totals when they switch timezones?
Summarise the data in tables with a timeoffset column, and a "day" field (a date) that is the day for that particular summary line. Index on (timeoffset, day, other relevant fields), clustered if possible (presumably PostgresSQL has clustered indexes?) and all should be well.
I'm assuming you've went through all the partitioning considerations, such as partitioning by user.
I can see several solutions to your problem, depending on the usage pattern.
Aggregate data per day, per user selection. In the event of timezone change, programatically recalculate the aggregate for this partner. This is plausible if timezone changes are infrequent and if a certain delay in data may be introduced when a user changes timezones.
If you have relatively few measures, you may maintain 24 columns for each measure - each describing the daily aggregate for the measure in a different timezone.
If timezone changes are frequent and there are numerous measures, it seems like 24 different aggregate tables would be the way to go.
I met this problem too. I take this solution: the data with date type use local timezone, the other data with datetime type use UTC timezone, because the statistics index is local. Another reason is now we have only local data.
I facing same problem. I’m thinking about aggregating by date and time (hour by hour in UTC). Then you could fetch data accordingly for whatever time zone you want. Unfortunately this won’t work if you need to support timezones where there is 45/30/15 minute offset. Then you could aggregate data by 15minutes. Solution depends on the amount of data to agregate.
精彩评论