Storing occurrences for reporting
What is the best way to store occurrences of an ev开发者_运维百科ent in a database so you can quickly pull reports on it? ie (total number of occurrences, number of occurrences between date range).
right now I have two database tables, one which holds all individual timestamps of the event - so I can query on a date range, and one which holds a total count so I can quickly pull that number for a tally
Table 1:
Event | Total_Count
------+------------
bar | 1
foo | 3
Table 2:
Event | Timestamp
------+----------
bar | 1/1/2010
foo | 1/1/2010
foo | 1/2/2010
foo | 1/2/2010
Is there a better approach to this problem? I'm thinking of converting Table 2, to hold date tallies, it should be more efficient, since my date range queries are only done on whole dates, not a timestamp (1/1/2010 vs 1/1/2010 00:01:12) ie:
Updated Table 2
Event | Date | Total_Count
------+----------+------------
bar | 1/1/2010 | 1
foo | 1/1/2010 | 1
foo | 1/2/2010 | 2
Perhaps theres an even smarter way to tackle this problem? any ideas?
Your approach seems good. I see table 2 more as a detail table, while table 1 as a summary table. For the most part, you would be doing inserts only to table 2, and inserts and updates on table 1.
The updated table 2 may not give you much additional benefit. However, you should consider it if aggregations by day is most important to you.
You may consider adding more attributes (columns) to the tables. For example, you could add a first_date, and last date to table 1.
I would just have the one table with the timestamp of your event(s). Then your reporting is simply setting up your where
clause correctly...
Or am I missing something in your question?
Seems like you don't really have any requirements:
Changing from timestamp to just the date portion is a big deal. You don't ever want to do a time-of-day analysis? like what's the best time of day to do maintenance if that stops "foo" from happening.
And you're not worried about size? You say you have millions of records (like that's a lot) and then you extend every single row by an extra column. One column isn't a lot until the row count skyrockets and then you really have to think about each column.
So to get the sum of event for the last 3 days you'd rather do this
SELECT SUM(totcnt) FROM (
SELECT MAX(Total_count) as totcnt from table where date = today and event = 'Foo'
UNION ALL
SELECT MAX(Total_count) from table where date = today-1 and event = 'Foo'
UNION ALL
SELECT MAX(Total_count) from table where date = today-2 and event = 'Foo'
)
Yeah, that looks much easier than>
SELECT COUNT(*) FROM table WHERE DATE BETWEEN today-2 and today and event = 'foo'
And think about the trigger it would take to add a row... get the max for that day and event and add one... every time you insert?
Not sure what kind of server you have but I summed 1 Million rows in 285ms. So... how many millions will you have and how many times do you need to sum them and is each time for the same date range or completely random?
精彩评论