algorithm advice for finding maximum items within a time period
I have a database schema that is similar to the following:
| User | Event | Date
|--------|---------------|------
| 111 | Walked dog | 2009-10-1
| 222 | Walked dog | 2009-10-2
| 333 | Fed Fish | 2009-10-5
| 222 | Did Laundry | 2009-10-6
| 111 | Fed Fish | 2009-10-7
| 111 | Walked dog | 2009-10-18
| 222 | Walked dog | 2009-10-19
| 111 | Fed Fish | 2009-10-21
I would like to produce a query that returns the maximum number of times a user performs some action within a time period. For example,开发者_开发百科 given a time period of 5 days, what is the maximum number of times user 111 walked the dog?
The most obvious solution would be to start at some zero point and move forward each day, summing up 5 day periods along the way, then taking the maximum total out of all the 5 day windows. the approach seems incredibly costly however.
I would appreciate any suggestions you may have.
EDIT 1:
Thanks for the comments / answers. To respond: - I'm using mySQL v5.0 - There could be any number of events per day (per any time period really) - @Paulo Santos: thanks,but like the comment points out, i need to find the window that produces the most results, the window itself can slide. - @Mark: this looks like an interesting solution, although i recall reading that mySQL does not support backing up or jumping ahead cursors.
- @orbMan: this looks promising. I don't fully understand it yet, but I will give this a try tonight. - @mjv: another promising solution. also looks complicated, but I will give it another lookthanks again!
For you specific request I'd do something like:
SELECT User, Event, Count(*)
FROM Table
WHERE Date between @d1 and @d2
Group by User, Event
Then it will return the number of time each user performed each task within the specified (@d1
and @d2
) time frame.
select top 1 x.Date as StartDate, DATEADD(day, 5, x.Date) as EndDate, COUNT(*) as Count
from Event e
inner join Event x on 1=1
where e.Date between x.Date and DATEADD(day, 5, x.Date)
and e.Event = 'Walked dog'
group by x.Date, DATEADD(day, 5, x.Date)
order by Count desc
Output:
StartDate EndDate Count
---------- ---------- -----------
2009-10-01 2009-10-06 2
Here's an alternative algorithm that is cursor based.
Start with two cursors, begin and end, both pointing at the initial row, and current count = 0, and current maximum = 0.
If DATE_DIFF(end.date, begin.date) is more than 5, advance the begin cursor one row. Subtract one from current count if the old row was 'walked the dog'.
If DATE_DIFF(end.date, begin.date) is not more than 5, advance the end cursor one row. Aadd one to current count if the new row is 'walked the dog'. If the current count is greater than current maximum, set current maximum to current count.
Continue until you have covered all rows in the range.
The following SQL code addresses the problem in a declarative fashion, rather than a purely procedural/algorithmic fashion. Depending on the situation, it is probably more efficient (as compared with getting the [sorted] data from SQL and then running some algorithm, and even compared with server-side, cursor-based solutions.)
The idea is to get the counts of [relevant/filtered] events, per user, per day in a separate table or CTE. and then for each Day+User, to tally the number of events for this day and for the next 4 days, and, finally to select (per user) row with the maximum of these tally.
SELECT User, Date, COUNT(*) AS EventCount
INTO tmpTableByUsrByDay
FROM myTable
-- WHERE Event = some_targeted_event --Optional condition(s)
GROUP BY User, Date, COUNT(*)
SELECT DISTINCT User, Date AS FirstDay,
MAX(FiveFaysEventCount) AS EventCountForThisAndNext4Days.
FROM (
SELECT T1.User, T1.Date, SUM(T2.EventCount) FiveDaysEventCount
FROM tmpTableByUsrByDay T1
JOIN tmpTableByUsrByDay T2 ON T2.Date >= T1.Date
AND T2.Date <= DATEADD(day, 4, T1.Date)
GROUP BY T1.User, T1.Date
)
Notes:
- It uses a temporary table, although a Common Table Expression (CTE) could be used instead depending on the underlying SQL host.
- The particular name/syntax for the DateAdd() function may vary between SQL implementations.
- Also this implies that the "date" field contains "only" a date, i.e either a date or datetime/smalldatetime where the time part is fixed (to say 00:00). If that wasn't the case, i.e. if the database had date and time in the column, this could be fixed at the level of the CTE/temp-table query.
精彩评论