How to calculate cumulative moving average in Python/SQLAlchemy/Flask
I'll give some context so it makes sense. I'm capturing Customer Ratings for Products in a table (Rating) and want to be able to return a Cumulative Moving Average of the ratings based on time.
Here is a basic example, taking one rating per day:
02 FEB - Rating: 5 - Cum Avg: 5
03 FEB - Rating: 4 - Cum Avg: (5+4)/2 = 4.5
04 FEB - Rating: 1 - Cum Avg: (5+4+1)/3 ≈ 3.33
05 FEB - Rating: 5 - Cum Avg: (5+4+1+5)/4 = 3.75
Etc...
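The arithmetic above is just a running sum divided by the number of ratings seen so far. Here is a minimal Python sketch that reproduces those numbers. The `cumulative_averages` helper is hypothetical and not part of any library.

```python
# Daily ratings from the example above (dates kept as plain labels).
ratings = [("02 FEB", 5), ("03 FEB", 4), ("04 FEB", 1), ("05 FEB", 5)]


def cumulative_averages(values):
    """Return the cumulative (running) average after each value."""
    total = 0
    result = []
    for n, value in enumerate(values, start=1):
        total += value            # sum of the first n ratings
        result.append(total / n)  # average of the first n ratings
    return result


averages = cumulative_averages([value for _, value in ratings])
for (day, rating), avg in zip(ratings, averages):
    print(f"{day} - Rating: {rating} - Cum Avg: {avg:.2f}")
```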
I'm trying to think of an approach that won't scale horribly.
My current idea is to have a function that is triggered when a row is inserted into the Rating table and works out the Cum Avg based on the previous row for that product.
So the fields would be something like:
TABLE: Rating
| RatingId | DateTime | ProdId | RatingVal | RatingCnt | CumAvg |
But this seems like a fairly dodgy way to store the data.
What would be the best (or any) way to accomplish this? If I were to use a 'trigger' of sorts, how would I go about doing that in SQLAlchemy?
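The 'trigger' idea can also live in the database itself, where it fires no matter how the row was inserted. Below is a minimal sketch of the table described above with an AFTER INSERT trigger. It assumes SQLite through Python's built-in `sqlite3` module, swapped in for SQLAlchemy so the example is self-contained. It also assumes ratings are inserted in chronological order, so that the highest earlier `RatingId` for a product is its previous rating. The trigger and index names are hypothetical, and trigger syntax differs between database engines.

```python
import sqlite3

# Hypothetical SQLite version of the Rating table sketched in the question.
SCHEMA = """
CREATE TABLE Rating (
    RatingId  INTEGER PRIMARY KEY AUTOINCREMENT,
    DateTime  TEXT NOT NULL,
    ProdId    INTEGER NOT NULL,
    RatingVal INTEGER NOT NULL,
    RatingCnt INTEGER,
    CumAvg    REAL
);
CREATE INDEX ix_rating_prod ON Rating (ProdId, RatingId);

-- After each insert, derive RatingCnt and CumAvg from the previous row
-- (highest earlier RatingId) for the same product.
CREATE TRIGGER rating_cum_avg AFTER INSERT ON Rating
FOR EACH ROW
BEGIN
    UPDATE Rating
    SET RatingCnt = COALESCE((SELECT p.RatingCnt FROM Rating p
                              WHERE p.ProdId = NEW.ProdId AND p.RatingId < NEW.RatingId
                              ORDER BY p.RatingId DESC LIMIT 1), 0) + 1
    WHERE RatingId = NEW.RatingId;

    -- RatingCnt here is the value the previous statement just wrote.
    UPDATE Rating
    SET CumAvg = (COALESCE((SELECT p.CumAvg * p.RatingCnt FROM Rating p
                            WHERE p.ProdId = NEW.ProdId AND p.RatingId < NEW.RatingId
                            ORDER BY p.RatingId DESC LIMIT 1), 0)
                  + NEW.RatingVal) * 1.0 / RatingCnt
    WHERE RatingId = NEW.RatingId;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Insert the example ratings for a single (hypothetical) product 1.
for day, value in [("02 FEB", 5), ("03 FEB", 4), ("04 FEB", 1), ("05 FEB", 5)]:
    conn.execute(
        "INSERT INTO Rating (DateTime, ProdId, RatingVal) VALUES (?, ?, ?)",
        (day, 1, value),
    )

rows = conn.execute(
    "SELECT DateTime, RatingVal, RatingCnt, CumAvg FROM Rating "
    "WHERE ProdId = 1 ORDER BY RatingId"
).fetchall()
for row in rows:
    print(row)
```

This keeps the layout from the question, so it shares that layout's weakness: a rating inserted out of order, edited, or deleted leaves later `CumAvg` values stale. If the logic should run in Python instead, SQLAlchemy offers ORM mapper events such as `after_insert`, registered with `sqlalchemy.event.listens_for`. That variant is not shown here.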
Any and all advice appreciated!
I don't know about SQLAlchemy, but I might use an approach like this:
- Store the cumulative average and rating count separately from individual ratings.
- Every time you get a new rating, update the cumulative average and rating count:
- new_count = old_count + 1
- new_average = ((old_average * old_count) + new_rating) / new_count
- Optionally, store a row for each new rating.
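The update rule in the steps above can be sketched in plain Python. This is a minimal illustration; the `RatingSummary` class is hypothetical and stands in for whatever per-product record holds the running average and count.

```python
from dataclasses import dataclass


@dataclass
class RatingSummary:
    """Hypothetical per-product summary: running average and rating count."""
    average: float = 0.0
    count: int = 0

    def add(self, new_rating: float) -> None:
        # new_count = old_count + 1
        new_count = self.count + 1
        # new_average = ((old_average * old_count) + new_rating) / new_count
        self.average = (self.average * self.count + new_rating) / new_count
        self.count = new_count


summary = RatingSummary()
for rating in [5, 4, 1, 5]:
    summary.add(rating)
    print(summary.count, round(summary.average, 2))
```

Each new rating costs a constant amount of work regardless of how many ratings came before. A variation (not what the answer proposes) is to store the running sum instead of the average and divide on read, which avoids accumulating floating-point rounding error over many updates.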
Updating the average and rating count could be done with a single SQL statement.
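Here is a minimal sketch of that single statement. It assumes SQLite through Python's `sqlite3` and a hypothetical `ProductRating` summary table (one row per product) alongside a per-rating `Rating` table. The table names and the `add_rating` helper are illustrative, and the same SQL could be sent through SQLAlchemy's `text()` construct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductRating (
    ProdId    INTEGER PRIMARY KEY,
    RatingCnt INTEGER NOT NULL DEFAULT 0,
    CumAvg    REAL    NOT NULL DEFAULT 0.0
);
CREATE TABLE Rating (
    RatingId  INTEGER PRIMARY KEY AUTOINCREMENT,
    DateTime  TEXT NOT NULL,
    ProdId    INTEGER NOT NULL REFERENCES ProductRating (ProdId),
    RatingVal INTEGER NOT NULL
);
""")

# In SQLite (and standard SQL) every SET expression sees the row's values
# from before the update, so CumAvg is computed with the old RatingCnt.
# MySQL evaluates single-table SET assignments left to right instead.
UPDATE_SUMMARY = """
UPDATE ProductRating
SET CumAvg = (CumAvg * RatingCnt + ?) / (RatingCnt + 1),
    RatingCnt = RatingCnt + 1
WHERE ProdId = ?
"""


def add_rating(conn, prod_id, when, value):
    """Record one rating and update the product's summary in one transaction."""
    with conn:  # commits on success, rolls back on error
        # Create the summary row the first time a product is rated.
        conn.execute("INSERT OR IGNORE INTO ProductRating (ProdId) VALUES (?)", (prod_id,))
        conn.execute(UPDATE_SUMMARY, (value, prod_id))
        # Optionally keep the individual rating as well.
        conn.execute(
            "INSERT INTO Rating (DateTime, ProdId, RatingVal) VALUES (?, ?, ?)",
            (when, prod_id, value),
        )


for when, value in [("02 FEB", 5), ("03 FEB", 4), ("04 FEB", 1), ("05 FEB", 5)]:
    add_rating(conn, 1, when, value)

summary_row = conn.execute(
    "SELECT RatingCnt, CumAvg FROM ProductRating WHERE ProdId = 1"
).fetchone()
print(summary_row)
```

Doing the arithmetic inside the UPDATE, rather than reading the old values into Python and writing them back, generally avoids a read-modify-write race when two ratings for the same product arrive at once.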