Is it better to use database polling or events for the following system?
I'm working on an ordering system that works exactly the way Netflix's service works (see end of this question if you're not familiar with Netflix). I have two approaches and I am unsure which approach is the right one; one relies on database polling and the other is event driven.
The following two approaches assume this simplified schema:
member(id, planId)
plan(id, moviesPerMonthLimit, moviesAtHomeLimit)
wishlist(memberId, movieId, rank, shippedOn, returnedOn)
Polling: I would run the following count queries in wishlist
- Count movies shippedThisMonth (where shippedOn IS NOT NULL and falls in the current month, @memberId)
- Count moviesAtHome (where shippedOn IS NOT NULL, and returnedOn IS NULL @memberId)
- Count moviesInList (@memberId)
The following function will determine how many movies to ship:
moviesToShip = Min(moviesPerMonthLimit - shippedThisMonth, moviesAtHomeLimit - moviesAtHome, moviesInList)
I will loop through each member, run the counts, and loop through their list as many times as moviesToShip. Seems like a pain in the neck, but it works.
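For illustration, the polling calculation could be sketched roughly like this in Python with SQLite. This is only a sketch under assumptions: the function name movies_to_ship and the month_start parameter are made up, dates are assumed to be stored as ISO strings, and moviesInList is assumed to count only rows that have not shipped yet (shipped rows stay in wishlist in this schema).

```python
import sqlite3

def movies_to_ship(conn, member_id, month_start):
    """Polling approach: how many movies to ship to one member right now.

    month_start is a hypothetical ISO date string such as '2024-05-01'.
    """
    per_month_limit, at_home_limit = conn.execute(
        "SELECT p.moviesPerMonthLimit, p.moviesAtHomeLimit "
        "FROM member m JOIN plan p ON p.id = m.planId WHERE m.id = ?",
        (member_id,),
    ).fetchone()
    # One pass over the member's wishlist yields all three counts.
    shipped_this_month, at_home, in_list = conn.execute(
        "SELECT "
        " COALESCE(SUM(shippedOn IS NOT NULL AND shippedOn >= ?), 0),"
        " COALESCE(SUM(shippedOn IS NOT NULL AND returnedOn IS NULL), 0),"
        " COALESCE(SUM(shippedOn IS NULL), 0) "  # assumption: unshipped rows only
        "FROM wishlist WHERE memberId = ?",
        (month_start, member_id),
    ).fetchone()
    # Same Min(...) as in the question, clamped so it never goes negative.
    return max(0, min(per_month_limit - shipped_this_month,
                      at_home_limit - at_home,
                      in_list))
```

The batch job would call this once per member and then take that many unshipped rows ordered by rank.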
Event Driven: This approach involves adding an extra column "queuedForShipping" and setting it to 0 or 1 every time an event takes place. I will do the following counts:
- Count movies shippedThisMonth (where shippedOn IS NOT NULL and falls in the current month, @memberId)
- Count moviesAtHome (where shippedOn IS NOT NULL, and returnedOn IS NULL @memberId)
- Count moviesQueuedForShipping (where queuedForShipping = 1, @memberId)
Instead of using min, I have to use the following condition:
If moviesPerMonthLimit > (shippedThisMonth + moviesQueuedForShipping)
AND moviesAtHomeLimit > (moviesAtHome + moviesQueuedForShipping)
If both conditions are true, I will select a row from wishlist where queuedForShipping = 0 and set its queuedForShipping to 1. I will run this function every time someone adds to, deletes from, or reorders their list. When it's time to ship, I would select the rows for @memberId where queuedForShipping = 1. I would also run this when updating shippedOn and returnedOn.
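As a rough sketch of that check, here is how it might look in Python over an in-memory list of wishlist rows. The names can_queue_another and queue_next are hypothetical, shipped_this_month is passed in rather than counted, and a row is assumed to count as "queued" only while queuedForShipping = 1 and shippedOn is still empty.

```python
def can_queue_another(per_month_limit, at_home_limit,
                      shipped_this_month, at_home, queued):
    """The two conditions from the question, combined with AND."""
    return (per_month_limit > shipped_this_month + queued
            and at_home_limit > at_home + queued)

def queue_next(wishlist, per_month_limit, at_home_limit, shipped_this_month):
    """Flag the best-ranked unqueued row if both limits allow it.

    wishlist is a list of dicts with keys rank, shippedOn, returnedOn and
    queuedForShipping. Returns the row that was flagged, or None.
    """
    at_home = sum(1 for r in wishlist if r["shippedOn"] and not r["returnedOn"])
    # Assumption: only rows flagged but not yet shipped count as queued.
    queued = sum(1 for r in wishlist
                 if r["queuedForShipping"] and not r["shippedOn"])
    if not can_queue_another(per_month_limit, at_home_limit,
                             shipped_this_month, at_home, queued):
        return None
    candidates = [r for r in wishlist
                  if not r["queuedForShipping"] and not r["shippedOn"]]
    if not candidates:
        return None
    nxt = min(candidates, key=lambda r: r["rank"])  # lowest rank number first
    nxt["queuedForShipping"] = 1
    return nxt
```

Each add, delete, reorder, ship, or return event would call queue_next once, so every such event pays for the counts again.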
Approach one is simple. It also allows members to mess around with their ranks until someone decides to run the polling. That way, what to ship is always decided by rank. But people keep telling me polling is bad.
The event driven approach is self-sustaining, but it seems like a waste of time to ping the database with all those counts every time a person changes their list. I would also have to write to the column queuedForShipping. It also means that when a member re-ranks their list and they have pending shipments (shippedOn IS NULL, queuedForShipping = 1), I would have to update those rows and set queuedForShipping to 1 again based on the new ranks. (What if someone added 5 movies and then suddenly went to change the order? Well, queuedForShipping would already be set to 1 on the first two movies he or she added.)
Can someone please give me their opinion on the best approach here, and the pros and cons of polling versus event driven?
Netflix is a monthly subscription service where you create a movie list, and your movies are shipped to you based on your service plan limits.
Based on what you described, there's no reason to keep the data "ready to use" (event) when you can create it very easily when needed (poll).
Reasons to cache it:
- If you needed to display the next item to the user.
- If the detailed data was being removed due to some retention policy.
- If the polling queries were too slow.