Strategies for rarely updated data
Background: 2 minutes before every hour, the server stops access to the site returning a busy screen while it processes data received in the previous hour. This can last less than two minutes, in which case it sleeps until the two minutes is up. If it lasts longer than two minutes it runs as long as it needs to then returns. The block is contained in a its own table with one field and one value in that field.
Currently the user is only informed of the block when (s)he tries to perform an action (click a link, send a form etc). I was planning to update the code to bring down a lightbox and the blocking message via BlockUI jquery plugin automatically.
There are basically 2 methods I can see to achieve my aim:
Polling every N seconds (via PeriodicalUpdater or similar)
Long polling (Comet)
You can reduce server load for 1 by checking the local time and when it gets close to 开发者_开发问答the actual time start the polling loop. This can be more accurate by sending the local time to the server returning the difference mod 60. Still has 100+ people querying the server which causes an additional hit on the db.
Option 2 is the more attractive choice. This removes the repeated hit on the webserver, but doesn't allieve the repeated check on the db. However 2 is not the choice for apache 2.0 runners like us, and even though we own our server, none of us are web admins and don't want to break it - people pay real money to play so if it isn't broke don't fix it (hence why were are running PHP4/MySQL3 still).
Because of the problems with option 2 we are back with option 1 - sub-optimal.
So my question is really two-fold:
Are there any other possibilities I've missed?
Is long polling really such a problem at this size? I understand it doesn't scale, but I am more concerned at what level does it starve Apache of threads. Also are there any options you can adjust in Apache so it scales slightly further?
Can you just send to the page how many time is left before the server starts processing data received in the previous hour. Lets say that when sending the HTML you record that after 1 min the server will start processing. And create a JS that will trigger after that 1 min and will show the lightbox.
The alternative I see is to get it done faster, so there is less downtime from the users perspective. To do that I would use a distributed system to do the actual data processing behind the hourly update, such as Hadoop. Then use whichever method is most appropriate for that short downtime to update the page.
精彩评论