
Google App Engine Cron Job Question - Mass updating

I have a User model for a game system. I need to increase every user's points by 100 every hour.

from google.appengine.ext import db

# the key_name is the userid in this case
class User(db.Model):
  points = db.IntegerProperty(default=0)

So should I prepare a handler which does a GQL query across all entities? (Wouldn't that be a little slow with 500k to 1 million user entities?)


users = User.all()  # if I'm not mistaken, a query returns at most 1000 results
for user in users:
  user.points += 100
  user.put()  # without put() the new value is never saved

I suppose that by using task queues and sharded counters to overcome the 1000 limit, I could pull it off.


but then again, why don't I just take the time difference of when the user last logged in, and if it's N number of hours, I'll award the user N * 100 points? that should reduce the load on my application.

eg:

class User(db.Model):
  lastlogin = db.DateTimeProperty()
  points = db.IntegerProperty(default=0)

what do you guys think?

but then again, why don't I just take the time difference of when the user last logged in, and if it's N number of hours, I'll award the user N * 100 points? that should reduce the load on my application.

Yes, that is a much more efficient approach. That way, you only update the points once per user login, instead of updating every user record every hour, which would be very expensive.
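A minimal sketch of that on-demand calculation, written in plain Python so it runs without the App Engine SDK. The function name `award_points` and the `POINTS_PER_HOUR` constant are made up for illustration, and advancing the timestamp by whole hours only (so a partial hour isn't thrown away) is my assumption, not something from the question.

```python
from datetime import datetime, timedelta

POINTS_PER_HOUR = 100  # assumed rate, taken from the question


def award_points(points, last_awarded, now):
    """Return (new_points, new_last_awarded) for the whole hours elapsed."""
    elapsed = now - last_awarded
    whole_hours = int(elapsed.total_seconds() // 3600)
    if whole_hours <= 0:
        # Less than an hour has passed: nothing to award yet.
        return points, last_awarded
    # Advance the timestamp by whole hours only, so the leftover
    # minutes still count toward the next award.
    new_last = last_awarded + timedelta(hours=whole_hours)
    return points + whole_hours * POINTS_PER_HOUR, new_last


start = datetime(2010, 1, 1, 12, 0)
points, stamp = award_points(0, start, start + timedelta(hours=3, minutes=30))
print(points, stamp)
```

In a login handler you would presumably pass in `user.points` and `user.lastlogin`, write both results back, and save the entity with a single put().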

Two thoughts on this:

  • Don't worry about 500K - 1M user entries. I don't know you or your game but I'd be very surprised if you get more than 1K.

  • If there's an algorithmic way to allocate the points once rather than every hour, this will be MUCH preferable. Definitely do that, then. Question arises: Are these point increments also accrued while the user is online? If so, you need to build in a check on every action. On the other hand, if you're doing this anyway, then you don't need a check at login time.
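One way to sidestep the "check on every action" question is to not store the accrued total at all, and derive the balance whenever it is read. A rough sketch, assuming points accrue in whole-hour steps from a stored base. `Account`, `base_points`, `base_time`, `current_points` and `spend` are hypothetical names, not fields from the question's model.

```python
from datetime import datetime, timedelta

POINTS_PER_HOUR = 100  # assumed rate, taken from the question


class Account(object):
    """Plain-Python stand-in for the User entity (hypothetical fields)."""

    def __init__(self, base_points, base_time):
        self.base_points = base_points  # balance as of base_time
        self.base_time = base_time      # moment base_points was last written

    def _whole_hours(self, now):
        seconds = (now - self.base_time).total_seconds()
        return max(int(seconds // 3600), 0)

    def current_points(self, now):
        # Derived on every read; nothing is written back.
        return self.base_points + self._whole_hours(now) * POINTS_PER_HOUR

    def spend(self, amount, now):
        # Fold the accrued points into the base before changing it.
        hours = self._whole_hours(now)
        balance = self.base_points + hours * POINTS_PER_HOUR
        if amount > balance:
            raise ValueError("not enough points")
        self.base_points = balance - amount
        # Move the base time forward by whole hours only, keeping the
        # partial hour for the next accrual.
        self.base_time += timedelta(hours=hours)


t0 = datetime(2010, 1, 1)
acct = Account(0, t0)
print(acct.current_points(t0 + timedelta(hours=2, minutes=30)))
```

With this shape a write only happens when the balance changes for some other reason (spending, bonuses), so being online or offline makes no difference.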

"Paging through large datasets" discusses techniques for doing things like this. It's written in the context of displaying X items per page on a form, but the concepts are the same.

You can further split up the work by putting the actual updates in a deferred task.
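To illustrate the paging idea without the SDK, here is a rough sketch of walking a large dataset in fixed-size batches, each one resuming where the previous one stopped. `fetch_page` and `process_all` are hypothetical helpers. On App Engine the page fetch would be backed by a datastore query and each batch could be handed to a deferred task, which this sketch only simulates with a loop over an in-memory list.

```python
def fetch_page(records, cursor, limit):
    """Stand-in for a cursor-based datastore query (hypothetical).

    Returns (page, next_cursor); next_cursor is None when no rows remain.
    """
    page = records[cursor:cursor + limit]
    next_cursor = cursor + len(page)
    if next_cursor >= len(records):
        next_cursor = None
    return page, next_cursor


def process_all(records, batch_size=100, bonus=100):
    """Add `bonus` points to every record, one batch at a time."""
    cursor = 0
    batches = 0
    while cursor is not None:
        page, cursor = fetch_page(records, cursor, batch_size)
        for record in page:
            record["points"] += bonus
        # In a real app this is where the batch would be saved in one
        # batch put, and the next batch scheduled as a new task.
        batches += 1
    return batches


users = [{"points": 0} for _ in range(250)]
print(process_all(users, batch_size=100))
```

Keeping each batch small is what lets every task finish well inside the request deadline, whatever the total number of users is.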

However, as you've suggested, it's probably more efficient to only calculate this on demand.






