Google App Engine Datastore - Is this method fast enough? (for 500k users)
Let's say we have:
from google.appengine.ext import db

class User(db.Model):
    nickname = db.StringProperty()
We have 500k User entities, each with a unique nickname.
I now want to add one more entity, and its nickname must be unique too. So I run this:
to_check = User.gql("WHERE nickname = :1", new_nickname).get()
if to_check is None:
    # proceed to create entity
    User(nickname=new_nickname).put()
Is this method going to work for over 500k users? Am I going to experience slow processing times?
What are the optimization methods for this?
PS: Is indexing the nickname property a good way to proceed?
I can only think of this at the moment:
class User(db.Model):
    nickname = db.StringProperty(indexed=True)  # index this property
EDITED: btw, I have two unique properties I want to maintain: userid and nickname. The userid will be automatically assigned as the key_name (I'm making a Facebook app which takes the user's Facebook id and creates a User entity).
To me, userid is more important, so I'll use it as the key_name.
The nickname will be manually entered by the Facebook user, so I need a mechanism to check whether it is unique or not.
So the problem now is, what do I do with the nickname? I can't have two key_names :(
You should check out Brett Slatkin's Google I/O video:
http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
Specifically, the bit about Relation Index Entities. He deals with a problem very similar to yours.
You could create another entity that stores the user's nickname (and set it as the key_name). When you create it, set the parent to be the User entity:
UserNickname(
    parent=user,
    key_name=nickname,
    nickname=nickname
)
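Now you can fetch the nickname entity by key (get_by_key_name) very quickly, and if you want to exclude the current user (which you will if you let users change their nickname), you can easily get the parent from a keys_only query or use the ancestor in the query directly. One caveat: a key_name is only unique under a given parent, so a lookup across all users needs the UserNickname to have no parent (linked by a ReferenceProperty instead); with a parent set, get_by_key_name has to be passed parent=user.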
Edit: Just noticed Kris Walker already suggested this in a comment. You could use a reference property or parent to link the two together, both should work well.
The nickname property is indexed automatically (an equality filter on a single property is served by the built-in index and needs no entry in index.yaml), so don't worry about it too much. The indexed argument defaults to True (it's normally only used to set it explicitly to False instead).
With the index, searching for a nickname that may occur 0 or 1 times is going to be quite fast anyway, no matter how many entries in the table -- say, order of magnitude, 50-100 milliseconds; putting a new entity, maybe twice as long. The whole thing should fit within 300 milliseconds or less.
One worry is a race condition -- what if two separate sessions are trying to register exactly the same nickname at exactly the same time? May be unlikely, but when it happens you have no defense as your code stands. Getting such a defense (by running in a transaction) implies a transaction lock and therefore may impact performance (if several such sessions are running at exactly the same time, they'll be serialized).
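To make the race concrete, here is a minimal sketch in plain Python rather than App Engine code. The NicknameRegistry class is hypothetical: a dict stands in for the datastore and a threading.Lock stands in for the transaction. The only point is that the check and the write happen as one atomic step, so exactly one of several simultaneous attempts wins. On App Engine the same shape would come from doing the lookup and the put inside db.run_in_transaction.

```python
import threading


class NicknameRegistry(object):
    """In-memory stand-in for the datastore; NOT the App Engine API."""

    def __init__(self):
        self._owners = {}              # nickname -> user id
        self._lock = threading.Lock()  # plays the role of the transaction

    def claim(self, nickname, user_id):
        """Atomically register nickname for user_id; True if we got it."""
        with self._lock:
            if nickname in self._owners:
                return False
            self._owners[nickname] = user_id
            return True


registry = NicknameRegistry()
results = []


def register(user_id):
    # Every session tries to take the same nickname.
    results.append(registry.claim("thomas", user_id))


threads = [threading.Thread(target=register, args=(uid,)) for uid in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = results.count(True)
```

Without the lock, two threads could both see the nickname as free and both write it, which is the same failure the unprotected get-then-put has.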
get_by_key_name will be your new best friend.
I frequently use a code pattern like the following:
user = User.get_by_key_name(user_key_name)
if not user:
    user = User(key_name=user_key_name)
This tends to be much faster than a GQL query.
If you are going to be writing more than one entity to the datastore at a time, you should also use the pattern of db.put(entities_list) where the list can contain up to 500 entities of any kind - they don't even have to be the same model kind.
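If you have more entities than fit in one call, you need to split the list first. A small sketch of that, in plain Python: the batches helper and the placeholder strings are made up for illustration, and the 500 default is the limit quoted above. On App Engine each batch would then be handed to db.put.

```python
def batches(entities, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(entities), size):
        yield entities[start:start + size]


# Stand-in data: 1,234 placeholder "entities" (plain strings here).
pending = ["entity-%d" % i for i in range(1234)]

# On App Engine the loop body would be db.put(batch) instead of len(batch).
batch_sizes = [len(batch) for batch in batches(pending)]
```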
It looks like you are treating the nickname as a unique key for the User entity kind.
So I would do this instead (I see this has already been suggested):
class User(db.Model):
    # other properties go here, but not nickname
    pass

# put a new user
if User.get_by_key_name(user_nick) is None:
    User(key_name=user_nick).put()
The indexing strategy is a waste, even with "just" 500k.
There is also db.Model.get_or_insert()
http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
Hey, I just thought of another method to solve my dilemma!
Basically, when the user manually enters a nickname, I automatically append his/her userid to it to make it unique.
E.g.:
user_nickname is thomas. I append the userid to it, and it becomes thomas_8937459874 (unique!).
So I don't need to check whether the nickname already exists. Saves me a GQL query.
When the time comes to display the nickname, I'll just use string manipulation to retrieve only the name "thomas".
What do you guys think?
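For what it's worth, the string manipulation could look roughly like this. The function names are made up, and the sketch assumes the userid itself never contains an underscore (true for numeric Facebook ids). Splitting from the right keeps nicknames that contain underscores intact.

```python
def make_stored_nickname(nickname, userid):
    """Append the userid so the stored value is unique per user."""
    return "%s_%s" % (nickname, userid)


def display_nickname(stored):
    """Drop the trailing _userid; split from the right so that
    underscores inside the nickname itself survive."""
    return stored.rsplit("_", 1)[0]


stored = make_stored_nickname("thomas", "8937459874")
shown = display_nickname(stored)
tricky = display_nickname(make_stored_nickname("mr_thomas", "42"))
```

Note that the displayed names are no longer unique with this scheme: two different users can both show up as "thomas".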
So I tried using ReferenceProperty to do this.
Tell me what you guys think:
Additional feature added: a user can only change nickname 3 times max.
# models.py
from google.appengine.ext import db

# key_name will be whatever the user manually enters to be the nickname
class UserNickname(db.Model):
    name = db.StringProperty()

# key_name = facebook id
class User(db.Model):
    nickname = db.ReferenceProperty(UserNickname)
    nickname_change_count = db.IntegerProperty(default=0)

# create unique entity with facebook id
User(key_name="123456789").put()
***** the following code lies in the signup page *****
# in the signup page, signup.py
# userid of 123456789 is taken from cached session
user = User.get_by_key_name("123456789")

# this is the nickname manually entered by the user
manually_entered_nick = "Superman"
to_check = UserNickname.get_by_key_name(manually_entered_nick)
if to_check is None:
    # create usernickname entity
    key = UserNickname(key_name=manually_entered_nick, name=manually_entered_nick).put()
    # assign this key to the user entity
    user.nickname = key
    db.put(user)
    print 'Unique nickname registered'
else:
    print 'Choose another nick pls'
***** the following code lies in the "change user nickname" page *****
# change_nickname.py
# userid is taken from cached session
user = User.get_by_key_name("123456789")

# max no. of nickname changes allowed is 3 (hardcoded)
# checks if user can change nick
if user.nickname_change_count >= 3:
    print 'you cannot change nicks anymore. contact admin'
else:
    # delete entire nickname entity
    to_delete = UserNickname.get_by_key_name(user.nickname.key().name())
    db.delete(to_delete)
    # adds to count
    user.nickname_change_count += 1
    # for security purposes, user account is "disabled" until he/she chooses a new nick.
    # user manually enters new nickname
    new_nick = "Batman"
    to_check = UserNickname.get_by_key_name(new_nick)
    if to_check is None:
        # create usernickname entity
        key = UserNickname(key_name=new_nick, name=new_nick).put()
        # assign this nick to user entity
        user.nickname = key
        db.put(user)
        print 'new Nick registered'
    else:
        print 'Choose another nick pls'
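A possible variation on the change flow, sketched in plain Python with stand-ins instead of datastore entities: a set plays the role of the UserNickname kind and a dict plays the role of the User entity (both hypothetical, as is the change_nickname function). It reserves the new nickname first and only then releases the old one, so a failed attempt neither leaves the user without a nickname nor uses up one of the three changes. Whether that is preferable to "disabling" the account in between is a design choice.

```python
MAX_NICK_CHANGES = 3  # same limit as the hardcoded 3 above


def change_nickname(taken, user, new_nick):
    """Try to move `user` to `new_nick`; return a short status string.

    `taken` is a set of nicknames in use and `user` is a dict with
    "nickname" and "change_count" keys. Both are stand-ins, not
    datastore objects.
    """
    if user["change_count"] >= MAX_NICK_CHANGES:
        return "limit reached"
    if new_nick in taken:
        return "taken"                   # old nickname is still held
    taken.add(new_nick)                  # reserve the new name first
    taken.discard(user["nickname"])      # only then release the old one
    user["nickname"] = new_nick
    user["change_count"] += 1
    return "ok"


taken = set(["Superman", "Batman"])
user = {"nickname": "Superman", "change_count": 0}

first = change_nickname(taken, user, "Batman")   # already in use
second = change_nickname(taken, user, "Flash")   # free
```

In a real handler the check and the two writes would still need to be atomic, for the same reason as in the signup case.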