
GAE datastore - best practice when there are more writes than reads

I'm trying to get some practice with the GAE datastore, to get a feel for the query and billing mechanisms.

I've read the O'Reilly book about GAE and watched the Google videos about the datastore. My problem is that the best-practice advice usually assumes more reads than writes to the datastore.

I built a super simple app:

  • there are two webpages - one to choose links, and one to view the chosen links
  • every user can choose to add url links to his "links feed"
  • the user can choose as many links as he wants, whenever he wants.
  • on a different webpage, I want to show the user the most recent 10 links he chose.
  • every user has his own "links feed" webpage.
  • on every "link" I want to save and show some metadata - for example: the url link itself; when it was chosen; how many times it appeared on the feed already; etc.

In this case, since the user can choose as many links as he wants, whenever he wants, my app writes to the datastore much more often than it reads from it (write - when the user chooses another link; read - when the user opens the webpage to see his "links feed").

Question 1: I can think of (at least) two options for handling the data for this app:

Option A:

  • maintain an entity per user with the user details, registration, etc.
  • maintain another entity per user that holds his 10 most recent chosen links, which will be rendered to the user's webpage when he asks for it

Option B:

  • maintain an entity per url link - which means all the urls of all users will be stored as the same kind of object
  • maintain an entity per user's details (same as in Option A), but add a reference from the user to his urls in the big table of urls
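To make the two options concrete, here is a rough sketch of what the models might look like with the db API (the class and property names - UserFeedA, Link, UserFeedB, etc. - are made up for illustration):

    from google.appengine.ext import db

    # Option A: one entity per user; the recent links (with metadata) live inside
    # that entity as a list of serialized strings.
    class UserFeedA(db.Model):
        user = db.UserProperty()
        recent_links = db.StringListProperty()  # e.g. JSON strings: url, time chosen, count

    # Option B: one entity per url, shared by all users; each user entity keeps
    # references (keys) into that big table of urls.
    class Link(db.Model):
        url = db.LinkProperty()
        times_chosen = db.IntegerProperty(default=0)
        first_chosen = db.DateTimeProperty(auto_now_add=True)

    class UserFeedB(db.Model):
        user = db.UserProperty()
        link_keys = db.ListProperty(db.Key)  # keys of this user's recent Link entities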

What will be the better method?

Question 2: If I want to count the total number of urls chosen to date, or the daily number of urls a user chose, or do any other counting - should I compute it on the fly with queries, or should I keep counters in the entities I described above? (I want to reduce the amount of datastore writes as much as I can.)

EDIT (to answer @Elad's comment): Assume I want to save only the 10 most recent urls per user; the rest I want to get rid of (so as not to overpopulate my DB with unnecessary data).

EDIT 2 (after adding the code): So I gave it a try with the following code (trying Elad's method first):

Here's my class:

class UserChannel(db.Model):
    currentUser = db.UserProperty()
    userCount = db.IntegerProperty(default=0)
    currentPlaylist = db.StringListProperty()  # holds the last 20-30 urls

Then I serialize the url & metadata into JSON strings, which the user POSTs from the first page. Here's how the POST is handled:

def post(self):
    user = users.get_current_user()
    if user:  
        # logging messages for debugging
        self.response.headers['Content-Type'] = 'text/html'
        #self.response.out.write('<p>the user_id is: %s</p>' % user.user_id())            
        # updating the new item that the user adds
        current_user = UserChannel.get_by_key_name(user.nickname())
        dataJson = self.request.get('dataJson')
        #self.response.out.write('<p>the dataJson is: %s</p>' % dataJson) 
        current_user.currentPlaylist.append(dataJson)
        sizePlaylist= len(current_user.currentPlaylist)
        self.response.out.write('<p>size of currentplaylist is: %s</p>' % sizePlaylist)
        # whenever the list gets past 30 I cut it down to 20
        if sizePlaylist > 30:
            for i in range(10):
                current_user.currentPlaylist.pop(0)  # drop the 10 oldest entries
        current_user.userCount +=1
        current_user.put()
        Updater().send_update(dataJson) 
    else:
        self.response.headers['Content-Type'] = 'text/html'
        self.response.out.write('user_not_logged_in')

where Updater is my helper for pushing the update to the feed webpage via the Channel API.

Now, it all works - I can see that each user has a ListProperty with 20-30 links (when it goes past 30, I cut it down to 20 with pop()) - but the prices are quite high: each POST like the one above takes ~200ms, 121 cpu_ms, cpm_usd = 0.003588. This is very expensive considering all I do is save a string to the list... I think the problem might be that the entity gets big because of the big ListProperty?


First, you're right to worry about lots of writes to the GAE datastore - my own experience is that they're very expensive compared to reads. For instance, an app of mine that did nothing but insert records into a single model table exhausted the free quota with a few tens of thousands of writes per day. So handling writes efficiently translates directly into your bottom line.

First Question

I wouldn't store links as separate entities. The datastore is not an RDBMS, so standard normalization practices do not necessarily apply. For each User entity, use a ListProperty to store the most recent URLs along with their metadata (you can serialize everything into a string; see the sketch after the list below).

  • This is efficient for writing since you only update a single record - there are no updates to all the link records whenever the user adds links. Keep in mind that to keep a rolling list (FIFO) with URLs stored as separate, referenced entities, every new URL means two write actions - an insert of the new URL, and a delete to remove the oldest one.
  • It's also efficient for reading since a single read on the user record gives you all the data you need to render the User's feed.
  • From a storage perspective, the total number of URLs in the world far exceeds your number of users (even if you become the next Facebook), and so does the variety of URLs chosen by your users, so it's likely that the average URL will be chosen by a single user - there's no real gain from RDBMS-style normalization of the data.
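A minimal sketch of what that could look like (UserFeed and add_link are hypothetical names; the metadata is serialized to JSON as suggested):

    import datetime
    import json

    from google.appengine.ext import db

    class UserFeed(db.Model):
        user = db.UserProperty()
        recent_links = db.StringListProperty()  # serialized url + metadata, oldest first

    def add_link(user, url):
        """Append one serialized link and keep only the 10 most recent - a single write."""
        feed = UserFeed.get_or_insert(user.nickname(), user=user)
        item = json.dumps({'url': url,
                           'chosen': datetime.datetime.utcnow().isoformat()})
        feed.recent_links.append(item)
        feed.recent_links = feed.recent_links[-10:]  # FIFO: drop anything older than the last 10
        feed.put()                                   # one datastore write per added link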

Another optimization idea: if your users usually add several links in a short period, you can try to write them in bulk rather than separately. Use memcache to store newly added user URLs, and the Task Queue to periodically write that transient data to the persistent datastore. I'm not sure what the resource cost of using Tasks is, though - you'll have to check. Here's a good article to read on the subject.
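A rough sketch of that batching idea, reusing the hypothetical UserFeed model from the sketch above and assuming a task handler mapped to /tasks/flush_links (memcache is not durable and the read-modify-write below is not atomic, so an occasional link could be lost - acceptable only if that's tolerable for a feed):

    from google.appengine.api import memcache, taskqueue
    from google.appengine.ext import webapp

    BUFFER_KEY = 'pending_links_%s'  # hypothetical per-user memcache key

    def buffer_link(user, serialized_link):
        """Stash the new link in memcache and schedule a flush instead of writing now."""
        key = BUFFER_KEY % user.nickname()
        pending = memcache.get(key) or []
        pending.append(serialized_link)
        memcache.set(key, pending)
        # The countdown lets links added within ~60s collapse into one datastore write.
        taskqueue.add(url='/tasks/flush_links',
                      params={'nickname': user.nickname()},
                      countdown=60)

    class FlushLinks(webapp.RequestHandler):
        def post(self):
            nickname = self.request.get('nickname')
            key = BUFFER_KEY % nickname
            pending = memcache.get(key) or []
            if pending:
                feed = UserFeed.get_or_insert(nickname)
                feed.recent_links = (feed.recent_links + pending)[-10:]
                feed.put()           # one write for the whole buffered batch
                memcache.delete(key)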

Second Question

Use counters. Just keep in mind that they aren't trivial in a distributed environment, so read up - there are many GAE articles, recipes and blog posts on the subject; just google "appengine counters". Here too, using memcache should be a good option for reducing the total number of datastore writes.
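For example, a counter that buffers increments in memcache and only touches the datastore every N increments might look roughly like this (Counter, increment and FLUSH_EVERY are made-up names; counts held only in memcache can be lost on eviction, so this trades some accuracy for fewer writes):

    from google.appengine.api import memcache
    from google.appengine.ext import db

    class Counter(db.Model):  # one row per counter; shard it if a single row sees heavy writes
        count = db.IntegerProperty(default=0)

    FLUSH_EVERY = 100  # persist to the datastore once per 100 increments

    def increment(name):
        pending = memcache.incr('pending_' + name, initial_value=0)
        if pending is not None and pending >= FLUSH_EVERY:
            def txn():
                counter = Counter.get_by_key_name(name) or Counter(key_name=name)
                counter.count += FLUSH_EVERY
                counter.put()
            db.run_in_transaction(txn)
            # Not perfectly race-free: concurrent flushes could double-count or skip.
            memcache.decr('pending_' + name, delta=FLUSH_EVERY)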


Answer 1

Store links as separate entities. Also store an entity per user with a ListProperty holding the keys of the most recent 20 links. As the user chooses more links, you just update the ListProperty of keys. A ListProperty maintains order, so you don't need to worry about the chronological order of the chosen links as long as you follow a FIFO insertion order.

When you want to show the user's chosen links (page 2), you can do one get(keys) to fetch all of the user's links in a single call.
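A rough sketch of both sides of that, with made-up Link / UserLinks model names (the ListProperty(db.Key) holds the keys, oldest first):

    from google.appengine.ext import db

    class Link(db.Model):
        url = db.LinkProperty()

    class UserLinks(db.Model):
        user = db.UserProperty()
        link_keys = db.ListProperty(db.Key)  # keys of the 20 most recent Links, oldest first

    def choose_link(user, url):
        link_key = Link(url=url).put()                        # write 1: the new Link entity
        feed = UserLinks.get_or_insert(user.nickname(), user=user)
        feed.link_keys = (feed.link_keys + [link_key])[-20:]  # FIFO: keep only the newest 20 keys
        feed.put()                                            # write 2: the updated key list

    def fetch_recent(feed):
        return db.get(feed.link_keys)  # one batch get; entities come back in key order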

Answer 2

Definitely keep counters: as the number of entities grows, the cost of counting records by querying will keep increasing, but with counters the performance stays the same.
