Choose between many-to-many techniques in App Engine DB
There is going to be "articles" and "tags" in my App Engine application.
And there are two techniques to implement that (thanks to Nick Johnson's article):
# one entity just refers others
class Article(db.Model):
tags = db.ListProperty(Tag)
# via separate "join" table
class ArticlesAndTags(db.Model):
article = db.ReferenceProperty(Article)
tag = db.ReferenceProperty(Tag)
W开发者_StackOverflow社区hich one should I prefer according to the following tasks?
- Create tag cloud (frequently),
- Select articles by a tag (rather rarely)
Because of the lack of a 'reduce' feature in appengine's map reduce (nor an SQL group by like query), tag clouds are tricky to implement efficiently because you need to count all tags you have manually. Which ever implementation you go with, what I would suggest for the tag cloud is to have a separate model TagCounter that keeps track of how many tags you have. Otherwise the tag query could get expensive if you have a lot of them.
class TagCounter:
tag = db.ReferenceProperty(Tag)
counter = db.IntegerProperty(default=0)
Every time you choose to update your tags on an article, make sure you increment and decrement from this table accordingly.
As for selecting articles by a tag, the first implementation is sufficient (the second is overly complex imo).
class Article(db.Model):
tags = db.ListProperty(Tag)
@staticmethod
def select_by_tag(tag):
return Article.all().filter("tags", tag).run()
I have created a huge tag cloud * on GAEcupboard opting for the first solution:
class Post(db.Model):
title = db.StringProperty(required = True)
tags = db.ListProperty(str, required = True)
The tag class has a counter property that is updated each time a new post is created/updated/deleted.
class Tag(db.Model):
name = db.StringProperty(required = True)
counter = db.IntegerProperty(required = True)
last_modified = db.DateTimeProperty(required = True, auto_now = True)
Having the tags organized in a ListProperty it's quite easy to offer a drill-down feature that allows user to compose different tags to search for the desired articles:
Example: http://www.gaecupboard.com/tag/python/web-frameworks
The search is done using:
posts = Post.all()
posts.filter('tags', 'python').filter('tags', 'web-frameworks')
posts.fetch()
that does not need any custom index at all.
ok, it's too huge, I know :)
Creating a tag-cloud in app-engine is really difficult because the datastore doesn't support the GROUP BY
construct normally used to express that; Nor does it supply a way to order by the length of a list property.
One of the key insights is that you have to show a tag cloud frequently, but you don't have to create one except when there are new articles, or articles get retagged, since you'll get the same tag-clout in any case; In fact, the tag cloud doesn't change very much for each new article, maybe a tag in the cloud becomes a little larger or a little smaller, but not by much, and not in a way that would affect its usefullness.
This suggests that tag-clouds should be created periodically, cached, and displayed much like static content. You should think about doing that in the Task Queue API.
The other query, listing articles by tag, would be utterly unsupported by the first techinque you've shown; Inverting it, having a Tag model with an articles ListProperty
does support the query, but will suffer from update contention when popular tags have to get added to it at a high rate. The other technique, using an association model, suffers from neither of these concerns, but makes it harder to make the article listing queries convenient.
The way I would deal with this is to start with the ArticlesAndTags model, but add some additional data to the model to have a useful ordering; an article date, article title, whatever makes sense for the particular kind of site you're making. You'll also need a monotonic sequence (say, a timestamp) on this so you know when the tag applied.
The tag cloud query would be supported using a Tag entity that has Only a numeric article count, and also a reference to the same timestamp used in the ArticlesAndTags Model.
A task queue can then query for the 1000 oldest ArticlesAndTags that are newer than oldest Tag, sum the frequencies of each and add it to the counts in the Tags. Tag removals are probably rare enough that they can update the Tag model immediately without too much contention, but if that assumption turns out to be wrong, then delete events should be added to the ArticlesAndTags as well.
You don't seem to have very specific/complex requirements so my opinion is it's likely neither method would show significant benefits, or rather, the pros/cons will depend completely on what you're used to, how you want to structure your code, and how you implement caching and counting mechanisms.
The things that come to mind for me are:
-The ListProperty
method leaves the data models looking more natural.
-The AtriclesAndTags
method will mean you'd have to query for the relationships and then the Articles
(ugh..), instead of doing Article.all().filter('tags =', tag)
.
精彩评论