MongoDB schema design: songs/plays/likes

This is my first project using a NoSQL database and I am wondering how to structure my data in the most efficient way.

I am making a small service that stores all songs played by a radio station. Users can "like" the songs. So, basically, I have the following data:

Song: Id, Artist, Title
Play: SongId, Time (when was the song played)
Like: SongId, UserName, Time (when did the user click the like button)

I need to run various queries on that data. For example: last X songs played + like count, top played songs in the last X days, who liked a specific song etc.

At first I was thinking about storing everything in a single document with nested play and like information. But this makes some of the queries quite complicated and forces me to do things like sorting on the client side, whereas I would like to keep the amount of data transferred from the database small.

I was also thinking about caching some of the most used queries in memory. Are there any general recommendations when doing something like that?


Since the numbers of songs, plays, and users are all going to be large, using embedded documents for any of them isn't going to work, so you are more or less forced back to a more relational model with a collection for each.

What I would denormalize is the song information: copy it into the Play document and into the Like document, so you can render the playlist and the likes for any user without having to do any 'joins' back to the Song collection.

Song: Id, Artist, Title
Play: SongId, Time (when was the song played), Artist, Title
Like: SongId, UserName, Time (when did the user click the like button), Artist, Title, UserId
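
As a rough sketch of what this denormalization could look like, assuming Python and a MongoDB version that supports the aggregation pipeline: the helper names (`make_play`, `make_like`, `top_played_pipeline`) and the camelCase field names are made up for illustration. The functions only build plain dicts and lists; with a driver such as pymongo they would be passed to `insert_one` and `aggregate`.

```python
from datetime import datetime, timedelta, timezone


def make_play(song, played_at):
    """Build a Play document that carries a copy of the song's artist and title."""
    return {
        "songId": song["_id"],
        "time": played_at,
        "artist": song["artist"],  # denormalized copy from Song
        "title": song["title"],    # denormalized copy from Song
    }


def make_like(song, user_id, user_name, liked_at):
    """Build a Like document with the same denormalized song fields."""
    return {
        "songId": song["_id"],
        "userId": user_id,
        "userName": user_name,
        "time": liked_at,
        "artist": song["artist"],
        "title": song["title"],
    }


def top_played_pipeline(days, limit, now=None):
    """Aggregation stages for 'top played songs in the last X days'.

    Meant to run against the (assumed) plays collection. No lookup into
    the songs collection is needed because each play already has the
    artist and title.
    """
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    return [
        {"$match": {"time": {"$gte": since}}},
        {"$group": {
            "_id": "$songId",
            "artist": {"$first": "$artist"},
            "title": {"$first": "$title"},
            "plays": {"$sum": 1},
        }},
        {"$sort": {"plays": -1}},
        {"$limit": limit},
    ]
```

The trade-off is that a corrected artist or title has to be updated in every Play and Like document that copied it, which is usually acceptable when song metadata rarely changes.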


If the queries are getting hard using the nested structure, you have a few options:

  1. Create separate documents for the song proper vs plays vs likes. The documents can refer to each other, of course.

  2. Keep it nested, but also have the client insert additional stub documents which create the inverse relationship. That is, denormalize the data. In distributed datastores like Mongo, denormalizing your data is more acceptable than in the relational DB world.

  3. Use mapreduce queries to aggregate the data you want. This may become expensive as the data grows though, so you might be better off with another document datastore like CouchDB, which would be able to continuously run your mappers as new data comes in. (I don't know if Mongo has this capability yet or not).

  4. Use a SQL database. Your data are fairly regular, in that the attributes of a song, play or like won't change from record to record. So using a relational database here would give you the query flexibility you're after, without sacrificing data flexibility (since you don't really need it). RDBMSs are harder to scale horizontally, but that's a performance concern you could solve later once you're uber successful.
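
To make option 4 a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`song`, `play`, `song_like`, `played_at`, and so on) are assumptions for illustration, and timestamps are stored as ISO 8601 strings so that they sort correctly as text. It covers two of the queries from the question: the last X songs played with their like counts, and the top played songs since a given time.

```python
import sqlite3


def create_schema(conn):
    """Create one table per entity, plus indexes for the common lookups."""
    conn.executescript("""
        CREATE TABLE song (
            id INTEGER PRIMARY KEY,
            artist TEXT NOT NULL,
            title TEXT NOT NULL
        );
        CREATE TABLE play (
            song_id INTEGER NOT NULL REFERENCES song(id),
            played_at TEXT NOT NULL
        );
        CREATE TABLE song_like (
            song_id INTEGER NOT NULL REFERENCES song(id),
            user_name TEXT NOT NULL,
            liked_at TEXT NOT NULL
        );
        CREATE INDEX play_time ON play(played_at);
        CREATE INDEX like_song ON song_like(song_id);
    """)


def last_played(conn, limit):
    """Last `limit` plays, newest first, each with the song's total like count."""
    return conn.execute("""
        SELECT s.artist, s.title, p.played_at,
               (SELECT COUNT(*) FROM song_like l WHERE l.song_id = s.id) AS likes
        FROM play p JOIN song s ON s.id = p.song_id
        ORDER BY p.played_at DESC
        LIMIT ?
    """, (limit,)).fetchall()


def top_played(conn, since, limit):
    """Most played songs at or after the ISO timestamp `since`."""
    return conn.execute("""
        SELECT s.artist, s.title, COUNT(*) AS plays
        FROM play p JOIN song s ON s.id = p.song_id
        WHERE p.played_at >= ?
        GROUP BY s.id
        ORDER BY plays DESC, s.id
        LIMIT ?
    """, (since, limit)).fetchall()
```

Both functions take an open connection, for example `sqlite3.connect(":memory:")` after calling `create_schema` on it. The sorting and counting happen inside the database, so only the final rows are transferred to the client.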

That's the way I see it - good luck!
