How to implement Twitter's "friends' timeline" function

2023-01-09 17:07

I'm trying to learn database design by creating a Twitter clone, and I was wondering what the most efficient way of implementing the friends' timeline function is. I am implementing this on Google App Engine, which uses Bigtable to store the data. IIRC, this means very fast reads (gets), but considerably slower page queries and considerably slower writes. Currently I have two methods in mind, each with its drawbacks:

For each user, there's a list structure that's their friends' timeline. Every time someone makes a tweet, this structure gets updated for each of their followers. This method uses a lot of write operations, but retrieving the list will be very fast for each user.

For each user, calculate the friends' timeline dynamically by getting all the tweets of the people they're following and merging them into a single timeline (a merge is enough, since each individual person's tweets are already sorted chronologically). This might be slow if the person is following a lot of people.
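To make the trade-off concrete, here is a minimal in-memory sketch of both methods in Python. It is not App Engine code: the dictionaries stand in for datastore entities, and every name in it (`follow`, `post_tweet`, `timeline_precomputed`, `timeline_merged`) is made up for illustration. It also assumes tweets arrive in increasing timestamp order.

```python
import heapq
from collections import defaultdict
from itertools import islice

# Hypothetical in-memory stand-ins for datastore entities.
followers = defaultdict(set)          # author -> users who follow them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> [(timestamp, text)], oldest first
timelines = defaultdict(list)         # user -> [(timestamp, author, text)], oldest first


def follow(user, author):
    following[user].add(author)
    followers[author].add(user)


def post_tweet(author, timestamp, text):
    """Store the tweet, then copy it into every follower's timeline (method 1)."""
    tweets_by_author[author].append((timestamp, text))
    for user in followers[author]:  # one write per follower
        timelines[user].append((timestamp, author, text))


def timeline_precomputed(user, limit=20):
    """Method 1 read: a single lookup, newest entries first."""
    return timelines[user][-limit:][::-1]


def _newest_first(author):
    """Yield one author's tweets from newest to oldest."""
    for timestamp, text in reversed(tweets_by_author[author]):
        yield (timestamp, author, text)


def timeline_merged(user, limit=20):
    """Method 2 read: k-way merge of the followed authors' sorted tweet lists."""
    streams = [_newest_first(author) for author in following[user]]
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return list(islice(merged, limit))
```

In this sketch `post_tweet` does one write per follower, while `timeline_merged` opens one stream per followed author on every read. Because `heapq.merge` is lazy, only about `limit` items are pulled in total, but a real datastore would still need one query per followed author.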

Are there other ways that I'm not aware of? Both of these methods seem like they will make the system choke when the number of users increases.

You need to focus on the object of the exercise, which you say is learning about database design. So don't get hung up on scalability. Design a database which works for you and your mates to use. Pretty much any design you pick will be able to handle that sort of load. Apart from anything else, the GAE license would start to charge you big bucks if you even started to approach Twitter-style levels of hits.

The thing is, scalability for players like Twitter and Facebook is a major part of their proposition. Consequently they expend a lot of effort in building their apps to scale. They do this with lots of optimizations, including different storage architectures for different types of data, distributed servers, and caching, lots of caching. In other words, it's done with infrastructure and architecture, not database design.

High Scalability is a very good source of relevant information. For instance, this summary of a presentation by Twitter's Evan Weaver last year is extremely pertinent:

"[E]verything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards capability; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally."

Hmmm, "Database is a back-up" only. Scary stuff (for a database guy like me).
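As a rough illustration of the "cache of tweet IDs in RAM, database as backup" idea, here is a small read-through cache sketch in Python. It is not how Twitter implemented it: the `TimelineCache` class, the `load_from_db` callable, and the capacity and `max_len` values are all assumptions made up for this example.

```python
from collections import OrderedDict


class TimelineCache:
    """Tiny LRU cache mapping a user to a list of tweet IDs, newest first."""

    def __init__(self, load_from_db, capacity=1000):
        self._load = load_from_db  # hypothetical slow path: user -> list of tweet IDs
        self._capacity = capacity
        self._entries = OrderedDict()
        self.misses = 0

    def get(self, user):
        """Return the cached ID list, loading it from the database on a miss."""
        if user in self._entries:
            self._entries.move_to_end(user)  # mark as most recently used
            return self._entries[user]
        self.misses += 1
        ids = list(self._load(user))
        self._entries[user] = ids
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used user
        return ids

    def push(self, user, tweet_id, max_len=800):
        """On a new tweet, update the cached list in place if the user is cached."""
        ids = self._entries.get(user)
        if ids is not None:
            ids.insert(0, tweet_id)
            del ids[max_len:]  # keep the cached list bounded
```

Reads hit memory unless the user was evicted or never loaded, and only then fall back to the database, which is the sense in which the database becomes "a backup" here.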
