Apache Cassandra Data Schema for Twitter Streaming API

Posted: 2023-03-29 08:33

I am aware of Twissandra, which is an example Twitter clone using Cassandra, but I was interested to see if anyone has shared a Cassandra schema that is meant not to clone Twitter but to store tweets coming through the Twitter Streaming API?

It very much depends on what sort of queries you want to run against the data after you have ingested it. I see from your previous question "Dumping Twitter Streaming API tweets..." that you probably just want to do big batch processing on it.

If this is the case, you just need to worry about load balancing: making sure each node in the cluster handles 1/n of the write load and holds 1/n of the data. Using the RandomPartitioner and inserting one row per tweet, with the status id as the row key, will achieve this.
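To illustrate why hashing the status id spreads rows evenly, here is a minimal sketch in plain Python. It does not talk to Cassandra. It assumes a hypothetical 4-node cluster whose nodes own equal slices of a 128-bit MD5 token space, which is a simplification of how the RandomPartitioner places rows; the names `NUM_NODES`, `node_for_key` and `distribution` are made up for this example.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for_key(row_key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node index for a row key.

    The key is hashed with MD5 and the 128-bit result is mapped onto
    num_nodes equal token ranges. Real clusters assign tokens per node;
    equal ranges are an assumption made to keep the sketch short.
    """
    token = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return token * num_nodes // 2**128


def distribution(status_ids, num_nodes: int = NUM_NODES) -> Counter:
    """Count how many status ids would land on each node."""
    return Counter(node_for_key(str(s), num_nodes) for s in status_ids)


if __name__ == "__main__":
    # Sequential status ids, as they might arrive from the stream.
    counts = distribution(range(1_000_000_000, 1_000_010_000))
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} rows")
```

Even though the ids are sequential, the hash scatters them, so each node ends up with roughly a quarter of the rows. An order-preserving partitioner would instead send consecutive ids to the same node.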

However, if you want to run queries like "give me all tweets for a given user", you will need a slightly more complicated schema, because the one-row-per-tweet schema above would require you to scan all the data. You could instead insert multiple tweets per row, with the user id as the row key, the tweet id as the column name and the tweet itself as the column value. Then you could use get_slice to answer that query.
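The row-per-user layout can be sketched with an in-memory stand-in. This is not the Cassandra client API: `UserTimelineStore` is a hypothetical class, and its `get_slice` method only imitates the idea of reading a range of columns from one row, with columns kept sorted by tweet id.

```python
from bisect import bisect_left, bisect_right, insort


class UserTimelineStore:
    """In-memory model of one wide row per user (illustration only)."""

    def __init__(self):
        self._columns = {}  # user_id -> sorted list of tweet ids (column names)
        self._values = {}   # (user_id, tweet_id) -> tweet (column value)

    def insert(self, user_id, tweet_id, tweet):
        """Add or overwrite one column in the user's row."""
        if (user_id, tweet_id) not in self._values:
            insort(self._columns.setdefault(user_id, []), tweet_id)
        self._values[(user_id, tweet_id)] = tweet

    def get_slice(self, user_id, start=None, finish=None, count=100):
        """Return up to `count` (tweet_id, tweet) pairs from one row,
        ordered by tweet id, with start and finish inclusive."""
        cols = self._columns.get(user_id, [])
        lo = 0 if start is None else bisect_left(cols, start)
        hi = len(cols) if finish is None else bisect_right(cols, finish)
        return [(tid, self._values[(user_id, tid)]) for tid in cols[lo:hi][:count]]


if __name__ == "__main__":
    store = UserTimelineStore()
    store.insert("alice", 30, "third tweet")
    store.insert("alice", 10, "first tweet")
    store.insert("bob", 20, "bob's tweet")
    # All of alice's tweets come from a single row lookup, no full scan.
    print(store.get_slice("alice"))
```

The point of the design is that the query touches exactly one row: the row key selects the user, and the sorted column names give the tweets in id order. The trade-off is that very active users produce very wide rows.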

A good (somewhat related) blog post: http://blog.insidesystems.net/basic-time-series-with-cassandra
