开发者

Super column vs serialization vs 2 lookups in Cassandra

We have:

users, each of which has events, each of which has several properties (time, type etc.). Our basic use case is to fetch all events of a given user in a given time-span.

We've been considering the following alternatives in Cassandra for the Events column-family. All alternatives share: key=user_id (UUID), column_name = event_time

  1. column_value = serialized object of event properties. Will need to read/write all the properties every time (not a problem), but might also be difficult to debug (can't use Cassandra command-line client easily)

  2. column is actually a super column, sub-columns are 开发者_如何学Goseparate properties. Means reading all events(?) every time (possible, though sub-optimal). Any other cons?

  3. column_value is a row-key to another CF, where the event properties are stored. Means maintaining two tables -> complicates calls + reads/writes are slower(?).

Anything we're missing? Any standard best-practice here?


Alternative 1 : Why go to Cassandra if you are to store serialized object ? MongoDB or a similar product would perform better on this task if I get it wright (never actually tried a document base NoSQL, so correct me if I'm wrong on this one). Anyway, I tried this alternative once in MySQL 6 years ago and it is still painful to maintain today.

Alternative 2 : Sorry, I didn't had to play with super colunm yet. Would use this only if I had to show frequently many information on many users (i.e. much more than just their username and a few qualifiers) and their respective events in one query. Also could make query based on a given time-span a bit tricky if there are conditions on the user itself too, since a user's row is likely to have event's columns that fit in the span an other columns that doesn't.

Alternative 3 : Would defenitly be my choice in most cases. You are not likely to write events and create a user in the same transaction, so no worry for consistency. Use the username itself as a standard event column (don't forget to index it) so your calls will be pretty fast. More on this type of data model at http://www.datastax.com/docs/0.8/ddl/index. Yes it's a two call read, but it do is two different families of data anyway.

As for a best-practices, the field is kinda new, not sure there are any widely approved yet.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜