Cassandra/BigTable data model - what's the best approach for building indexes?

2023-01-08 19:08

I'm in the process of spiking a conversion from MySQL to Cassandra for PenWag.com. In Cassandra, I'm storing Users keyed off of a GUID, but users sign in with their email, not the GUID (obviously). Using the GUID as the key for Users makes more sense to me than using the email, for two reasons. From a practical perspective, it seems too cumbersome to change a row key, since that means deleting and re-adding a row with all of its SuperColumns. From a theoretical standpoint, it's still the same user, so why should their key change?

Nevertheless, here's my question: I'm building an index in a separate ColumnFamily, mapping email->GUID to support login. It's a Standard type CF, where the column name is the email and the value is the GUID. It's Standard, not Super, to avoid loading an entire SC for every mapping. Supporting "change email" is easy: it's just a column delete/add. But it seems that an alternative is to store the index as rows instead of columns, where the row key is the email and a single column holds the GUID. Delete/add on those rows would not be cumbersome, since there's only one column (the GUID) to manage.
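To make the two layouts concrete, here is a minimal Python sketch that uses plain dictionaries in place of column families. It is not Cassandra client code, and the names `email_index`, `guid`, and the helper functions are made up purely for illustration.

```python
# Layout A: one index row whose columns map email -> GUID.
# Layout B: one index row per email, each holding a single "guid" column.
# Plain dicts stand in for column families; nothing here talks to Cassandra.

def change_email_columns(index_cf, index_row_key, old_email, new_email):
    """Layout A: delete the old column and add a new one in the same row."""
    row = index_cf[index_row_key]
    row[new_email] = row.pop(old_email)

def change_email_rows(index_cf, old_email, new_email):
    """Layout B: delete the old row and add a new row with one column."""
    index_cf[new_email] = index_cf.pop(old_email)

def lookup_columns(index_cf, index_row_key, email):
    """Layout A login lookup: fetch one column from the shared index row."""
    return index_cf[index_row_key][email]

def lookup_rows(index_cf, email):
    """Layout B login lookup: fetch the row keyed by email."""
    return index_cf[email]["guid"]

user_guid = "guid-1234"  # hypothetical GUID value

layout_a = {"email_index": {"old@example.com": user_guid}}
layout_b = {"old@example.com": {"guid": user_guid}}

change_email_columns(layout_a, "email_index", "old@example.com", "new@example.com")
change_email_rows(layout_b, "old@example.com", "new@example.com")

print(lookup_columns(layout_a, "email_index", "new@example.com"))
print(lookup_rows(layout_b, "new@example.com"))
```

In both layouts, "change email" is one delete plus one add; the difference is whether every mapping shares a single row key or each email gets its own.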

It seems that either approach works. What are the pros and cons of each? Is there a best practice?

Since I have no hands-on experience with Cassandra or similar databases, you'll need to take my answer with a grain of salt :)

If you stored each mapping as a column, using the email address as the column name, you would end up with a single row containing an enormous number of columns. According to Wikipedia^[1]:

Every operation under a single row key is atomic per replica no matter how many columns are being read or written into.

This could result in significant locking overhead if all mappings are stored in a single row.

The Cassandra Wiki states^[2]:

The row key is what determines what machine data is stored on.

This makes me believe that lookups by row key are more efficient than lookups by column name. Based on this information, I would suggest using the email address as the row key and storing the GUID in a column.
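As a rough illustration of why the row key matters for placement, the sketch below hashes each row key and takes it modulo the number of nodes. This is a deliberate simplification and an assumption on my part: Cassandra actually places rows on a token ring, and `node_for`, `NODE_COUNT`, and the sample keys are hypothetical. MD5 is used only as a stand-in hash.

```python
import hashlib

def node_for(row_key, node_count):
    """Pick a node by hashing the row key (simplified; not a real token ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count

NODE_COUNT = 4  # hypothetical cluster size
emails = ["user%d@example.com" % i for i in range(100)]

# Single-row index: every lookup uses the same row key, so the same node.
single_row_nodes = {node_for("email_index", NODE_COUNT) for _ in emails}

# Row-per-email index: each email is its own row key, so rows spread out.
per_email_nodes = {node_for(email, NODE_COUNT) for email in emails}

print("nodes used by single-row index:", len(single_row_nodes))
print("nodes used by row-per-email index:", len(per_email_nodes))
```

Under this simplified model, the single-row index concentrates all mappings on one node, while the row-per-email index distributes them across the cluster.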

Niels is correct; one row per user would be the right way to do this manually.

I qualify that because in 0.7 you could just have an email column in the row with the rest of your keyed-by-UUID user data and ask Cassandra to index it: http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
