Distributed Database Computing - Is it really possible within the RDBMS paradigm?

2023-01-22 19:01

I am asking this in the context of NoSQL - which achieves scalability and performance without being expensive.

So, if I needed to achieve massively parallel distributed computing across databases, what methodologies are available today (within the RDBMS paradigm) to achieve distributed computing with high scalability?

Does database clustering & mirroring contribute in any way towards distributed computing?

I guess you are asking about the scalability of RDBMS databases. NoSQL databases based on Amazon Dynamo or BigTable, such as HBase and Cassandra, are a whole other topic. There are also commercial products like Oracle Coherence, which is more like a distributed cache and key-value store, to put it crudely.

Going back to RDBMS:

Sharding: To scale an RDBMS, one can do custom sharding. Sharding is a technique where you have multiple tables, possibly on multiple hosts, and you decide by some rule which rows go to which table. For example, you can say that rows 1-1M go to table1, rows 1M-2M go to table2, and so on. But this is a difficult process from an administration point of view. A lot of large-scale websites scale by relying on sharding. Other techniques worth mentioning are partitioning, MySQL federation, and MySQL Cluster.
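To make the row-range example concrete, here is a minimal routing sketch in Python. The host names, the table names beyond table1 and table2, and the exact boundaries (1 to 1,000,000 on the first shard, the last shard open-ended) are assumptions for illustration only; a real setup would keep this map in configuration or a lookup service.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical shard layout: each entry is (first_row_id, host, table).
# The 1M-row ranges mirror the example in the text; the names are made up.
SHARDS = [
    (1, "db-host-1", "table1"),
    (1_000_001, "db-host-2", "table2"),
    (2_000_001, "db-host-3", "table3"),  # last shard is open-ended here
]
_STARTS = [start for start, _, _ in SHARDS]


def shard_for(row_id: int) -> Tuple[str, str]:
    """Return the (host, table) pair that owns row_id."""
    if row_id < _STARTS[0]:
        raise ValueError(f"row id {row_id} is below the first shard range")
    # bisect_right finds the first range start greater than row_id;
    # the shard just before that position owns the row.
    index = bisect_right(_STARTS, row_id) - 1
    _, host, table = SHARDS[index]
    return host, table
```

Every query then has to go through something like shard_for() before it can pick a connection, and changing the ranges means moving rows between hosts, which is where much of the administration burden comes from.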

MPP databases: Then there are databases that are very much RDBMSs but do the distribution and scaling for you. Teradata is the most successful of these companies. A significant number of Fortune 500 companies and a lot of the airlines use Teradata, but it's ridiculously expensive. There are newer companies like Greenplum (built on PostgreSQL core code), Vertica, and Netezza.

Unless you're a very big company with extreme scalability requirements, you can scale your DB horizontally while keeping ACID guarantees by building a cluster of identical RDBMS instances and synchronizing them with JTA transactions.

Take a look at this Java/JDBC-based article. The JEPLayer framework is used there, but you can use straight JDBC and JTA code.
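The idea of keeping identical instances in step can be sketched roughly as follows. This is not the JTA/JDBC code from the article and not a real two-phase commit: it is a simplified Python illustration in which two in-memory SQLite databases stand in for the cluster members, and write_to_all is a hypothetical helper name.

```python
import sqlite3


def write_to_all(connections, sql, params=()):
    """Apply one statement on every instance; commit only if all succeed.

    This is NOT a real two-phase commit: a crash between the individual
    commit() calls could still leave instances out of step, which is the
    gap a JTA/XA transaction manager is meant to close.
    """
    try:
        for conn in connections:
            conn.execute(sql, params)
    except sqlite3.Error:
        # One instance rejected the write: undo it everywhere.
        for conn in connections:
            conn.rollback()
        raise
    for conn in connections:
        conn.commit()


# Two in-memory SQLite databases stand in for identical RDBMS instances.
replicas = [sqlite3.connect(":memory:") for _ in range(2)]
for conn in replicas:
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()

write_to_all(replicas, "INSERT INTO accounts (id, name) VALUES (?, ?)", (1, "alice"))
```

Reads can then go to any instance, while every write pays the cost of touching all of them, so this pattern tends to help read-heavy workloads more than write-heavy ones.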

Within the RDBMS paradigm: Sharding.
Outside the RDBMS paradigm: Key-value stores.

My pick: (I come from an RDBMS background) Key-value stores of the tabular type - HBase.

Within the RDBMS paradigm, sharding will not get you far.
Use the RDBMS paradigm to design your model, to get your project up and running.
Use tabular key-value stores to SCALE OUT.

Sharding:

A good way to think about sharding is to see it as user-account-oriented
DB design.

All the schema entities touched by a user account are kept on one host.

The assignment of user to host happens when the user creates an account.
The least loaded host gets that user.

When that user signs on after account creation, he gets connected
to the host that has his data.

Each host has a set of user accounts.
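A minimal Python sketch of that assignment scheme, assuming "least loaded" simply means "fewest accounts" and using a hypothetical ShardDirectory class in place of whatever lookup table a real site would keep:

```python
class ShardDirectory:
    """Maps user accounts to hosts; a stand-in for a real lookup table."""

    def __init__(self, hosts):
        # Number of accounts currently placed on each host.
        self.load = {host: 0 for host in hosts}
        # user_id -> host, fixed at account creation.
        self.home = {}

    def create_account(self, user_id):
        if user_id in self.home:
            raise ValueError(f"account {user_id!r} already exists")
        # Pick the host with the fewest accounts (ties go to the first host listed).
        host = min(self.load, key=self.load.get)
        self.home[user_id] = host
        self.load[host] += 1
        return host

    def host_for(self, user_id):
        """Called at sign-on: returns the host that holds this user's data."""
        return self.home[user_id]
```

For example, with hosts "a" and "b", the first account lands on "a", the second on "b", and host_for() returns the same host for that user at every later sign-on.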

The problem with this approach is that if the host gets hosed,
a fraction of users will be blacked out.

The solution to this is have a replicated standby host that
becomes the primary when the primary host encounters problems.
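The standby promotion can be sketched like this in Python. ShardPair and the is_healthy callback are hypothetical names, and the sketch assumes replication to the standby is already handled by the database itself.

```python
class ShardPair:
    """A primary host plus one replicated standby for a single shard."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def active_host(self, is_healthy):
        """Return the host to use, promoting the standby if the primary is down.

        is_healthy is a caller-supplied callable (hypothetical health check)
        that takes a host name and returns True or False.
        """
        if is_healthy(self.primary):
            return self.primary
        if self.standby is not None and is_healthy(self.standby):
            # Promote: the standby becomes the new primary. The failed host is
            # dropped here; re-adding it as a standby is left out of the sketch.
            self.primary, self.standby = self.standby, None
            return self.primary
        raise RuntimeError("no healthy host available for this shard")
```

After a promotion the shard runs without a standby until a new one is provisioned, so the users on that host are exposed to a second failure in the meantime.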

Also, it's a fairly rigid setup, best suited to processes where the
design does not change dramatically.

From the user standpoint, I've noticed that web sites
with a sharded DB backend are not as quick to "turn on a dime"
to create different business models on their platform.

Contrast this with web sites that have truly distributed
key-value stores. These businesses can host any range of
services. Their platform is just that - a platform.
It's not relational and it does have an API interface,
but it just seems to work.
