What is massively parallel processing (MPP)?

2023-01-02 05:30

Ever since Microsoft introduced the SQL Server version code-named "Madison", massively parallel processing (MPP) has come into the picture. What exactly is it, and how is SQL Server going to benefit from it?

Further, is massively parallel processing (MPP) related to parallel computing?

This is basically the strategy that Teradata has used. You have dedicated server processing, memory and storage, and the data is partitioned across the processing units (called AMPs, Access Module Processors, in Teradata). Each unit has its own redundancy built in, since the data is not stored anywhere else: if you lose an AMP, you would lose the data.

In Teradata, the magic which enables the partitioning is the PRIMARY INDEX. This determines which AMP the data lives on. The query is distributed to all the AMPs, and they return the data, which is then combined. Performance suffers when there is skew, or when data needs to be redistributed from the AMP where it lives to the AMP which needs it for processing.
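To make the primary index idea concrete, here is a minimal Python sketch of hash distribution and skew. It is only an illustration: `NUM_AMPS`, `amp_for`, `distribute` and `skew` are made-up names, and MD5 is just a stand-in for whatever hash function the real system uses.

```python
import hashlib

NUM_AMPS = 4  # hypothetical number of processing units


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number via a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing its primary index column."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[amp_for(row[key], num_amps)].append(row)
    return amps


def skew(amps):
    """Ratio of the fullest AMP to the average AMP; 1.0 means perfectly even."""
    sizes = [len(a) for a in amps]
    return max(sizes) / (sum(sizes) / len(sizes))


# 1000 illustrative rows: customer_id is unique, country has only two values.
rows = [{"customer_id": i, "country": "US" if i % 10 else "CA"} for i in range(1000)]

by_customer = distribute(rows, "customer_id")  # high-cardinality primary index
by_country = distribute(rows, "country")       # low-cardinality primary index

# "Query": every AMP counts its own rows, then the partial results are combined.
total = sum(len(amp) for amp in by_customer)

print("skew by customer_id:", round(skew(by_customer), 2))
print("skew by country:", round(skew(by_country), 2))
```

With a unique key the rows spread roughly evenly, while the two-valued `country` key can use at most two of the four AMPs, so most units sit idle while one does most of the work.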

So the inter-process communication system, the query processor and the hash system are the key components to this kind of system.

In many cases, the massively parallel approach works well when data shares very similar primary indexes (millions of customers, millions of customer invoices, millions of customer click-stream events). This is great for a large class of problems, because things are often partitioned by customer, or by date or something similar.

It fails when you deal with things like Kimball-style star schemas or attempting to navigate a very complex 3NF model in a single query. In these cases, you are better off building intermediate temporary or volatile tables and specifying the primary index to get the data distributed well over the AMPs and matching whatever it is you are going to join on in the next join. Or remodeling your warehouse.
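The intermediate-table trick can be sketched roughly like this in Python. Everything here is hypothetical (the `customers` and `invoices` tables, `distribute`, `local_join`), and Python's built-in `hash` stands in for the real hash system. The point is only that a join is cheap when both sides are already distributed on the join column.

```python
NUM_AMPS = 4  # hypothetical number of processing units


def distribute(rows, key, num_amps=NUM_AMPS):
    """Hash-partition rows on `key` (small ints hash to themselves in Python)."""
    amps = [[] for _ in range(num_amps)]
    for row in rows:
        amps[hash(row[key]) % num_amps].append(row)
    return amps


def local_join(left_amps, right_amps, key):
    """Join each AMP's slice only against the same AMP's slice of the other table."""
    out = []
    for left, right in zip(left_amps, right_amps):
        index = {}
        for r in right:
            index.setdefault(r[key], []).append(r)
        for l in left:
            for r in index.get(l[key], ()):
                out.append({**l, **r})
    return out


customers = [{"customer_id": c, "name": f"cust{c}"} for c in range(20)]
invoices = [{"invoice_id": i, "customer_id": i // 3, "amount": 10 * i} for i in range(60)]

customer_amps = distribute(customers, "customer_id")

# Invoices stored on their own primary index: matching rows sit on different AMPs.
invoice_amps = distribute(invoices, "invoice_id")
without_redistribution = local_join(customer_amps, invoice_amps, "customer_id")

# "Volatile table": the same invoices, redistributed on the join column first.
all_invoices = [row for amp in invoice_amps for row in amp]
volatile_amps = distribute(all_invoices, "customer_id")
with_redistribution = local_join(customer_amps, volatile_amps, "customer_id")

print(len(without_redistribution), "matches found locally before redistribution")
print(len(with_redistribution), "matches found locally after redistribution")
```

In a real system the optimizer does the redistribution for you when it has to; building the intermediate table yourself just lets you pick the primary index so the next join is AMP-local.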

In MPP systems, adding capacity involves adding memory, storage and processing all at the same time, which gives fairly good scalability.

It is the ability to offload work to a different computer, not just to another core on the same computer. So if you have 4 servers with 64 cores each, you can tap into 256 cores.

And IIRC you can't install this yourself; you need to buy a preconfigured system. What you basically get is a rack of computers.

The wiki entry defines massively parallel computing as:

Massive parallel processing (MPP) is a term used in computer architecture to refer to a computer system with many independent arithmetic units or entire microprocessors, that run in parallel. The term massive connotes hundreds if not thousands of such units. Early examples of such a system are the Distributed Array Processor, the Goodyear MPP, the Connection Machine, and the Ultracomputer.

SQL Server will benefit in the same way it does already, by performing certain query steps in parallel. But only a relatively small class of algorithms can take advantage of massively parallel computing, and speed-up does not increase linearly as more cores are added. A good example of where it can be used is where tables are partitioned into separately searchable silos, for example by date range.
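One common way to see why speed-up is not linear is Amdahl's law, which is not mentioned in the wiki entry above but is a standard model for this. A small Python sketch, assuming a made-up workload where 95% of the work can run in parallel:

```python
def amdahl_speedup(parallel_fraction, units):
    """Speed-up predicted by Amdahl's law when `parallel_fraction` (0..1)
    of the work can be spread evenly over `units` processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Illustrative numbers only: 95% parallel work on 1, 4, 64 and 256 cores.
for units in (1, 4, 64, 256):
    print(units, "cores ->", round(amdahl_speedup(0.95, units), 2), "x")
```

With 5% of the work stuck in serial, the speed-up can never exceed 20x no matter how many cores you add, which is why the well-partitioned cases (like the date-range silos) are the ones that benefit most.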
