What is massively parallel processing (MPP)?

Ever since Microsoft introduced the SQL Server version code-named "Madison", massively parallel processing (MPP) has come into the picture. What exactly is it, and how is SQL Server going to benefit from it?

Also, is massively parallel processing (MPP) related to parallel computing?


This is basically the strategy that Teradata has used. You have dedicated server processing, memory and storage, and the data is partitioned across the processing units (AMPs, or Access Module Processors, in Teradata's terminology). Each unit has its own redundancy built in, since the data is not stored anywhere else: if you lose an AMP, you would lose the data.

In Teradata, the magic which enables the partitioning is the PRIMARY INDEX. This determines which AMP the data lives on. The query is distributed to all the AMPs and they return the data which is then combined. Performance suffers when there is skew and data needs to be redistributed from the AMP where it lives to the AMP which needs it for processing.
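
To make the idea of hash placement and skew concrete, here is a minimal Python sketch. It is not Teradata's actual hashing algorithm; the MD5-based hash, the unit count of 4 and the names amp_for, rows_per_amp and skew_factor are all assumptions made up for illustration.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical number of processing units


def amp_for(key, num_amps=NUM_AMPS):
    """Map a primary-index value to a unit with a stable hash (illustrative only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(keys, num_amps=NUM_AMPS):
    """Count how many rows land on each unit for a given list of key values."""
    counts = Counter(amp_for(k, num_amps) for k in keys)
    return [counts.get(i, 0) for i in range(num_amps)]


def skew_factor(per_amp):
    """Busiest unit's row count divided by the mean; 1.0 means perfectly even."""
    mean = sum(per_amp) / len(per_amp)
    return max(per_amp) / mean if mean else 0.0


# A nearly unique key (such as a customer id) spreads rows across the units.
even = rows_per_amp(range(10_000))

# A low-cardinality key sends every row to the same unit, which is skew.
skewed = rows_per_amp(["US"] * 10_000)
```

With the skewed key, one unit holds all 10,000 rows while the other three sit idle, so skew_factor(skewed) is 4.0 and the query runs no faster than it would on a single unit.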

So the inter-process communication system, the query processor and the hash system are the key components to this kind of system.
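
The "distribute the query, then combine the results" step can be sketched roughly as follows. This is a toy under stated assumptions: threads stand in for separate servers, the invoice amounts are made-up data, and local_aggregate and scatter_gather_average are hypothetical names, not part of any real product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of invoice amounts held by one unit.
PARTITIONS = [
    [120.0, 35.5, 80.0],
    [15.0, 220.0],
    [60.0, 60.0, 60.0, 10.0],
    [],
]


def local_aggregate(rows):
    """Step run on every unit: partial sum and row count over local rows only."""
    return sum(rows), len(rows)


def scatter_gather_average(partitions):
    """Send the same step to every partition, then combine the partial results."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


average = scatter_gather_average(PARTITIONS)
```

Note that the average is computed from partial sums and counts rather than by averaging the per-unit averages, which would give the wrong answer when units hold different numbers of rows.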

In many cases, the massively parallel approach works well when data shares very similar primary indexes (millions of customers, millions of customer invoices, millions of customer click-stream events). This is great for a large class of problems, because things are often partitioned by customer, or by date or something similar.

It fails when you deal with things like Kimball-style star schemas or attempting to navigate a very complex 3NF model in a single query. In these cases, you are better off building intermediate temporary or volatile tables and specifying the primary index to get the data distributed well over the AMPs and matching whatever it is you are going to join on in the next join. Or remodeling your warehouse.
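
A rough way to see why matching the primary index to the next join matters: count how many rows sit on a different unit from the one the join needs. The sketch below is a Python simulation, not Teradata SQL; the hash, the unit count and the invoice rows are all assumptions for illustration.

```python
import hashlib

NUM_AMPS = 4  # hypothetical number of processing units


def amp_for(value, num_amps=NUM_AMPS):
    """Stable hash of a column value to a unit number (illustrative only)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_to_move(rows, current_key, join_key, num_amps=NUM_AMPS):
    """Count rows whose current unit differs from the unit the join needs."""
    return sum(
        1
        for row in rows
        if amp_for(row[current_key], num_amps) != amp_for(row[join_key], num_amps)
    )


# Hypothetical invoice table: 1000 invoices spread over 50 customers.
invoices = [{"invoice_id": i, "customer_id": i % 50} for i in range(1000)]

# Distributed on invoice_id but joined on customer_id: many rows must be shipped.
moved_when_mismatched = rows_to_move(invoices, "invoice_id", "customer_id")

# Intermediate table distributed on customer_id: the join is already local.
moved_when_matched = rows_to_move(invoices, "customer_id", "customer_id")
```

Building the intermediate table with the join column as its primary index pays the redistribution cost once, up front, instead of during the join.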

In MPP systems, adding capacity involves adding memory, storage and processing all at the same time, which gives fairly good scalability.


It is the ability to offload work to a different computer, not just to another core on the same computer. So if you have 4 servers with 64 cores each, you can tap into 256 cores.

And IIRC you can't install this yourself; you need to buy a preconfigured system. What you basically get is a rack of computers.


The wiki entry defines massively parallel computing as:

Massive parallel processing (MPP) is a term used in computer architecture to refer to a computer system with many independent arithmetic units or entire microprocessors, that run in parallel. The term massive connotes hundreds if not thousands of such units. Early examples of such a system are the Distributed Array Processor, the Goodyear MPP, the Connection Machine, and the Ultracomputer.

SQL Server will benefit in the same way it does already, by performing certain query steps in parallel. But only a relatively small class of algorithms can take advantage of massively parallel computing, and speed-up does not increase linearly with the addition of more cores. A good example of where it can be used is where tables are partitioned into separately searchable silos, for example by date range.
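
One common way to reason about the non-linear speed-up is Amdahl's law, which is a general model and not something specific to SQL Server. The sketch below assumes a workload where some fraction can be parallelised perfectly and the rest is strictly serial; the function name and the 95% figure are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, units):
    """Theoretical speed-up when only part of the work can run in parallel.

    parallel_fraction: share of the work that parallelises (0.0 to 1.0).
    units: number of cores or servers working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Assumed workload: 95% of the query parallelises, run on 256 cores.
speedup_256 = amdahl_speedup(0.95, 256)

# Only a fully parallel workload scales linearly with the unit count.
speedup_ideal = amdahl_speedup(1.0, 256)
```

Under that assumption, 256 cores give a speed-up of a little under 19x rather than 256x, because the 5% serial part dominates once the parallel part has been spread thin. Partitioned scans fit MPP well precisely because almost all of their work is in the parallel fraction.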
