How do Reactive Framework, PLINQ, TPL and Parallel Extensions relate to each other?

At least since the release of .NET 4.0, Microsoft seems to have put a lot of effort into support for parallel and asynchronous programming, and a lot of APIs and libraries around this seem to have emerged. In particular, the following fancy names are constantly mentioned everywhere lately:

  • Reactive Framework,
  • PLINQ (Parallel LINQ),
  • TPL (Task Parallel Library) and
  • Parallel Extensions.

Now they all seem to be Microsoft products and they all seem to target asynchronous or parallel programming scenarios for .NET. But it is not quite clear what each of them actually is and how they are related to each other. Some might actually be the same thing.

In a few words, can anyone set the record straight on what is what?


PLINQ (Parallel Linq) is simply a new way to write regular Linq queries so that they run in parallel - in other words, the Framework will automatically take care of running your query across multiple threads so that they finish faster (i.e. using multiple CPU cores).

For example, let's say that you have a bunch of strings and you want to get all the ones that start with the letter "A". You could write your query like this:

var words = new[] { "Apple", "Banana", "Coconut", "Anvil" };
var myWords = words.Where(s => s.StartsWith("A"));

And this works fine. If you had 50,000 words to search, though, you might want to take advantage of the fact that each test is independent, and split this across multiple cores:

var myWords = words.AsParallel().Where(s => s.StartsWith("A"));

That's all you have to do to turn a regular query into a parallel one that runs on multiple cores. Pretty neat.
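One caveat worth sketching: as far as I know, a plain AsParallel() query does not promise to return results in source order. Below is a minimal, self-contained version of the filter above that adds AsOrdered() to keep the original order. The class and method names (PlinqSketch, WordsStartingWithA) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Filters the words in parallel and keeps those starting with "A".
    // AsOrdered() asks PLINQ to preserve the source order in the output;
    // without it the order of the results is not guaranteed.
    public static string[] WordsStartingWithA(string[] words)
    {
        return words
            .AsParallel()
            .AsOrdered()
            .Where(s => s.StartsWith("A", StringComparison.Ordinal))
            .ToArray();
    }

    public static void Main()
    {
        var words = new[] { "Apple", "Banana", "Coconut", "Anvil" };
        Console.WriteLine(string.Join(", ", WordsStartingWithA(words)));
    }
}
```

For a handful of strings the parallel version is likely slower than the sequential one because of the coordination overhead; it only tends to pay off for large inputs or expensive predicates.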


The TPL (Task Parallel Library) is sort of the complement to PLINQ, and together they make up Parallel Extensions. Whereas PLINQ is largely based on a functional style of programming with no side-effects, side-effects are precisely what the TPL is for. If you want to actually do work in parallel as opposed to just searching/selecting things in parallel, you use the TPL.

The part of the TPL I'm talking about here is the Parallel class, which exposes overloads of For, ForEach, and Invoke (underneath, the library is built around the Task type). Invoke is a bit like queuing up tasks in the ThreadPool, but a bit simpler to use. IMO, the more interesting bits are For and ForEach. So for example, let's say you have a whole bunch of files you want to compress. You could write the regular sequential version:

string[] fileNames = (...);
foreach (string fileName in fileNames)
{
    byte[] data = File.ReadAllBytes(fileName);
    byte[] compressedData = Compress(data);
    string outputFileName = Path.ChangeExtension(fileName, ".zip");
    File.WriteAllBytes(outputFileName, compressedData);
}

Again, each iteration of this compression is completely independent of any other. We can speed this up by doing several of them at once:

Parallel.ForEach(fileNames, fileName =>
{
    byte[] data = File.ReadAllBytes(fileName);
    byte[] compressedData = Compress(data);
    string outputFileName = Path.ChangeExtension(fileName, ".zip");
    File.WriteAllBytes(outputFileName, compressedData);
});

And again, that's all it takes to parallelize this operation. Now when we run our CompressFiles method (or whatever we decide to call it), it will use multiple CPU cores and, on a dual- or quad-core machine where the disk isn't the bottleneck, probably finish in roughly half or a quarter of the time.

The advantage of this over just chucking it all in the ThreadPool is that the call actually runs synchronously: Parallel.ForEach doesn't return until every iteration has finished. If you used the ThreadPool instead (or just plain Thread instances), you'd have to come up with a way of finding out when all of the tasks are finished, and while this isn't terribly complicated, it's something that a lot of people tend to screw up or at least have trouble with. When you use the Parallel class, you don't really have to think about it; the multi-threading aspect is hidden from you and handled behind the scenes.
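To make that bookkeeping concrete, here is a rough sketch that does the same work both ways. Square is a hypothetical stand-in for real work such as compressing a file, and CompletionSketch is just an illustrative name. The ThreadPool version uses a CountdownEvent to find out when everything is done, which is only one of several ways to do it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CompletionSketch
{
    // Hypothetical stand-in for real work (e.g. compressing one file).
    private static int Square(int x) { return x * x; }

    // Manual approach: queue the work items and track completion ourselves.
    public static int[] SquareWithThreadPool(int[] input)
    {
        var results = new int[input.Length];
        using (var done = new CountdownEvent(input.Length))
        {
            for (int i = 0; i < input.Length; i++)
            {
                int index = i; // copy, because the lambda must not capture the loop variable
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { results[index] = Square(input[index]); }
                    finally { done.Signal(); } // signal even if the work throws
                });
            }
            done.Wait(); // block until every work item has signaled
        }
        return results;
    }

    // Parallel approach: the call returns only after all iterations have run.
    public static int[] SquareWithParallel(int[] input)
    {
        var results = new int[input.Length];
        Parallel.For(0, input.Length, i => results[i] = Square(input[i]));
        return results;
    }

    public static void Main()
    {
        var input = new[] { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(", ", SquareWithThreadPool(input)));
        Console.WriteLine(string.Join(", ", SquareWithParallel(input)));
    }
}
```

Both methods return the same array; the difference is how much of the waiting logic you have to write and get right yourself.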


Reactive Extensions (Rx) are really a different beast altogether. It's a different way of thinking about event handling. There's really a lot of material to cover on this, but to make a long story short, instead of wiring up event handlers to events, Rx lets you treat sequences of events as... well, sequences (IObservable<T>, the push-based counterpart of IEnumerable<T>). You get to process events as a sequence, with LINQ-style operators, instead of having them fired asynchronously at random times, where you have to keep saving state all the time in order to detect a series of events happening in a particular order.

One of the coolest examples I've found of Rx is here. Skip down to the "Linq to IObservable" section where he implements a drag-and-drop handler, which is normally a pain in WPF, in just 4 lines of code. Rx gives you composition of events, something you don't really have with regular event handlers, and code snippets like these are also straightforward to refactor into behaviour classes that you can slot in anywhere.
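To give a feel for the push model without pulling in the Rx library itself, here is a small hand-rolled sketch built only on the IObservable<T> and IObserver<T> interfaces that ship in the base class library. Everything else in it (SimpleSubject, DelegateObserver, DelegateObservable, and this Where operator) is a hypothetical, simplified stand-in for what Rx provides properly: it is not thread-safe and does not forward errors or completion.

```csharp
using System;
using System.Collections.Generic;

// Observer that forwards each pushed value to a delegate (errors/completion ignored for brevity).
public sealed class DelegateObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext;
    public DelegateObserver(Action<T> onNext) { this.onNext = onNext; }
    public void OnNext(T value) { onNext(value); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

// Observable whose behaviour is given by a subscribe function.
public sealed class DelegateObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public DelegateObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) { return subscribe(observer); }
}

// Minimal event source: Publish pushes a value to every current subscriber.
public sealed class SimpleSubject<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        observers.Add(observer);
        return new Unsubscriber(() => observers.Remove(observer));
    }

    public void Publish(T value)
    {
        // Iterate over a copy so observers may unsubscribe while being notified.
        foreach (var observer in observers.ToArray())
            observer.OnNext(value);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action dispose;
        public Unsubscriber(Action dispose) { this.dispose = dispose; }
        public void Dispose() { dispose(); }
    }
}

public static class ObservableSketch
{
    // Composition: wraps a source in a new observable that forwards only matching values.
    public static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> predicate)
    {
        return new DelegateObservable<T>(observer =>
            source.Subscribe(new DelegateObserver<T>(value =>
            {
                if (predicate(value)) observer.OnNext(value);
            })));
    }

    // Pushes a few values through a filtered subscription and returns what was observed.
    public static List<int> Demo()
    {
        var source = new SimpleSubject<int>();
        var seen = new List<int>();
        using (source.Where(x => x % 2 == 0).Subscribe(new DelegateObserver<int>(seen.Add)))
        {
            source.Publish(1);
            source.Publish(2);
            source.Publish(4);
        }
        source.Publish(6); // pushed after unsubscribing, so it is not observed
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Demo()));
    }
}
```

The point of the sketch is the shape: the source pushes values, and operators such as Where return new observables, so event logic can be composed like a query instead of being spread across handlers and state fields.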


And that's it. These are some of the cooler features that are available in .NET 4.0. There are several more, of course, but these were the ones you asked about!


I like Aaronaught's answer, but I would say Rx and TPL solve different problems. Part of what the TPL team added are the threading primitives and significant enhancements to the building blocks of the runtime like the ThreadPool. And everything you list is built on top of these primitives and runtime features.

But the TPL and Rx solve two different problems. TPL works best when the program or algorithm is 'pulling & queuing'. Rx excels when the program or algorithm needs to 'react' to data from a stream (like mouse input or when receiving a stream of related messages from an endpoint like WCF).

You'd need the 'unit of work' concept from TPL to do work like walking the filesystem, iterating over a collection, or walking a hierarchy like an org chart. In each of those cases the programmer can reason about the overall amount of work, the work can be broken down into chunks of a certain size (Tasks), and in the case of doing computations over a hierarchy the tasks can be 'chained' together. So certain types of work lend themselves to the TPL 'Task Hierarchy' model, and benefit from the enhancements to plumbing like cancellation (see Channel 9 video on CancellationTokenSource). TPL also has lots of knobs for specialized domains like near real-time data processing.
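A minimal sketch of that 'chunks plus chaining plus cancellation' idea, assuming the work is simply summing arrays of integers. TaskSketch and SumInChunks are hypothetical names; the calls themselves (Task.Factory.StartNew, Task.Factory.ContinueWhenAll, CancellationTokenSource) are standard TPL APIs.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TaskSketch
{
    // Sums each chunk on its own task, then chains a continuation that
    // combines the partial sums. Assumes at least one chunk, because
    // ContinueWhenAll rejects an empty task array.
    public static Task<int> SumInChunks(int[][] chunks, CancellationToken token)
    {
        Task<int>[] partials = chunks
            .Select(chunk => Task.Factory.StartNew(() =>
            {
                token.ThrowIfCancellationRequested(); // cooperative cancellation check
                return chunk.Sum();
            }, token))
            .ToArray();

        // The continuation runs only after every partial task has finished.
        return Task.Factory.ContinueWhenAll(partials, done => done.Sum(t => t.Result), token);
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            var chunks = new[] { new[] { 1, 2 }, new[] { 3, 4 }, new[] { 5 } };
            Task<int> total = SumInChunks(chunks, cts.Token);
            Console.WriteLine(total.Result); // blocks until the chained task completes
        }
    }
}
```

Calling cts.Cancel() before or during the run is how a caller would ask the whole chain to stop; tasks that observe the token then end up cancelled rather than completed.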

Rx will be what most developers should end up using. It is how WPF applications can 'react' to external messages like external data (stream of IM messages to an IM client) or external input (like the mouse drag example linked from Aaronaught). Under the covers Rx uses threading primitives from TPL/BCL, threadsafe collections from TPL/BCL, and runtime objects like the ThreadPool. In my mind Rx is the 'highest-level' of programming to express your intentions.

Whether the average developer can get their head wrapped around the set of intentions you can express with Rx is yet to be seen. :)

But I think that over the next couple of years TPL vs. Rx is going to be the next debate like LINQ-to-SQL vs. Entity Framework. They are two flavors of API in the same domain that are specialized for different scenarios but overlap in a lot of ways. But in the case of TPL & Rx they are actually aware of each other, and there are built-in adapters to compose applications and use both frameworks together (like feeding results from a PLINQ query into an IObservable Rx stream). For the folks who haven't done any parallel programming, there is a ton to learn to get up to speed.

Update: I've been using both TPL and RxNet in my regular work for the past 6 months (of the 18 months since my original answer). My thoughts of choice of TPL and/or RxNet in a middle-tier WCF service (enterprise LOB service): http://yzorgsoft.blogspot.com/2011/09/middle-tier-tpl-andor-rxnet.html
