
Is it possible to use thread-concurrency and parallelism together?

For one of my projects, which is a kind of content aggregator, I'd like to introduce concurrency and, if possible, parallelism. At first glance this may seem pointless because concurrency and parallelism take different approaches. (Concurrency via threads introduces immediate concurrency, whereas parallelism only provides a potential.)

To better explain my problem, let me summarize the setup.

As my project is a content aggregator (it aggregates feeds, podcasts and similar content), it basically reads data from the web and parses it to extract the meaningful parts.

Right now I take a very simplistic sequential approach. Let's say we have some number of feeds to parse.

foreach(feed in feeds)
{
   read_from_web(feed)
   parse(feed)
}

With the sequential approach, the time taken to fetch and parse all feeds depends greatly not only on the parser code but also on the time needed to get the XML source from the web. We all know that reading the source from the web can take a variable amount of time (because of network conditions and similar issues).

So to speed up the code I can use worker threads, which will introduce immediate concurrency.


A fixed number of worker threads can each take a feed and parse it concurrently (which will certainly speed up the whole process, since time spent waiting for data over the network will have less impact).

This is all fine, except that my project's target audience mostly runs multi-core CPUs, because they are gamers.

I also want to utilize these cores while processing the content, so I started reading about potential parallelism: http://oreilly.com/catalog/0790145310262. I haven't finished reading it yet and don't know whether this is already discussed there, but I'm a bit obsessed with this and wanted to ask on Stack Overflow to get an overall idea.

The book describes potential parallelism as follows: potential parallelism means that your program is written so that it runs faster when parallel hardware is available and roughly the same as an equivalent sequential program when it's not.

So the real question is: while I'm using worker threads for concurrency, can I still use potential parallelism? (That is, running my feed parsers on worker threads and still distributing them across CPU cores, if the CPU has multiple cores, of course.)


I think it's more useful to think about IO-bound work and CPU-bound work; threads can help with both.

For IO-bound work you are presumably waiting for external resources (in your case, feeds to be read). If you must wait on multiple external resources then it only makes sense to wait on them in parallel rather than wait on them one after the other. This is best done by spinning up threads which block on the IO.
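As a rough sketch of waiting on several feeds at once, the snippet below uses Python's `concurrent.futures.ThreadPoolExecutor`. Python is used here only for illustration (the question itself targets .NET). `read_from_web` is a stand-in that sleeps instead of making a real request, and the feed URLs are made-up placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical feed list; the URLs are placeholders and nothing is downloaded.
FEEDS = ["http://example.com/feed/%d" % i for i in range(8)]

def read_from_web(feed):
    """Stand-in for a blocking network read: sleep to simulate latency."""
    time.sleep(0.2)
    return "<xml source of %s>" % feed

def fetch_sequential(feeds):
    """Wait on each feed one after the other."""
    return [read_from_web(feed) for feed in feeds]

def fetch_all(feeds, max_workers=8):
    """Block on the feeds in worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_from_web, feeds))
```

With eight simulated feeds, `fetch_sequential(FEEDS)` waits for the delays one by one, while `fetch_all(FEEDS)` overlaps them, so the total wait is close to that of the slowest single feed.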

For CPU-bound work you want to use all of your cores to maximize the throughput of completing that work. To do that, you should create a pool of worker threads roughly the same size as your number of cores and break up and distribute the work across them. [How you break up and distribute the work is itself an interesting problem.]
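One simple way to break up the work is to cut it into as many contiguous chunks as there are workers. The sketch below does that in Python, again only as an illustration: `parse` is a placeholder that counts words, and `split_into_chunks`, `parse_chunk` and `parse_all` are hypothetical helper names. In standard CPython builds, threads do not run Python code in parallel, so the sketch defaults to a process pool. In .NET the same shape works with plain threads.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def parse(xml_source):
    """Placeholder for CPU-heavy parsing: just count the words."""
    return len(xml_source.split())

def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def parse_chunk(chunk):
    """Parse every document in one chunk; runs inside a single worker."""
    return [parse(doc) for doc in chunk]

def parse_all(documents, executor_cls=ProcessPoolExecutor, workers=None):
    """Distribute documents across a pool sized to the number of cores.

    executor_cls may be swapped for ThreadPoolExecutor where threads
    can run CPU-bound code in parallel.
    """
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(documents, workers)
    with executor_cls(max_workers=workers) as pool:
        parsed = pool.map(parse_chunk, chunks)
        return [result for chunk in parsed for result in chunk]
```

When using the default process pool on platforms that start workers by spawning a fresh interpreter, call `parse_all` from under an `if __name__ == "__main__":` guard.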

In practice, I find that most applications have both of these problems and it makes sense to use threads to solve both kinds of problems.


Okay, it seems I had greatly misunderstood the book's description of potential parallelism. Thanks to the answers I was able to figure things out.

From msdn: http://msdn.microsoft.com/en-us/library/dd460717(VS.100).aspx

The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework version 4. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

So basically it means the TPL can handle the details of concurrency via threading and also supports parallelism across multiple cores.
