Processing large amounts of data using multithreading
I need to write a C# service (could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database. Processing each record is a fairly complex operation, and I need to perform a lot of inserts and updates as part of the processing.
We are using NHibernate as the ORM.
One way is to load all the records and process them sequentially, which could turn out to be quite slow. I was looking at multithreading options and was thinking of having multiple threads process chunks of records simultaneously.
Could anyone give me some pointers on how I should approach this, considering that I'm using NHibernate, and what the possible gotchas are, such as deadlocks?
Thanks a lot.
You should consider the Task Parallel Library.
Assuming you are using .NET 4.0, you can use the Task Parallel Library (as has been mentioned) to do something like this:
Parallel.ForEach(sourceCollection, item => Process(item));
Your source collection would be an IEnumerable of the loaded records. The library will handle everything for you:
The source collection is partitioned and the work is scheduled on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.
It may help to read a tutorial on using Parallel.ForEach(). Also, be aware of potential pitfalls.
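A minimal sketch of how this might look with NHibernate, assuming a configured `ISessionFactory` and a hypothetical `Record` entity (neither is from the question). The key point is that NHibernate's `ISession` is not thread-safe, so each parallel iteration opens its own session and transaction:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using NHibernate;

public class RecordProcessor
{
    private readonly ISessionFactory _sessionFactory; // assumed to be built at startup

    public RecordProcessor(ISessionFactory sessionFactory)
    {
        _sessionFactory = sessionFactory;
    }

    public void ProcessAll(IEnumerable<int> recordIds)
    {
        // Iterations may run on different threads, so never share one ISession
        // across them; open a session (and transaction) per record or per chunk.
        Parallel.ForEach(recordIds, id =>
        {
            using (var session = _sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                var record = session.Get<Record>(id); // Record is an illustrative entity
                // ... complex per-record work, inserts and updates ...
                tx.Commit();
            }
        });
    }
}
```

Loading only the record IDs up front and fetching each record inside the loop keeps memory usage bounded; opening a session per chunk instead of per record would reduce connection churn if record processing is cheap relative to session setup.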
Sounds like PLINQ is the best solution (Chapter 5 in this article). But since each calculation works heavily with the database, you should create a separate session for each thread.
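The per-thread-session idea with PLINQ might be sketched like this, again assuming a hypothetical `ISessionFactory` and `Record` entity rather than anything from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using NHibernate;

public static class PlinqProcessing
{
    public static void ProcessAll(ISessionFactory sessionFactory, IEnumerable<int> recordIds)
    {
        recordIds
            .AsParallel()
            .WithDegreeOfParallelism(Environment.ProcessorCount)
            .ForAll(id =>
            {
                // A fresh session per item keeps each worker isolated;
                // ISession instances must never be shared between threads.
                using (var session = sessionFactory.OpenSession())
                using (var tx = session.BeginTransaction())
                {
                    var record = session.Get<Record>(id); // illustrative entity
                    // ... process the record, issue inserts/updates ...
                    tx.Commit();
                }
            });
    }
}
```

Capping the degree of parallelism matters here: each worker holds a database connection, so running more workers than the connection pool allows would cause threads to block waiting for connections.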
Use IStatelessSession if possible and experiment with the adonet.batch_size property.
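A sketch of what that suggestion looks like in practice. `IStatelessSession` skips the first-level cache, dirty checking, and cascades, which makes bulk writes much lighter on memory; `adonet.batch_size` is set in the NHibernate configuration, not in code. The `recordsToInsert` collection and `Record` entity are placeholders:

```csharp
// In the NHibernate configuration (e.g. hibernate.cfg.xml), enable ADO.NET batching:
//   <property name="adonet.batch_size">100</property>
using NHibernate;

public static class BulkInsert
{
    public static void Run(ISessionFactory sessionFactory, System.Collections.Generic.IEnumerable<Record> recordsToInsert)
    {
        using (IStatelessSession stateless = sessionFactory.OpenStatelessSession())
        using (ITransaction tx = stateless.BeginTransaction())
        {
            foreach (var record in recordsToInsert)
            {
                // No first-level cache or dirty checking: each Insert goes
                // straight to the batcher, so memory stays flat even for
                // very large record sets.
                stateless.Insert(record);
            }
            tx.Commit();
        }
    }
}
```

The trade-off is that stateless sessions don't cascade to associations or fire interceptors/events, so they suit simple bulk inserts and updates rather than the full object-graph processing a regular session gives you.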
Also, how performant does it need to be? I'm a fan of NH, but this is one scenario where stored procedures might be better.