
What type of queue to use in parallel data processing - C# - .NET 4

Scenario: Data is received and written to a database with timestamps. I need to process the raw data in the order it was received, based on the timestamp, and write it back to the database (to a different table), again maintaining the timestamp order.

I came up with the following design: I created two queues, one for storing raw data from the database and another for storing processed data before it is written back to the DB. I have two threads, one writing to the Initial queue and another reading from the Result queue. In between, I spawn multiple threads to process data from the Initial queue and write it to the Result queue.

I have experimented with SortedList (with manual locking) and BlockingCollection. I have used two approaches to process in parallel: Parallel.For/Parallel.ForEach and Task.Factory.StartNew.

Each unit of data may take a variable amount of time to process, depending on several factors. One thread can still be processing the first data point while other threads have finished three or four data points each, which breaks the timestamp order.

I recently found out about OrderablePartitioner and thought it would solve the problem, but after following MSDN's example I can see that it does not sort the underlying collection either. Maybe I need to implement a custom partitioner to order my collection of complex data types? Or maybe there's a better way of approaching the problem?

Any suggestions and/or links to articles discussing a similar problem are highly appreciated.


Personally, I would at least try to start with using a BlockingCollection<T> for the input and a ConcurrentQueue<T> instance for the results.

I would use Parallel LINQ (PLINQ) to do the processing. To preserve the order during processing, you could call AsOrdered() on the PLINQ query.
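Something like the following is a minimal sketch of that idea, not a drop-in implementation. RawItem, ProcessedItem, Process and ProcessBatch are hypothetical names invented for illustration. It assumes the producer adds items to the BlockingCollection<T> in timestamp order and calls CompleteAdding() at the end of a batch. The results are enumerated with a plain foreach, because ForAll() would not keep the order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical record types; real ones would mirror the database tables.
public class RawItem
{
    public DateTime Timestamp;
    public int Value;
}

public class ProcessedItem
{
    public DateTime Timestamp;
    public int Result;
}

public static class OrderedPipeline
{
    // Stand-in for the real processing step, which takes a variable amount of time.
    public static ProcessedItem Process(RawItem raw)
    {
        Thread.Sleep((raw.Value % 3) * 10); // simulate uneven work per item
        return new ProcessedItem { Timestamp = raw.Timestamp, Result = raw.Value * 2 };
    }

    // Processes the batch in parallel. AsOrdered() tells PLINQ to yield
    // results in source order even if later items finish first.
    public static void ProcessBatch(IEnumerable<RawItem> batch,
                                    ConcurrentQueue<ProcessedItem> results)
    {
        var ordered = batch
            .AsParallel()
            .AsOrdered()
            .Select(r => Process(r));

        // Sequential enumeration keeps the order; ForAll() would not.
        foreach (var item in ordered)
            results.Enqueue(item);
    }

    public static void Main()
    {
        var input = new BlockingCollection<RawItem>();
        var results = new ConcurrentQueue<ProcessedItem>();
        var start = new DateTime(2011, 1, 1);

        // Producer: add items in timestamp order, then signal the end of the batch.
        for (int i = 0; i < 20; i++)
            input.Add(new RawItem { Timestamp = start.AddSeconds(i), Value = i });
        input.CompleteAdding();

        ProcessBatch(input.GetConsumingEnumerable(), results);

        // A single writer thread would dequeue from results and write to the DB.
        foreach (var p in results)
            Console.WriteLine("{0:HH:mm:ss} -> {1}", p.Timestamp, p.Result);
    }
}
```

Ordered PLINQ queries buffer results internally, so a very slow first item holds back everything behind it. That is the price of keeping the order in memory.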


Have you considered PLINQ and AsOrdered()? It might be helpful for what you're trying to achieve. http://msdn.microsoft.com/en-us/library/dd460719.aspx


Maybe you've considered these things, but...

Why not just pass the timestamp to the database and then either let the database do the ordering, or fix the ordering in the database after all processing threads have returned? Do the SQL statements have to be executed sequentially?

PLINQ is great but I would try to avoid thread synchronization requirements and simply pass more ordering data to the database if you can.
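As a rough sketch of that approach, assuming each processed item carries its original timestamp: the work is done in parallel with no ordering constraints, and the order is restored once at the end. The in-memory OrderBy() below only stands in for what the database would do with an ORDER BY on the timestamp column. UnorderedThenSort, Process and ProcessAll are hypothetical names, and the key/value pair is a placeholder for the real record type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class UnorderedThenSort
{
    // Hypothetical processing step: keeps the timestamp, transforms the value.
    public static KeyValuePair<DateTime, int> Process(KeyValuePair<DateTime, int> raw)
    {
        return new KeyValuePair<DateTime, int>(raw.Key, raw.Value * 2);
    }

    public static List<KeyValuePair<DateTime, int>> ProcessAll(
        IEnumerable<KeyValuePair<DateTime, int>> rawItems)
    {
        // Items finish in arbitrary order; no synchronization on order is needed.
        var bag = new ConcurrentBag<KeyValuePair<DateTime, int>>();
        Parallel.ForEach(rawItems, raw => bag.Add(Process(raw)));

        // Restore the order once, after all workers have returned.
        // In the real system this would be the database's ORDER BY on the timestamp.
        return bag.OrderBy(p => p.Key).ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2011, 1, 1);
        var raw = Enumerable.Range(0, 20)
            .Select(i => new KeyValuePair<DateTime, int>(start.AddSeconds(i), i))
            .ToList();

        foreach (var p in ProcessAll(raw))
            Console.WriteLine("{0:HH:mm:ss} -> {1}", p.Key, p.Value);
    }
}
```

This only works if readers of the result table can tolerate rows arriving out of order and sort by timestamp when they query.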
