C# Stream Design Question

Posted 2022-12-12 01:06

I have an application right now that uses a pipeline design. In the first stage, it reads some data and files into a Stream. Some intermediate stages then do work on that stream of data, and a final stage writes the stream out somewhere. This all happens serially: one stage completes and then hands off to the next stage.

This has all been working great, but now the amount of data is getting quite a bit larger (potentially hundreds of GB), so I think I need to do something to alleviate this. My initial idea is what I'm looking for feedback on (as an independent developer, I don't have anyone to bounce ideas off).

I'm thinking of creating a parallel pipeline. The object that starts the pipeline would create all of the stages and kick each one off in its own thread. When the first stage's stream reaches a certain size, it will pass that stream to the next stage for processing and start a new stream of its own to keep filling. The idea is that the final stage will be closing out streams while the first stage is building new ones, so my memory usage stays lower.
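A minimal C# sketch of the design described above, assuming `BlockingCollection<T>` (System.Collections.Concurrent, .NET Framework 4.0 and later) as the hand-off between stage threads. `ChunkedPipeline`, `transform`, `chunkSize` and `maxChunksInFlight` are hypothetical names for illustration, and the single middle stage stands in for however many intermediate stages the real application has. The bounded capacity is what actually keeps memory low: with an unbounded queue the first stage could run far ahead of the final stage and end up buffering everything anyway.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class ChunkedPipeline
{
    // Hypothetical entry point: read -> transform -> write, one thread per stage.
    public static void Run(Stream source, Stream destination,
                           Func<byte[], byte[]> transform,
                           int chunkSize, int maxChunksInFlight)
    {
        // Bounded queues: Add blocks when the downstream stage falls behind,
        // so only a limited number of chunks are ever held in memory.
        using (var filled = new BlockingCollection<MemoryStream>(maxChunksInFlight))
        using (var processed = new BlockingCollection<MemoryStream>(maxChunksInFlight))
        {
            var reader = new Thread(() =>
            {
                var buffer = new byte[81920];
                var current = new MemoryStream();
                int n;
                while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    current.Write(buffer, 0, n);
                    if (current.Length >= chunkSize)
                    {
                        filled.Add(current);          // hand the full chunk downstream...
                        current = new MemoryStream(); // ...and start filling a new one
                    }
                }
                if (current.Length > 0) filled.Add(current);
                filled.CompleteAdding();              // no more chunks are coming
            });

            var middle = new Thread(() =>
            {
                foreach (var chunk in filled.GetConsumingEnumerable())
                {
                    byte[] output = transform(chunk.ToArray());
                    chunk.Dispose();
                    processed.Add(new MemoryStream(output));
                }
                processed.CompleteAdding();
            });

            var writer = new Thread(() =>
            {
                foreach (var chunk in processed.GetConsumingEnumerable())
                {
                    chunk.CopyTo(destination); // final stage writes the chunk out...
                    chunk.Dispose();           // ...and closes it, freeing its memory
                }
            });

            reader.Start(); middle.Start(); writer.Start();
            reader.Join(); middle.Join(); writer.Join();
        }
    }
}
```

There is no error handling here: an exception on any stage thread would terminate the process, so real code would need to catch it and stop the other stages.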

So, my questions:

1) Any high-level thoughts on the direction of this design?
2) Is there a simpler approach you can think of that might apply here?
3) Is there anything existing out there that does something like this that I could reuse (not a product I have to buy)?

Thanks,

MikeD

The producer/consumer model is a good way to proceed, and Microsoft's new Parallel Extensions should provide most of the groundwork for you. Look into the Task class. There's a preview release available for .NET 3.5 / VS2008.
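A rough sketch of the producer/consumer model with tasks, using the `Task` and `BlockingCollection<T>` types as they shipped in .NET Framework 4.0. `TaskPipeline`, `StartStage` and `Demo` are hypothetical names; each stage is a long-running task that consumes from one bounded queue and produces into the next.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TaskPipeline
{
    // Starts one stage as a Task: take items from `input`, apply `work`,
    // put results into `output`, and signal completion downstream when done.
    public static Task StartStage<TIn, TOut>(
        BlockingCollection<TIn> input, Func<TIn, TOut> work, BlockingCollection<TOut> output)
    {
        return Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (var item in input.GetConsumingEnumerable())
                    output.Add(work(item));
            }
            finally
            {
                output.CompleteAdding(); // lets the next stage's loop end
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Example: a producer task feeding two chained stages.
    public static List<string> Demo(IEnumerable<int> source)
    {
        var numbers = new BlockingCollection<int>(boundedCapacity: 4);
        var squares = new BlockingCollection<int>(boundedCapacity: 4);
        var labels = new BlockingCollection<string>(boundedCapacity: 4);

        var producer = Task.Factory.StartNew(() =>
        {
            try { foreach (var n in source) numbers.Add(n); }
            finally { numbers.CompleteAdding(); }
        }, TaskCreationOptions.LongRunning);

        var square = StartStage(numbers, n => n * n, squares);
        var label = StartStage(squares, n => "item " + n, labels);

        var results = new List<string>(labels.GetConsumingEnumerable());
        Task.WaitAll(producer, square, label); // wait for every stage to finish
        return results;
    }
}
```

`TaskCreationOptions.LongRunning` hints that each stage blocks for a long time and should not tie up a thread-pool thread. This sketch has no cancellation: if a middle stage throws, an upstream stage can block forever on a full queue, so production code would pass a `CancellationToken` to `Add` and `GetConsumingEnumerable`.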

Your first task should read blocks of data from your stream and pass them on to other tasks. Then, have as many tasks in the middle as logically fit; smaller tasks are (generally) better. The one thing to watch out for is that the last task must save the data in the order it was read, because the tasks in the middle may finish in a different order from the one in which they started.
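One way to meet the ordering requirement, sketched under the assumption that the stream can be cut into blocks that are transformed independently: the reader tags each block with a sequence number, several worker tasks transform blocks in parallel, and the writer holds early arrivals in a dictionary until the next expected number shows up. `OrderedPipeline`, `blockSize` and `workers` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class OrderedPipeline
{
    // Reads `source` in blocks, transforms them on `workers` parallel tasks,
    // and writes the results to `destination` in the original read order.
    public static void Run(Stream source, Stream destination,
                           Func<byte[], byte[]> transform, int blockSize, int workers)
    {
        using (var blocks = new BlockingCollection<KeyValuePair<long, byte[]>>(workers * 2))
        using (var done = new BlockingCollection<KeyValuePair<long, byte[]>>(workers * 2))
        {
            var reader = Task.Factory.StartNew(() =>
            {
                long seq = 0;
                while (true)
                {
                    var buf = new byte[blockSize];
                    int n = source.Read(buf, 0, blockSize);
                    if (n == 0) break;
                    Array.Resize(ref buf, n);
                    // Tag each block with its position in the input.
                    blocks.Add(new KeyValuePair<long, byte[]>(seq++, buf));
                }
                blocks.CompleteAdding();
            }, TaskCreationOptions.LongRunning);

            var middle = Enumerable.Range(0, workers).Select(_ => Task.Factory.StartNew(() =>
            {
                foreach (var item in blocks.GetConsumingEnumerable())
                    done.Add(new KeyValuePair<long, byte[]>(item.Key, transform(item.Value)));
            }, TaskCreationOptions.LongRunning)).ToArray();

            // Close `done` once every worker has finished.
            var closer = Task.Factory.ContinueWhenAll(middle, _ => done.CompleteAdding());

            // Writer: park out-of-order blocks until the next expected number arrives.
            var pending = new Dictionary<long, byte[]>();
            long next = 0;
            foreach (var b in done.GetConsumingEnumerable())
            {
                pending[b.Key] = b.Value;
                byte[] data;
                while (pending.TryGetValue(next, out data))
                {
                    destination.Write(data, 0, data.Length);
                    pending.Remove(next);
                    next++;
                }
            }
            Task.WaitAll(middle.Concat(new[] { reader, closer }).ToArray());
        }
    }
}
```

The `pending` dictionary is not bounded: if one block is much slower than the rest, blocks that finished after it pile up there. As with any blocking pipeline, an exception in a worker needs a cancellation path, which this sketch omits.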

For the design you've suggested, you'd want to read up on the producer/consumer problem if you haven't already. You'll need a good understanding of how to use semaphores in that situation.
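For reference, the textbook semaphore solution to the bounded-buffer (producer/consumer) problem looks roughly like this in C#. `BoundedBuffer<T>`, `Put` and `Take` are illustrative names; `SemaphoreSlim` is the lightweight semaphore in .NET 4.0 and later, and `System.Threading.Semaphore` would work the same way.

```csharp
using System.Collections.Generic;
using System.Threading;

// Classic bounded buffer: one semaphore counts free slots, the other counts
// filled slots, and a lock protects the queue itself.
public sealed class BoundedBuffer<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();
    private readonly SemaphoreSlim _freeSlots;
    private readonly SemaphoreSlim _filledSlots = new SemaphoreSlim(0);

    public BoundedBuffer(int capacity)
    {
        _freeSlots = new SemaphoreSlim(capacity);
    }

    // Producer side: blocks while the buffer is full.
    public void Put(T item)
    {
        _freeSlots.Wait();
        lock (_gate) _items.Enqueue(item);
        _filledSlots.Release();
    }

    // Consumer side: blocks while the buffer is empty.
    public T Take()
    {
        _filledSlots.Wait();
        T item;
        lock (_gate) item = _items.Dequeue();
        _freeSlots.Release();
        return item;
    }
}
```

This sketch has no end-of-data signal; a common convention is for the producer to `Put` a sentinel value (for example `null` for a reference type) that tells the consumer to stop. In .NET 4.0 and later, `BlockingCollection<T>` packages this pattern, including `CompleteAdding` for shutdown.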

Another approach you could try is to create multiple identical pipelines, each in a separate thread. This would probably be easier to code because it involves far less inter-thread communication. However, depending on your data, you may not be able to split it into independent chunks this way.
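A sketch of the multiple-identical-pipelines approach, assuming the input already comes as independent pieces (separate files, say) that can each go through the whole serial pipeline on their own. `IndependentPipelines` and `pipeline` are hypothetical; `pipeline` stands for the existing serial read/transform/write code, and `Parallel.For` (System.Threading.Tasks, .NET 4.0 and later) supplies the threads.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class IndependentPipelines
{
    // Runs the same serial pipeline on each independent input, several at a time.
    // Assumes the inputs need no coordination with each other.
    public static TOut[] Run<TIn, TOut>(IList<TIn> inputs,
                                        Func<TIn, TOut> pipeline,
                                        int maxParallel)
    {
        var results = new TOut[inputs.Count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Each index is written by exactly one iteration, so no locking is needed.
        Parallel.For(0, inputs.Count, options, i =>
        {
            results[i] = pipeline(inputs[i]);
        });
        return results;
    }
}
```

Because the runs share nothing, there are no queues or semaphores to get wrong; the trade-off is that if each run buffers its whole piece, up to `maxParallel` pieces can be in memory at once.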

In each stage, do you read the entire chunk of data, do the manipulation, and then send the entire chunk to the next stage?

If that is the case, you are using a "push" technique, where you push the entire chunk of data to the next stage. Are you able to handle things in a more stream-like manner using a "pull" technique? In that model each stage is a stream, and reading from it pulls data from the previous stream by calling Read on it. As each stream is read, it reads a small amount from the previous stream, processes it, and returns the processed data. The destination stream determines how many bytes to read from the previous stream, so you never have to hold large amounts of data in memory. This is how applications like BizTalk work. There are some blogs about how BizTalk Pipeline streams work, and I think they might describe exactly what you want.
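A minimal sketch of one pull-based stage, assuming a stage can work byte by byte: `TransformStream` is a hypothetical read-only `Stream` that wraps the previous stage and transforms whatever each `Read` call pulls through it. This is not BizTalk's API, just the same idea in plain .NET; it uses C# 7 expression-bodied members.

```csharp
using System;
using System.IO;

// A read-only, pull-based stage: each Read pulls at most `count` bytes from
// the previous stage, transforms them in place, and returns them.
public sealed class TransformStream : Stream
{
    private readonly Stream _inner;
    private readonly Func<byte, byte> _transform;

    public TransformStream(Stream inner, Func<byte, byte> transform)
    {
        _inner = inner;
        _transform = transform;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);  // pull from upstream
        for (int i = offset; i < offset + n; i++)
            buffer[i] = _transform(buffer[i]);       // process just this slice
        return n;                                    // 0 means upstream is exhausted
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();  // closing a stage closes everything upstream
        base.Dispose(disposing);
    }
}
```

Stages chain by wrapping, e.g. `new TransformStream(new TransformStream(input, f), g).CopyTo(output)`, where `f` and `g` are hypothetical byte transforms. `CopyTo` pulls through the chain with its own fixed-size buffer, so memory use does not grow with the size of the input.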

Here's a multi-part blog entry that you might find interesting:

Part 1
Part 2
Part 3
Part 4
Part 5
