Intensive file I/O and data processing in C#

Posted 2022-12-16 21:23

I'm writing an app which needs to process a large text file (comma-separated with several different types of records - I do not have the power or inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then the data for each record is passed off for some processing.

Right now this part of the application is single-threaded (read a record, process it, read the next record, etc.). I'm thinking it might be more efficient to read records into a queue in one thread, and process them in another thread in small blocks or as they become available.

I have no idea how to start programming something like that, including the data structure that would be necessary or how to implement the multithreading properly. Can anyone give any pointers, or offer other suggestions about how I might improve performance here?

You might get a benefit if you can balance the time spent processing records against the time spent reading them; in that case you could use a producer/consumer setup, for example a synchronized queue with a worker (or a few) dequeuing and processing. I might also be tempted to investigate Parallel Extensions; it is pretty easy to write an IEnumerable<T> version of your reading code, after which Parallel.ForEach (or one of the other Parallel methods) should do everything you want; for example:

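using System.Collections.Generic;
using System.IO;

// Lazily yields one Person per line; Person(name, age) is assumed to be defined elsewhere.
static IEnumerable<Person> ReadPeople(string path) {
    using(var reader = File.OpenText(path)) {
        string line;
        while((line = reader.ReadLine()) != null) {
            // Naive split: assumes no quoted fields that contain commas.
            string[] parts = line.Split(',');
            yield return new Person(parts[0], int.Parse(parts[1]));
        }
    }
}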
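To show how the reader above might be consumed, here is a minimal sketch that feeds it to Parallel.ForEach. The Person class, the ParallelDemo wrapper and the SumAges method are hypothetical names made up for illustration; summing ages just stands in for whatever per-record processing you actually do.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical record type; the answer above does not define Person.
public sealed class Person
{
    public string Name { get; }
    public int Age { get; }
    public Person(string name, int age) { Name = name; Age = age; }
}

public static class ParallelDemo
{
    // Same lazy reader as above: lines are read one at a time, on demand.
    public static IEnumerable<Person> ReadPeople(string path)
    {
        using (var reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                yield return new Person(parts[0], int.Parse(parts[1]));
            }
        }
    }

    // Placeholder "processing": sums the ages of all records in parallel.
    public static long SumAges(string path)
    {
        long total = 0;
        Parallel.ForEach(ReadPeople(path), person =>
        {
            // The body runs on several threads, so shared state must be updated atomically.
            Interlocked.Add(ref total, person.Age);
        });
        return total;
    }
}
```

Parallel.ForEach pulls items from the enumerator under its own lock, so the file is still read sequentially and only the per-record work runs in parallel. This only pays off if processing a record costs noticeably more than reading it, and the records may be processed out of order.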

Take a look at these tutorials; they contain all you need. They are Microsoft tutorials with code samples for a case similar to the one you describe. Your producer fills the queue while the consumer pops records off.

Creating, starting, and interacting between threads

Synchronizing two threads: a producer and a consumer
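As a rough sketch of the producer/consumer idea (not the code from the linked tutorials), the following uses BlockingCollection<T> from System.Collections.Concurrent as the synchronized queue. The ProducerConsumerDemo class, the Run method and the capacity of 1000 are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Reads lines on one task and processes them on another.
    // Returns the number of lines processed.
    public static int Run(string path, Action<string> process)
    {
        // Bounded, so the reader cannot run arbitrarily far ahead of the processor.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            int processed = 0;

            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (string line in File.ReadLines(path))
                        queue.Add(line);        // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding();     // lets the consumer loop end
                }
            });

            Task consumer = Task.Run(() =>
            {
                // Blocks until an item is available; finishes after CompleteAdding
                // has been called and the queue has drained.
                foreach (string line in queue.GetConsumingEnumerable())
                {
                    process(line);
                    processed++;                // touched only by this one consumer
                }
            });

            Task.WaitAll(producer, consumer);
            return processed;
        }
    }
}
```

With a single consumer the records are processed in file order. Several consumers can call GetConsumingEnumerable on the same collection if processing is the bottleneck, at the cost of ordering. Error handling and cancellation are left out here: if the process callback throws while the queue is full, the producer stays blocked, so real code would want a CancellationToken.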

You may also look at asynchronous I/O. In this style, you start a file operation from the main thread; it then continues running in the background, and when it completes it invokes a callback that you specified. In the meantime, you can continue doing other things (such as processing the data). For example, you could start an asynchronous operation to read the next 1000 bytes, process the 1000 bytes you already have, and then wait for the next kilobyte.

Unfortunately, programming asynchronous operations in C# is a bit painful. There is an MSDN sample, but it's not nice at all. This can be nicely solved in F# using asynchronous workflows. I wrote an article that explains the problem and shows how to do a similar thing using C# iterators.

A more promising solution for C# is the Wintellect PowerThreading library, which supports a similar trick using C# iterators. There is a good introductory article by Jeffrey Richter in the MSDN Concurrency Affairs column.
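For comparison, the read-ahead pattern described above (start the next read, process what you already have, then wait) can be sketched with the Task-based async/await support in later versions of C#. This assumes C# 5 / .NET 4.5 or newer, which the libraries mentioned above predate. ReadAheadDemo and ProcessFileAsync are hypothetical names, and the sketch works per line rather than per 1000-byte block.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ReadAheadDemo
{
    // Overlaps reading the next line with processing the current one.
    // Returns the number of lines processed.
    public static async Task<int> ProcessFileAsync(string path, Action<string> process)
    {
        int count = 0;
        using (var reader = new StreamReader(path))
        {
            var pending = reader.ReadLineAsync();    // start the first read
            while (true)
            {
                var line = await pending;            // wait for the read in flight
                if (line == null) break;             // end of file
                pending = reader.ReadLineAsync();    // start the next read...
                process(line);                       // ...while this line is processed
                count++;
            }
        }
        return count;
    }
}
```

Only one read is ever outstanding on the reader, which StreamReader requires. Because StreamReader buffers internally, many of these reads complete immediately, so the real overlap happens only when the buffer has to be refilled from disk; whether that beats a plain synchronous loop is something to measure rather than assume.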
