开发者

C# Process Binary File, Multi-Thread Processing

I have the following code that processes a binary file. I want to split the processing workload by using threads and assigning each line of the binary file to threads in the ThreadPool. Processing time for each line is small but when dealing with files that might contain hundreds of lines, it makes sense to split the workload.

My question is regarding the BinaryReader and thread safety. First of all, is what I am doing below acceptable. I have a feeling it would be better to pass only the binary for each line to the PROCESS_Binary_Return_lineData method.

Please note the code below is conceptual. I am looking for a bit of guidance on this as my knowledge of multi-threading is in its infancy. Perhaps there is a better way to achieve the same result, i.e. split processing of each binary line.

        var dic = new Dictionary<DateTime, Data>();        
        var resetEvent = new ManualResetEvent(false);

        using (var b = new BinaryReader(File.Open(Constants.dataFile, 
                            FileMode.Open, File开发者_运维技巧Access.Read, FileShare.Read)))
        {
        var lByte = b.BaseStream.Length;
        var toProcess = 0;

        while (lByte >= DATALENGTH)
        {
            b.BaseStream.Position = lByte;
            lByte = lByte - AB_DATALENGTH;

            ThreadPool.QueueUserWorkItem(delegate
            {
                Interlocked.Increment(ref toProcess);
                var lineData = PROCESS_Binary_Return_lineData(b);

                lock(dic)
                {
                    if (!dic.ContainsKey(lineData.DateTime))
                    {
                     dic.Add(lineData.DateTime, lineData); 
                    }
                }

                if (Interlocked.Decrement(ref toProcess) == 0) resetEvent.Set();
            }, null);
        }
        }

        resetEvent.WaitOne();


This doesn't look thread-safe to me. If you have more than one work item queued, and two of them happen to run at the same time, the reader's position could easily change between assignment and reading.

If you insist on using threads for this, you'd do better to read the data in your main thread and queue up the resulting byte arrays for reading. Any solution involving each thread reading from the file will involve locking, and at that point, you're not gaining anything at all from using threads.


It very rarely makes sense to use threads to improve file processing performance. A thread, when run on a multi-core CPU, provides more CPU cycles. This is rarely the resource you are short of when processing files. You need more disks. Not an option of course.

Smoke test this first. Reboot your machine so the file won't be stored in the file system cache. Run your single-threaded program and observe the CPU load. Taskmgr.exe, Performance tab is good for that. If you don't see one CPU maxed-out at 100% load then adding another CPU cannot make your program quicker.


"I have a feeling it would be better to pass only the binary for each line to the PROCESS_Binary_Return_lineData method."

yes, you need to do that, since your delegate may not get round to reading from the BinaryReader, before it gets repositioned

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜