Creating Thousands of Threads Quickly and Executing Them Near Simultaneously

Posted 2023-01-20 10:38

I have a C#.NET application that needs to inform anywhere from 4000 to 40,000 connected devices to perform a task all at once (or as close to simultaneous as possible).

The application works well; however, I am not satisfied with the performance. In a perfect world, as soon as I send the command I would like to see all of the devices respond simultaneously. Yet, there seems to be a delay as all the threads I have created spin up and perform the task.

I have used the .NET 4.0 ThreadPool, created my own solution using custom threads and I have even tweaked the existing ThreadPool to allow for more threads to be executed at once.

I still want better performance, and that is why I am here. Any ideas? Comments? Suggestions? Thank you.

-Shaun

Let me add that the application notifies these 'connected devices' that they need to go listen for audio on a multicast address.

A dual-core hyperthreaded processor MAY be able to execute 4 threads simultaneously, depending on what the threads are doing (no contention on IO or memory access, etc.). A quad-core hyperthreaded processor, perhaps 8. But 40K just can't physically happen.

If you want near simultaneous, you're better off spinning up just as many threads as the computer has free cores and having each thread fire off notifications then end. You'll get rid of a bunch of context switching this way.
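To make the "one thread per core" idea concrete, here is a minimal sketch in Python (the same shape applies in .NET). The `send` callback is a hypothetical stand-in for whatever call notifies one device, and `os.cpu_count()` reports logical cores, not free ones, so treat the worker count as a starting point. Note that CPython threads only overlap while blocked on I/O, so this shows the partitioning rather than true CPU parallelism.

```python
import os
import threading

def notify_in_chunks(devices, send, workers=None):
    """Split `devices` across a few threads; each thread notifies its share, then ends."""
    workers = workers or os.cpu_count() or 1
    # Stride slicing gives every worker an almost equal share of the devices.
    chunks = [devices[i::workers] for i in range(workers)]

    def run(chunk):
        for device in chunk:
            send(device)  # `send` is a hypothetical per-device notification call

    threads = [threading.Thread(target=run, args=(chunk,)) for chunk in chunks if chunk]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads)  # number of threads actually used
```

With 40,000 devices and 8 workers, each thread walks through 5,000 devices in a row, with no extra threads competing for the cores.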

Or, look elsewhere. As SB recommended in the comments, use a UDP multicast to notify listening machines that they should do something.
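For what the multicast suggestion might look like, here is a rough sketch in Python using only the standard `socket` module. The group address, port, and the text format of the command are all assumptions made up for illustration; the real values depend on what the devices actually listen for.

```python
import socket
import struct

# Assumed values for illustration only.
CONTROL_GROUP = "239.1.2.3"
CONTROL_PORT = 5007

def make_multicast_sender(ttl=1):
    """Create a UDP socket whose multicast packets live for `ttl` router hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    return sock

def build_command(name, audio_group, audio_port):
    """Encode a hypothetical 'go listen for audio here' command as ASCII text."""
    return f"{name} {audio_group} {audio_port}".encode("ascii")

def notify_all(sock, payload, group=CONTROL_GROUP, port=CONTROL_PORT):
    """One sendto() call; the network fans the datagram out to every subscriber."""
    return sock.sendto(payload, (group, port))
```

Usage would be something like `notify_all(make_multicast_sender(), build_command("LISTEN", "239.9.9.9", 6000))`. The sender does the same amount of work for 4,000 devices as for 40,000, but UDP gives no delivery guarantee, so devices that must not miss the command need some form of retry or acknowledgement on top.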

You cannot execute 4000 threads simultaneously, let alone 40k. At best on a desktop box with hyperthreading, you might get up to 8 simultaneous processes going (this assumes quad core). Threads are pseudo-parallel, and that's not even digging into the issues of bus contention.

If you absolutely need simultaneity for 40k devices, you want some form of hardware synchronization.

It sounds like you have some control over what software runs on each device, in which case you could borrow from HPC practice: architect your devices (nodes) hierarchically and/or use MPI to execute your remote processes.

For the hierarchy example: designate, say, 8 nodes as primary masters, each with 8 slave nodes; each slave can in turn act as a master with 8 slaves of its own (you might need an automated subscription algorithm to set this up). You will need a hierarchy 6 levels deep to cover 40,000 nodes. Each master runs a small piece of code that waits continually for instructions to pass on to its slaves.

All you then do is pass the instruction to the 8 primary masters, and it will be propagated across the 'cluster' asynchronously by the masters. The instruction only has to be passed on a maximum of 5 times, so it will propagate very quickly.
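The depth figure can be checked with a few lines of arithmetic. This sketch (Python, function name made up for the purpose) counts how many levels a fan-out of 8 needs before the levels together hold 40,000 nodes, counting the 8 primary masters as level 1.

```python
def levels_needed(nodes, fanout):
    """Smallest depth d with fanout + fanout**2 + ... + fanout**d >= nodes."""
    covered, level_size, depth = 0, 1, 0
    while covered < nodes:
        level_size *= fanout   # nodes on the next level down
        covered += level_size  # running total across all levels so far
        depth += 1
    return depth, covered

depth, capacity = levels_needed(40_000, 8)
print(depth, capacity)  # 6 299592
print(depth - 1)        # 5 hand-offs between masters after the initial send
```

Five levels hold only 37,448 nodes, which is why a sixth is needed; the sixth level leaves plenty of headroom.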

Alternatively (or in conjunction), you could look at MPI, which is an off-the-shelf solution. There are some established C# implementations.

The overhead of creating thousands of threads is (very) significant; I would seek an alternative solution. This sounds like a job for asynchronous IO: your computer presumably only has one network connection, so no more than one message can be sent at a time - threads cannot improve on this!
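As a rough illustration of the asynchronous approach, here is a sketch using Python's `asyncio` (in .NET the equivalent would be the async socket APIs). It assumes each device accepts a TCP connection and a raw payload, which the question does not confirm, and the cap of 1000 in-flight sends is an arbitrary guess to tune.

```python
import asyncio

async def send_command(host, port, payload):
    """Open a TCP connection, write the payload, close. The transport is an assumption."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(payload)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def notify_all(devices, payload, send=send_command, limit=1000):
    """Notify every (host, port) pair from a single thread; returns the failure count."""
    gate = asyncio.Semaphore(limit)  # cap on sends in flight at once

    async def one(device):
        async with gate:
            await send(device[0], device[1], payload)

    # return_exceptions=True keeps one dead device from cancelling the rest.
    results = await asyncio.gather(*(one(d) for d in devices), return_exceptions=True)
    return sum(isinstance(r, Exception) for r in results)
```

It would be driven with `asyncio.run(notify_all(devices, b"..."))`. No thread is ever parked waiting on one device, so the cost per device is a small coroutine object rather than a full thread stack.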

Am I correct in guessing that you're using a synchronous API call on your device, which is why it must be executed in a thread? Does the API have an asynchronous version of the call? If the device API can really support 40k+ devices, then it should. It should also have internal handling of whatever wait handles (or equivalent) are required to synchronize the return data for callback. This isn't something you can handle at the client application side; you don't have enough visibility of the underlying implementation of the device API to know how to parallelize the tasks. As you've discovered, creating 40k threads with blocking calls doesn't cut it.

You should do async IO to the devices. This is very efficient and uses a different (larger) set of threads to handle some of the work. The devices will certainly receive the commands much faster. The IO thread pool will handle the replies (if any).

Always fun with these old ones.

At 1 MB of stack per thread, you need 4-40 GB of RAM at minimum, plus 4k-40k cores for the threads to truly run at once. Then there is the fact that you have a network to send it all on.

That means it will be serialized somewhere along the way, at the nearest switch or router (most of it probably even on your network card, if you could even get all the packets there at the same time and it managed to send them without buffering them or dying on you). In short, all that multithreading work is for nothing, because the messages will not reach the endpoints simultaneously.
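Some back-of-the-envelope numbers for both points, as a Python sketch. The 1 MB stack figure is the one used above; the 100-byte packet and the 1 Gbit/s link are assumptions picked purely for illustration, and framing overhead is ignored.

```python
def stack_reservation_gb(threads, stack_mb=1):
    """Memory reserved for thread stacks alone, in GB."""
    return threads * stack_mb / 1024

def wire_time_ms(packets, packet_bytes, link_bits_per_s):
    """Time for one link to clock out all packets back to back, in milliseconds."""
    return packets * packet_bytes * 8 * 1000 / link_bits_per_s

print(stack_reservation_gb(4_000), stack_reservation_gb(40_000))  # 3.90625 39.0625
print(wire_time_ms(40_000, 100, 1_000_000_000))                   # 32.0
```

So even under these generous assumptions, the last device's packet leaves the network card about 32 ms after the first, however many threads produced them.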

Think of it as taking one 40,000-lane road and placing 40,000 cars on it. Sure, everyone gets to the same point on the road at the same time, but then they leave the road and go home. Everyone gets home at a different time, even though they all started driving on the 40k-lane road at the same point and time.

You just cannot beat the physical realm (yet...).
