Distributed Computing In C#

2023-03-26 22:24 Q&A author:

I have a specific DLL that contains some language processing classes and methods. One of these methods takes a word as an argument, performs a calculation that takes about 3 seconds, and saves the result to a SQL Server database.

I want to run this DLL method on 900k words, and the job may repeat every week. How can I easily distribute this work across multiple machines using C# to save time?

This answer uses the format Requirement -- Tool:

Scheduled Runs -- Quartz.NET

Quartz.NET lets you run "jobs" on any schedule you define. It can also persist its state between runs, so if the server goes down for some reason, it knows to start running the job again when it comes back up. Pretty cool stuff.

Distributed Queue -- NServiceBus

A good service bus is worth its weight in gold. Basically, you want to ensure that each queued operation is performed by only one worker, however many operations are queued and however many workers are pulling from the queue. If you make your operations idempotent, NServiceBus is a great way to accomplish this.

Queue -> Worker 1 + Worker 2 + Worker 3 -> Local Data Storage -> Data Queue + Workers -> Remote Data Storage
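To make the competing-workers idea concrete without NServiceBus, here is a minimal in-process sketch built on System.Threading.Channels (assuming .NET Core 3.0 or later). It is not NServiceBus's API: the channel stands in for the queue, the tasks stand in for worker machines, and `CompetingConsumers` and the `analyze` delegate are hypothetical names, with `analyze` standing in for the DLL method. It shows why idempotency matters: a duplicate delivery of the same word does no harm.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-process stand-in for a distributed queue (this is NOT NServiceBus): several
// workers compete for messages from one channel. Because the handler is idempotent,
// a word that is delivered twice still ends up with a single stored result.
public static class CompetingConsumers
{
    // 'analyze' is a hypothetical stand-in for the DLL method; it is assumed to return
    // the same result for the same word (idempotent), so a repeated delivery is harmless.
    public static async Task<IReadOnlyDictionary<string, int>> RunAsync(
        IEnumerable<string> words, int workerCount, Func<string, int> analyze)
    {
        var queue = Channel.CreateUnbounded<string>();
        var results = new ConcurrentDictionary<string, int>();

        // Each worker keeps reading until the writer is completed and the queue is drained.
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(async () =>
            {
                await foreach (string word in queue.Reader.ReadAllAsync())
                {
                    // Cheap skip for words already handled. Two workers may still race past
                    // this check; TryAdd then keeps exactly one of the identical results.
                    if (results.ContainsKey(word)) continue;
                    results.TryAdd(word, analyze(word));
                }
            }))
            .ToArray();

        foreach (string w in words)
            await queue.Writer.WriteAsync(w);
        queue.Writer.Complete(); // tells the workers no more messages are coming

        await Task.WhenAll(workers);
        return results;
    }
}
```

In a real deployment the queue would be durable and shared between machines, but the idempotency argument is the same.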

Data Cache -- RavenDB or SQLite

Basically, to keep the results of these operations sufficiently isolated from SQL Server, you want to cache each value in a local storage system first. This could be something fast and non-relational like RavenDB, or something structured like SQLite. You would then put an identifier for the result into another queue via NServiceBus and sync it to SQL Server from there. Queues are your friend! :-)
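The cache-then-sync flow can be sketched in-process as below. This is an illustration under stated assumptions, not RavenDB, SQLite, or NServiceBus code: a `ConcurrentDictionary` stands in for the local store, a plain `Dictionary` stands in for SQL Server, and a channel of identifiers stands in for the second queue. `CacheThenSync`, `Flush`, and `batchSize` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Sketch of "cache locally, then sync via a second queue". Both dictionaries are
// hypothetical stand-ins: 'local' for SQLite/RavenDB, 'remote' for SQL Server.
public static class CacheThenSync
{
    public static async Task<Dictionary<string, int>> RunAsync(
        IEnumerable<string> words, Func<string, int> analyze, int batchSize)
    {
        var local = new ConcurrentDictionary<string, int>();
        var remote = new Dictionary<string, int>();
        var syncQueue = Channel.CreateUnbounded<string>(); // carries identifiers only

        // Sync consumer: drains identifiers and copies results to the remote store in batches.
        Task sync = Task.Run(async () =>
        {
            var batch = new List<string>();
            await foreach (string id in syncQueue.Reader.ReadAllAsync())
            {
                batch.Add(id);
                if (batch.Count >= batchSize) Flush(batch, local, remote);
            }
            Flush(batch, local, remote); // write whatever is left over
        });

        try
        {
            // Workers: compute, store the result locally, then enqueue only its identifier.
            Parallel.ForEach(words, word =>
            {
                local[word] = analyze(word);
                syncQueue.Writer.TryWrite(word); // succeeds on an unbounded, uncompleted channel
            });
        }
        finally
        {
            syncQueue.Writer.Complete(); // lets the sync consumer finish
        }

        await sync;
        return remote;
    }

    // In a real system one batch would be one round trip to SQL Server (e.g. a bulk insert).
    private static void Flush(List<string> batch, ConcurrentDictionary<string, int> local,
                              Dictionary<string, int> remote)
    {
        foreach (string id in batch) remote[id] = local[id];
        batch.Clear();
    }
}
```

Batching the sync step is the point of the design: SQL Server sees a few large writes instead of 900k small ones, and a slow database does not stall the workers.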

Async Operations -- Task Parallel Library and TPL Dataflow

Essentially, you want to ensure that none of your operations block and that each one is sufficiently atomic. If you don't know the TPL already, you should; it's really powerful stuff! I hear this a lot from Java folks, but it's worth mentioning: C# is becoming a really great language for async and parallel workflows!
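As a minimal sketch of the TPL on a single machine, assuming the DLL method is CPU-bound and thread-safe (the question does not say either way), `Parallel.ForEach` can keep one word in flight per logical processor. `LocalParallelRun` and the `analyze` delegate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch: use the TPL to keep every logical processor of one machine busy.
// 'analyze' is a hypothetical stand-in for the DLL method (about 3 s per word).
public static class LocalParallelRun
{
    public static int ProcessAll(IEnumerable<string> words, Action<string> analyze)
    {
        var options = new ParallelOptions
        {
            // Environment.ProcessorCount reports logical processors (hyper-threads included).
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        int done = 0;
        Parallel.ForEach(words, options, word =>
        {
            analyze(word);
            Interlocked.Increment(ref done); // atomic counter shared by all worker threads
        });
        return done;
    }
}
```

If most of the 3 seconds turns out to be spent waiting on the database rather than computing, an async, I/O-oriented design would scale better than adding threads.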

Also, one cool thing coming out of the new Async CTP is TPL Dataflow. I haven't used it, but it seems to be right up your alley!
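A possible TPL Dataflow shape for this job is sketched below: a `TransformBlock` analyzes words in parallel, a `BatchBlock` groups the results, and an `ActionBlock` saves each batch. The `analyze` and `saveBatch` delegates, the `DataflowPipeline` name, and the bounded capacity of 1000 are assumptions for illustration. Depending on the target framework, the System.Threading.Tasks.Dataflow package may need to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Sketch of a TPL Dataflow pipeline: analyze words in parallel, then save results in
// batches. 'analyze' and 'saveBatch' are hypothetical stand-ins for the DLL method
// and the SQL Server write.
public static class DataflowPipeline
{
    public static async Task RunAsync(IEnumerable<string> words, Func<string, int> analyze,
                                      Action<KeyValuePair<string, int>[]> saveBatch, int batchSize)
    {
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        // Stage 1: CPU-bound analysis, up to one word per logical processor at a time.
        var analyzeBlock = new TransformBlock<string, KeyValuePair<string, int>>(
            word => new KeyValuePair<string, int>(word, analyze(word)),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 1000 // back-pressure: do not buffer all 900k words at once
            });

        // Stage 2: group results; on completion the last, smaller batch is still emitted.
        var batchBlock = new BatchBlock<KeyValuePair<string, int>>(batchSize);

        // Stage 3: a single saver (ActionBlock defaults to MaxDegreeOfParallelism = 1).
        var saveBlock = new ActionBlock<KeyValuePair<string, int>[]>(saveBatch);

        analyzeBlock.LinkTo(batchBlock, link);
        batchBlock.LinkTo(saveBlock, link);

        foreach (string w in words)
            await analyzeBlock.SendAsync(w); // waits while the bounded block is full
        analyzeBlock.Complete();

        await saveBlock.Completion; // completion flows through the links
    }
}
```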

Since this is existing code, I would look for a way to split that list of 900k words.

Everything else would require many more changes.
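One way to split the list without touching the existing code is to give each machine a contiguous slice. The hypothetical `WorkSplitter.Split` below computes nearly equal (start, count) ranges; each node would then load only its slice, for example with an `ORDER BY ... OFFSET ... FETCH NEXT ...` query, assuming the words are stored in SQL Server.

```csharp
using System;
using System.Collections.Generic;

// Sketch: split 'total' words into 'nodeCount' contiguous, nearly equal ranges so that
// each machine can run the existing code unchanged on its own slice.
public static class WorkSplitter
{
    // Returns (Start, Count) per node; the first (total % nodeCount) nodes get one extra word.
    public static List<(int Start, int Count)> Split(int total, int nodeCount)
    {
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));

        var ranges = new List<(int Start, int Count)>();
        int baseSize = total / nodeCount;
        int remainder = total % nodeCount;
        int start = 0;

        for (int i = 0; i < nodeCount; i++)
        {
            int count = baseSize + (i < remainder ? 1 : 0);
            ranges.Add((start, count));
            start += count;
        }
        return ranges;
    }
}
```

For 900,000 words on 5 machines, every node gets a slice of 180,000 words.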

I think DryadLINQ addresses this. I only know of it and have no hands-on experience myself, but it sounds like it fits the bill.

You could create an application that acts as a server. It would manage the list of words and distribute them to the clients. Your client software would be installed on the distributed PCs. You could then use MSMQ as a quick way to communicate back and forth.
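The server role described above can be sketched independently of the transport. In this hypothetical `WorkDispatcher`, clients lease batches of words and report completion; a batch whose lease expires (for example, because a client crashed) is handed out again. MSMQ or any other channel would carry the `TryLease` and `Complete` calls between machines; that part is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-process sketch of the "server" role: it owns the word list, hands out batches to
// clients, and re-issues a batch whose client never reported back (lease expired).
// The transport (MSMQ, HTTP, ...) is omitted; all names here are hypothetical.
public sealed class WorkDispatcher
{
    private readonly Queue<string[]> _pending = new Queue<string[]>();
    private readonly Dictionary<int, (string[] Words, DateTime Expires)> _leased =
        new Dictionary<int, (string[] Words, DateTime Expires)>();
    private readonly TimeSpan _leaseTime;
    private readonly object _gate = new object();
    private int _nextId;

    public WorkDispatcher(IEnumerable<string> words, int batchSize, TimeSpan leaseTime)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _leaseTime = leaseTime;

        var batch = new List<string>(batchSize);
        foreach (string word in words)
        {
            batch.Add(word);
            if (batch.Count == batchSize) { _pending.Enqueue(batch.ToArray()); batch.Clear(); }
        }
        if (batch.Count > 0) _pending.Enqueue(batch.ToArray());
    }

    // Called when a client asks for work; returns false when nothing is available right now.
    public bool TryLease(DateTime now, out int batchId, out string[] words)
    {
        lock (_gate)
        {
            // Put expired leases back in the queue first (the client probably crashed).
            List<int> expired = _leased.Where(kv => kv.Value.Expires <= now).Select(kv => kv.Key).ToList();
            foreach (int id in expired)
            {
                _pending.Enqueue(_leased[id].Words);
                _leased.Remove(id);
            }

            if (_pending.Count == 0)
            {
                batchId = -1;
                words = Array.Empty<string>();
                return false;
            }

            batchId = _nextId++;
            words = _pending.Dequeue();
            _leased[batchId] = (words, now + _leaseTime);
            return true;
        }
    }

    // Called by a client after it has saved the results of a batch.
    public void Complete(int batchId)
    {
        lock (_gate) { _leased.Remove(batchId); }
    }

    public bool IsFinished
    {
        get { lock (_gate) { return _pending.Count == 0 && _leased.Count == 0; } }
    }
}
```

Because an expired batch can end up being processed twice, the per-word operation should be idempotent.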

You have the right idea. Divide and conquer. This is a typical job for distributed parallel computing. Let's say you have five machines, each with four hyper-threaded cores (two logical processors per core). This gives you 40 logical processors.

As you have described, you have 750 hours of processing to do, plus a little overhead. If you can split the work across 40 processing threads, you can get it all done in less than 20 hours. Splitting up the work is the easy part.
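The estimate works out as follows, assuming exactly 3 s per word and ignoring coordination and database overhead:

```latex
\begin{aligned}
900{,}000 \times 3\,\text{s} &= 2{,}700{,}000\,\text{s} = 750\,\text{h} \\
5 \times 4 \times 2 &= 40 \text{ logical processors} \\
750\,\text{h} / 40 &= 18.75\,\text{h} < 20\,\text{h}
\end{aligned}
```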

The hard part is distributing the work and executing it in parallel. You have some choices here, as others have pointed out. Let me offer a few more for your consideration.

You could manually split the word list by query or some other means, and launch separate console applications on each node/workstation that use the TPL to max out every logical processor on each machine.
You could use something like MPAPI and code up your own nodes and workers.
You could install Windows Server on your nodes/workstations, run Microsoft HPC, and use something like MPI.NET to kick off the jobs.
You could write a console application and use DuoVia.MpiVisor to distribute and execute on your workstations. (Full disclosure: I am the author of MpiVisor)

Good luck to you.
