
How should I architect a .NET app to perform the same task several times simultaneously and independently?

I need to develop a .NET app that is very similar to a web spider/crawler: get data from a website, process the data, save it to a database, and send an email.

I want to process as many sites at once as the machine can handle (within reason). Each process is independent of the others. I will be using some third-party server components, such as those from Chilkat Software. Only a single computer is used, starting with Windows 7 64-bit and later moving to Windows Server.

What architecture or design should I use to handle the requirements I mentioned? Running several instances of the app (the easiest way)? Using Windows Workflow Foundation (I have never used it)? Some kind of parallel processing? A pointer to a sample app that follows the proposed design is a plus.


You can use a pipeline architecture: crawl -> process -> save to DB -> email. Thread-safe queues should connect the phases, and each phase can be individually configured to use N threads. Then, in the production environment, measure and tune the number of threads each phase uses so that no phase spends most of its time waiting on other phases to produce or consume data.

Be aware that there are many other factors to adjust for the best result. For example, suppose your database can handle at most one save per second, but the stages before the database can easily produce ten pages per second; in that case, you may want to limit the queue size between the process and database stages to a fairly small number.

Tuning all these factors and watching how they interact is interesting and fun. You will be surprised how well the machine performs compared to a naïve multi-threading/multi-processing approach.
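A minimal sketch of such a pipeline using `BlockingCollection<T>` as the thread-safe, bounded queue between stages. The stage bodies, the URL list, and the small capacities are illustrative stand-ins, not a real crawler; in practice each stage could run N tasks instead of one, and the bounded capacity is what keeps a fast producer from flooding a slow consumer such as the database:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineDemo
{
    static void Main()
    {
        // Small bounded capacities throttle fast stages: Add() blocks when full.
        var crawled = new BlockingCollection<string>(boundedCapacity: 10);
        var processed = new BlockingCollection<string>(boundedCapacity: 5);

        var urls = new[] { "http://a.example", "http://b.example", "http://c.example" };

        // Stage 1: crawl (simulated download).
        var crawl = Task.Run(() =>
        {
            foreach (var url in urls)
                crawled.Add("page from " + url);   // placeholder for a real download
            crawled.CompleteAdding();              // tells the next stage no more items are coming
        });

        // Stage 2: process (simulated parsing).
        var process = Task.Run(() =>
        {
            foreach (var page in crawled.GetConsumingEnumerable())
                processed.Add(page.ToUpperInvariant()); // placeholder for real processing
            processed.CompleteAdding();
        });

        // Stage 3: save + email (merged here for brevity).
        var save = Task.Run(() =>
        {
            foreach (var item in processed.GetConsumingEnumerable())
                Console.WriteLine("saved: " + item);    // placeholder for DB save + email
        });

        Task.WaitAll(crawl, process, save);
    }
}
```

Tuning the stage thread counts and queue capacities is then a matter of changing the bounded capacities and the number of `Task.Run` calls per stage.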


I'd recommend using the System.Threading.Tasks library for something like this.

You could then do something like this in your app:

var tasks = new List<Task>();
foreach (var input in listToProcess)
{
    // Each input is processed on its own task, independently of the others.
    tasks.Add(Task.Factory.StartNew(() => ProcessInput(input)));
}
Task.WaitAll(tasks.ToArray()); // block until every site has been handled

private static void ProcessInput(Foo myInput)  // for example, this might be a URL in your case
{
    // your specific processing here: get data from the site, process, save, email
}
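If you want a cap on how many sites run at once rather than starting one task per input, `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism` is a compact alternative. A minimal sketch; the input list, the limit of 4, and the `ProcessInput` body are illustrative assumptions:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ThrottleDemo
{
    static void Main()
    {
        var inputs = Enumerable.Range(1, 8).Select(i => "site" + i).ToList();

        // Cap concurrency so the machine isn't swamped by too many
        // simultaneous crawls; 4 is an arbitrary illustrative limit.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // Blocks until every input has been processed.
        Parallel.ForEach(inputs, options, input => ProcessInput(input));

        Console.WriteLine("all done");
    }

    // Stand-in for the real crawl/process/save/email work.
    static void ProcessInput(string input)
    {
        Console.WriteLine("processed " + input);
    }
}
```

Because `Parallel.ForEach` does not return until all iterations finish, "all done" is always printed last, though the per-site lines may appear in any order.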


Workflow can definitely be used for this sort of thing as well. It has some significant advantages: its tracking gives you a detailed log of everything that occurred, and it makes handling multiple async tasks easy.

Given that you have never used it, the downside for you will be the ramp-up. We do provide hands-on labs to get you going quickly.

See the hands-on labs on our Beginners Guide To Workflow page.
