
System Design / Architecture Best Approach

I have a system that, to aid my description, I will split into three general parts.

1) DATABASE - stores all the tables. The same database also stores data for other services, including a web application, a Silverlight client, etc. (It needs to be flexible: if it sits on a remote server it can be exposed via a web service; if it is local, the Windows service can connect directly or via TCP.)

2) BLACK BOX - processes ONE item at a time. The required data from the database is injected into it like a pipe: you put in a set of conditions and values for a single item, and it returns the results for that single processed item.

3) WINDOWS SERVICE - retrieves data from the database, injects it into the black box, and saves the results from the black box back to the database at predefined intervals. The service might be sitting on a different server from the database. It will log the error and continue should an error occur.

On average, the Windows service will have to process about 5000 items, and it will take the black box about 1.5 seconds to process 5000 items.

My questions are:

a) Should the Windows service get a batch list of items to process from the database, or should it get a list of IDs and, in a loop, fetch each individual item from the database before passing it on to the black box? Note that the same database is being used by other applications as well. Tentatively, I am guessing access to the database should go through a web service call of some sort.

b) Should an individual item be saved immediately after processing, or should the service wait for the whole batch to finish before saving? Saving each item as it is processed is good if the system suddenly fails mid-run, since at least the already-processed items are saved, but it comes at the cost of performance due to the 5000 calls to the web service.

Any advice on an optimum solution?

Cheers


  1. You should pull your items in a batch so you don't clog the network with requests. Grabbing a list of IDs, then looping over them and pulling the full item each time, means N extra database calls.

    • You can use a web service to handle calling the database if you think you will benefit from the abstraction; otherwise you'll just be creating unnecessary complexity.

  2. Update the database as you finish each item. The finished items can then be used further down the line as soon as they are ready, instead of having to wait for a batch of 5000 to finish.

    • This assumes you will be saving data for each item.

    • You need to make N calls (to save each item) no matter what, so you don't gain much by waiting and then updating at the end of each batch.

    • If it crashes, you'll lose all the unsaved data.

    • If you don't need to store per-item results from the black box, then you would have a good reason to consider updating everything as a batch.


I've written a bunch of apps like this for a bank. My usual approach is as follows; it's simple, fault-tolerant, and efficient (assuming you need to process sets of items and save data for each one).

  1. The database has a table representing the processing status of an item, in addition to the items table. For a little extra work up front, this will make debugging and auditing the process a breeze:

    -- feel free to improve upon the name (SQL Server syntax shown)
    CREATE TABLE ItemsProcessStatus
    (
        orderID              INT IDENTITY(1,1),     -- auto increment
        itemID               INT NOT NULL,          -- FK to the items table
        pulledForProcessing  DATETIME NULL,
        finishedProcessing   DATETIME NULL
        -- ...etc
    );
    
  2. The Windows service runs on a timer, say once every X minutes, and pulls at most Y items to process. This stamps pulledForProcessing with the current time in the ItemsProcessStatus table.

    • You want to pull items where the pulled date is null [and also those that have been pulled, but not completed, and are older than Z minutes (I usually pick 15 to 30 minutes)]

    • Be careful with the procedure that pulls these: you need to use locks (see the sketch at the end of this step).

    • You can refine this further: on the first iteration, grab Y items, where Y is a decent guess at how much you can process in that time span. On the next iteration, calculate the rate at which items are being processed (as a sliding average) and adjust the number of items to pull. This way the service continuously adjusts itself to process at full capacity.
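
    • A rough sketch of such a pull, assuming SQL Server, the ItemsProcessStatus table above, and a 30-minute re-pull window (ClaimItems, the batch size, and the connection string are illustrative names, not anything from the question):

        // using System.Collections.Generic;
        // using System.Data.SqlClient;

        // Atomically claims up to batchSize unclaimed (or stale) items and returns their IDs.
        // UPDLOCK/READPAST let several service instances run this without grabbing the same rows.
        const string ClaimSql = @"
            UPDATE TOP (@batchSize) ItemsProcessStatus WITH (UPDLOCK, READPAST)
            SET pulledForProcessing = GETUTCDATE()
            OUTPUT inserted.itemID
            WHERE finishedProcessing IS NULL
              AND (pulledForProcessing IS NULL
                   OR pulledForProcessing < DATEADD(MINUTE, -30, GETUTCDATE()));";

        static List<int> ClaimItems(string connectionString, int batchSize)
        {
            var ids = new List<int>();
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(ClaimSql, conn))
            {
                cmd.Parameters.AddWithValue("@batchSize", batchSize);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                    while (reader.Read())
                        ids.Add(reader.GetInt32(0));
            }
            return ids;
        }

      The service can then load the full data for those IDs in one batch query rather than one call per ID, in line with the batching advice in the first answer.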

  3. The Windows service processes these one by one (well, usually it's multithreaded, so several at once) by sending them to the black box.

    • I put them in a thread-safe Queue<> (not to be confused with MSMQ). Worker threads loop: each pulls an item from the queue, processes it in the black box, and then updates the database.

    • You can use any of the typical multithreading techniques here (wait/pulse, ReaderWriterLockSlim, wait handles), or just have the worker thread sleep for a few seconds when the queue is empty, as in the sketch below.
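
    • A minimal sketch of that worker loop, using the simple "sleep when empty" variant (Item, BlackBox.Process, SaveResult, MarkFinished, and Log are placeholders for your own types and calls):

        // using System;
        // using System.Collections.Generic;
        // using System.Threading;

        static readonly Queue<Item> workQueue = new Queue<Item>();  // guarded by queueLock
        static readonly object queueLock = new object();
        static volatile bool stopping = false;

        static void WorkerLoop()
        {
            while (!stopping)
            {
                Item item = null;
                lock (queueLock)
                {
                    if (workQueue.Count > 0)
                        item = workQueue.Dequeue();
                }

                if (item == null)
                {
                    Thread.Sleep(500);                    // queue empty: back off briefly
                    continue;
                }

                try
                {
                    var result = BlackBox.Process(item);  // hand the single item to the black box
                    SaveResult(item.ItemId, result);      // save this item's results (step 4)
                    MarkFinished(item.ItemId);            // stamp finishedProcessing (step 4)
                }
                catch (Exception ex)
                {
                    Log(ex);                              // log the error and keep going
                }
            }
        }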

  4. After each item finishes, call the update proc for that item, which also updates the ItemsProcessStatus table (marking that it has finished processing).
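
    • For example, the MarkFinished call used in the sketch above could be as simple as the following (again assuming SQL Server and a shared connectionString field); in practice you would likely fold this into the same stored procedure that saves the item's results, as this step describes:

        static void MarkFinished(int itemId)
        {
            const string sql =
                "UPDATE ItemsProcessStatus SET finishedProcessing = GETUTCDATE() " +
                "WHERE itemID = @itemID";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@itemID", itemId);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }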

  5. When your service is stopped, finish processing any items that are currently being processed and update them in the DB.

    • For all the items that haven't been sent to the black box yet, unmark them in the process table by setting pulledForProcessing back to null, roughly as in the sketch below.
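
    • A sketch of the service's OnStop (workQueue, queueLock, stopping, and Item are from the step 3 sketch; workerThreads and connectionString are assumed fields):

        // using System.Collections.Generic;
        // using System.Data;            // SqlDbType
        // using System.Data.SqlClient;
        // using System.Threading;

        protected override void OnStop()
        {
            stopping = true;                      // workers finish their current item, then exit
            foreach (var worker in workerThreads)
                worker.Join();

            // Anything still sitting in the in-memory queue was never sent to the black box,
            // so release it for the next run (or for another service instance).
            List<Item> leftover;
            lock (queueLock)
                leftover = new List<Item>(workQueue);

            const string unmarkSql =
                "UPDATE ItemsProcessStatus SET pulledForProcessing = NULL " +
                "WHERE itemID = @itemID AND finishedProcessing IS NULL";

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(unmarkSql, conn))
            {
                var itemIdParam = cmd.Parameters.Add("@itemID", SqlDbType.Int);
                conn.Open();
                foreach (var item in leftover)
                {
                    itemIdParam.Value = item.ItemId;
                    cmd.ExecuteNonQuery();
                }
            }
        }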

  6. If your service crashes, you don't 'lose' much data: items that didn't get unmarked will simply be pulled again once they are over a certain age (see the process table query in step 2).


This works with multiple instances of the Windows service installed on an array of servers (though you'll want to add a ComputerName column to the process table to identify which machine claimed each item). It works because each service instance just grabs the 'next set of items' to process; there's no need for any kind of routing or for the processes to communicate with each other.


MSMQ is Microsoft's queuing product. I agree a queuing approach should be used; this is done in most systems handling large numbers of transactions. For example, at the bank I used to work for, we used MQ as our middleware solution.

The advantage is that the next step in the process can begin working on an item as soon as the first one is done, without waiting for all 5000 entries to be processed. What if the number grows to 500,000,000? Then the wait time for the first item's results to become available downstream would go up enormously; with a queuing approach, it wouldn't change at all.

There are other advantages - scalability, robustness, things like guaranteed delivery - but you can learn about those later.

Also, a well-implemented queue adds very little wait overhead to the processes that use it, since queue implementations almost always support multiple threads accessing them. (There will be some overhead, but not a greatly increased wait time.)
