How to achieve maximum concurrency for a distributed application using a database as the medium of communication
I have an application that is similar to the classic producer-consumer problem. I want to explore all the possible ways to implement it. The problem is:
Process A (producer): inserts a row into a database table.
Process B (consumer): reads M rows from the table and deletes those M rows after processing them.
Tasks in process B:
1. Read M rows.
2. Process these rows.
3. Delete these rows.
N1 instances of process A and N2 instances of process B run concurrently.
Each instance runs on a different box.
One requirement: if process p1 is reading rows 0 to M-1, process p2 should not wait for p1 to release its locks on those rows; instead, it should read rows M to 2M-1.
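This requirement is essentially what several databases expose as `SELECT ... FOR UPDATE SKIP LOCKED` (PostgreSQL 9.5+, MySQL 8.0+, Oracle). To show the claiming logic without tying it to one database, here is a minimal in-memory Java sketch, assuming the table can be modelled as a sorted map of row id to row. A per-row compare-and-set flag stands in for the database row lock; `SkipLockedTable`, `Row`, `claim`, `delete` and `release` are hypothetical names, and the sketch only illustrates the semantics, it does not coordinate consumers on different boxes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-memory stand-in for the shared table (row id -> row).
class SkipLockedTable {
    static final class Row {
        final long id;
        final String payload;
        // Plays the role of the DB row lock: true while some consumer owns the row.
        final AtomicBoolean claimed = new AtomicBoolean(false);

        Row(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final ConcurrentSkipListMap<Long, Row> rows = new ConcurrentSkipListMap<>();

    // Process A: insert one row.
    void insert(long id, String payload) {
        rows.put(id, new Row(id, payload));
    }

    // Process B, step 1: claim up to m rows in id order. compareAndSet never
    // blocks, so a row owned by another consumer is skipped, not waited on.
    List<Row> claim(int m) {
        List<Row> batch = new ArrayList<>(m);
        for (Row row : rows.values()) {
            if (batch.size() == m) break;
            if (row.claimed.compareAndSet(false, true)) batch.add(row);
        }
        return batch;
    }

    // Process B, step 3: delete processed rows. Their flags stay set, so a
    // consumer still iterating over a stale view cannot claim them again.
    void delete(List<Row> batch) {
        for (Row row : batch) rows.remove(row.id, row);
    }

    // Failure path: hand unprocessed rows back to other consumers.
    void release(List<Row> batch) {
        for (Row row : batch) row.claimed.set(false);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SkipLockedTable table = new SkipLockedTable();
        for (long i = 0; i < 100; i++) table.insert(i, "task-" + i); // process A

        Set<Long> processed = ConcurrentHashMap.newKeySet();
        Runnable consumer = () -> { // process B
            List<Row> batch;
            while (!(batch = table.claim(10)).isEmpty()) {
                for (Row row : batch) {
                    // Step 2 (processing) goes here; we only record the id.
                    if (!processed.add(row.id)) {
                        throw new IllegalStateException("row " + row.id + " claimed twice");
                    }
                }
                table.delete(batch);
            }
        };
        Thread b1 = new Thread(consumer);
        Thread b2 = new Thread(consumer);
        b1.start();
        b2.start();
        b1.join();
        b2.join();
        System.out.println(processed.size() + " processed, " + table.size() + " left");
    }
}
```

Calling `claim(M)` twice in a row returns rows 0 to M-1 and then rows M to 2M-1, which is the behaviour asked for above.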
I bet there are better ways to do parallel processing than using a DB as the exchanger between producer and consumer. Why not queues? Have you looked at the tools and frameworks designed for Map/Reduce? Hadoop, GridGain, and JPPF can all do this.
Maintain a separate list of the rows that are being processed. When any process needs to interact with the DB, it should check whether those rows are being processed by another process; if so, it should wait on that condition, otherwise it can process them. A similar concept is used in Java's ConcurrentHashMap. Maintaining indexes might help in this case.
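A registry of in-progress rows along these lines could look like the following Java sketch. It assumes all consumers share one JVM, which is not the case in the question (each instance runs on a different box), so in practice the set would have to live somewhere shared, such as the database itself. `InProgressRegistry` and its methods are hypothetical. The important detail is that checking and marking a row happen in one atomic step; otherwise two processes can both see a row as free. `tryBegin` gives the skip behaviour the question asks for, while `begin` waits on the condition as described here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical registry of row ids currently being processed (single JVM only).
class InProgressRegistry {
    private final Set<Long> inProgress = new HashSet<>();

    // Check and mark atomically; returns false if another process holds the id.
    synchronized boolean tryBegin(long id) {
        return inProgress.add(id);
    }

    // Blocking variant: wait on the condition until the row is free, then mark it.
    synchronized void begin(long id) throws InterruptedException {
        while (inProgress.contains(id)) wait();
        inProgress.add(id);
    }

    // Call after the row has been processed (and deleted) to wake any waiters.
    synchronized void end(long id) {
        inProgress.remove(id);
        notifyAll();
    }

    // Walk candidate ids (e.g. from an indexed scan) and keep up to m free ones.
    List<Long> claimBatch(List<Long> candidateIds, int m) {
        List<Long> batch = new ArrayList<>();
        for (long id : candidateIds) {
            if (batch.size() == m) break;
            if (tryBegin(id)) batch.add(id);
        }
        return batch;
    }

    public static void main(String[] args) {
        InProgressRegistry registry = new InProgressRegistry();
        List<Long> ids = List.of(0L, 1L, 2L, 3L, 4L, 5L);
        System.out.println(registry.claimBatch(ids, 3)); // p1: [0, 1, 2]
        System.out.println(registry.claimBatch(ids, 3)); // p2: [3, 4, 5]
    }
}
```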
I think that, as designed, this application effectively implements a hand-made queue. I believe JMS is a much better fit here. There are many JMS implementations available, and most of them are open source.
In your case, process A should insert tasks into the queue. Process B should block on receive(), get M messages, and then process them. You probably have reasons for fetching tasks from your queue in bulk, but if you switch to a JMS-based implementation you probably will not need this at all: you can simply listen to the queue and process each message immediately. The implementation becomes almost trivial, flexible, and scalable. You can run as many instances of processes A and B as you want and distribute them across separate boxes.
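With JMS, process B would block in `MessageConsumer.receive()` or register a `MessageListener`. Since the JMS API is not part of the JDK, the sketch below swaps in `java.util.concurrent.BlockingQueue` as an in-process stand-in: `take()` blocks like `receive()`, and `drainTo` collects up to M-1 more ready tasks without waiting, which covers the "get M messages" case. `QueueConsumer`, `receiveBatch` and the `STOP` marker are hypothetical; consumers on separate boxes would talk to a JMS broker instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueConsumer {
    // Hypothetical shutdown marker ("poison pill") so the demo terminates.
    static final String STOP = "STOP";

    // Block until one task is available, then take whatever else is ready, up to m in total.
    static <T> List<T> receiveBatch(BlockingQueue<T> queue, int m) throws InterruptedException {
        if (m < 1) throw new IllegalArgumentException("m must be >= 1");
        List<T> batch = new ArrayList<>(m);
        batch.add(queue.take());      // blocks, like MessageConsumer.receive()
        queue.drainTo(batch, m - 1);  // never blocks
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> { // process A
            for (int i = 0; i < 25; i++) queue.add("task-" + i);
            queue.add(STOP);
        });

        Thread consumer = new Thread(() -> { // process B
            try {
                while (true) {
                    for (String task : receiveBatch(queue, 10)) {
                        if (task.equals(STOP)) return;
                        System.out.println("processing " + task);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```

With several consumers you would enqueue one `STOP` per consumer; adding more producer or consumer threads needs no other change, which mirrors the scaling argument above.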
You may also want to take a look at Amazon Elastic MapReduce:
http://aws.amazon.com/elasticmapreduce/