
What ways exist to distribute asynchronous batch tasks?

I am currently investigating which Java-compatible solutions exist to meet the following requirements:

  • Timer-based / schedulable batch-processing tasks
  • Distributed, and therefore able to scale horizontally
  • Resilient: no single points of failure (SPFs), please

The nature of these tasks (heavy XML generation and delivery to web-based receiving nodes) means that running them on a single server with something like Quartz isn't viable.

I have heard of technologies like Hadoop and JavaSpaces, which address the scaling and resilience side of the problem effectively. Not knowing whether these are well suited to my requirements, it's hard to know what other technologies might fit.

What I'm really wondering is which options people working in this space consider viable, what the strengths of each are, and which problems each suits better than the others.

NB: It's worth noting that schedulability is perhaps a hangover from how we do things at present. Yes, there are tasks that ought to run at certain times. But scheduling has also been used to throttle throughput when there is no requirement to run at set times.
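The throttling use of scheduling described above can be sketched in plain Java with a ScheduledExecutorService. On each tick it releases at most a fixed number of queued jobs. This is a minimal single-JVM sketch, and the class name `ThrottledDispatcher` and the `maxPerTick` parameter are hypothetical. It does nothing for distribution or resilience, which is exactly the gap being asked about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThrottledDispatcher {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final int maxPerTick; // hypothetical throttle: jobs released per tick

    public ThrottledDispatcher(int maxPerTick) {
        this.maxPerTick = maxPerTick;
    }

    /** Queues a job; it runs on a later tick, not immediately. */
    public void submit(Runnable job) {
        pending.add(job);
    }

    /** Runs at most maxPerTick queued jobs and returns how many were taken. */
    public int tick() {
        List<Runnable> batch = new ArrayList<>();
        pending.drainTo(batch, maxPerTick);
        for (Runnable job : batch) {
            try {
                job.run();
            } catch (RuntimeException e) {
                // An exception escaping a fixed-rate task would cancel the schedule.
                System.err.println("job failed: " + e);
            }
        }
        return batch.size();
    }

    /** Calls tick() every period, capping throughput at maxPerTick jobs per period. */
    public void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::tick, 0, period, unit);
    }

    public void stop() {
        scheduler.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledDispatcher dispatcher = new ThrottledDispatcher(2);
        for (int i = 0; i < 5; i++) {
            int n = i;
            dispatcher.submit(() -> System.out.println("job " + n));
        }
        dispatcher.start(1, TimeUnit.SECONDS); // 2 jobs per second: 5 jobs need three ticks
        Thread.sleep(3500);
        dispatcher.stop();
    }
}
```

Jobs with a real deadline would still need a proper scheduler. This pattern only covers the "throttle when there is no set time" case.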


"Asynchronous" always brings JMS to mind for me. Send the request message to a queue, and a MessageListener is taken from the pool to handle it.

This scales because the queue and the listeners can live on remote servers. The size of the listener thread pool is configurable, and you can have different listeners for different tasks.
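Below is a minimal, stdlib-only sketch of the queue-plus-listener-pool pattern. It assumes an in-process BlockingQueue as a stand-in for a JMS queue. `QueueListenerPool` and `TaskListener` are hypothetical names. TaskListener only mirrors the shape of javax.jms.MessageListener and is not the JMS API. With a real broker the queue would live on a remote server, and the pool size would be set in the listener container's configuration.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueListenerPool {
    /** Hypothetical listener contract, shaped like javax.jms.MessageListener#onMessage. */
    public interface TaskListener {
        void onMessage(String body);
    }

    // Optional.empty() acts as a "stop" marker (poison pill) for consumers.
    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
    private final ExecutorService pool;
    private final int consumers;

    public QueueListenerPool(int consumers, TaskListener listener) {
        this.consumers = consumers;
        this.pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Optional<String> msg = queue.take(); // blocks until a message arrives
                        if (!msg.isPresent()) {
                            return; // poison pill: this consumer stops
                        }
                        try {
                            listener.onMessage(msg.get());
                        } catch (RuntimeException e) {
                            // A real broker could redeliver; here we only log and move on.
                            System.err.println("listener failed: " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    public void send(String body) {
        queue.add(Optional.of(body));
    }

    /** Lets consumers drain everything already queued, then stops them. */
    public void shutdown() {
        for (int i = 0; i < consumers; i++) {
            queue.add(Optional.empty()); // one pill per consumer, queued behind real messages
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger generated = new AtomicInteger();
        // Different listeners (and pool sizes) for different kinds of task.
        QueueListenerPool xmlGeneration = new QueueListenerPool(4, body -> generated.incrementAndGet());
        QueueListenerPool delivery = new QueueListenerPool(2,
                body -> System.out.println(Thread.currentThread().getName() + " delivering " + body));

        for (int i = 0; i < 10; i++) {
            xmlGeneration.send("order-" + i);
        }
        delivery.send("feed-A");

        xmlGeneration.shutdown();
        delivery.shutdown();
        System.out.println("generated " + generated.get()); // prints "generated 10"
    }
}
```

Each task type gets its own queue and pool, so XML generation and delivery can be sized independently. Unlike JMS, nothing here is persistent or acknowledged, so queued messages are lost if the JVM dies.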

UPDATE: You can avoid having a single point of failure by clustering the brokers and load balancing across them.
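Client-side load balancing with failover can be sketched as a sender that rotates across several brokers and falls through to the next one when a send fails. `FailoverSender`, `Endpoint` and `SimulatedNode` are hypothetical. ActiveMQ, for example, provides connection-level failover through its `failover:(...)` transport URI, so you would not normally write this yourself. The sketch also ignores the duplicate-delivery question that a real broker has to answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverSender {
    /** Hypothetical abstraction over one broker or cluster node. */
    public interface Endpoint {
        String name();
        void send(String message) throws IOException;
    }

    private final List<Endpoint> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    public FailoverSender(List<Endpoint> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("need at least one endpoint");
        }
        this.endpoints = new ArrayList<>(endpoints);
    }

    /**
     * Starts at the next endpoint in round-robin order (load balancing) and,
     * if that send fails, tries the remaining endpoints in turn (failover).
     * Returns the name of the endpoint that accepted the message.
     */
    public String send(String message) {
        int n = endpoints.size();
        int start = Math.floorMod(next.getAndIncrement(), n);
        IOException lastFailure = null;
        for (int i = 0; i < n; i++) {
            Endpoint endpoint = endpoints.get((start + i) % n);
            try {
                endpoint.send(message);
                return endpoint.name();
            } catch (IOException e) {
                lastFailure = e; // node down or unreachable: try the next one
            }
        }
        throw new UncheckedIOException("all " + n + " endpoints failed", lastFailure);
    }

    /** Hypothetical in-memory node, used only to exercise the sender. */
    public static class SimulatedNode implements Endpoint {
        private final String name;
        volatile boolean up = true;
        final List<String> received = new ArrayList<>();

        public SimulatedNode(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public synchronized void send(String message) throws IOException {
            if (!up) {
                throw new IOException(name + " is down");
            }
            received.add(message);
        }
    }

    public static void main(String[] args) {
        SimulatedNode a = new SimulatedNode("node-a");
        SimulatedNode b = new SimulatedNode("node-b");
        FailoverSender sender = new FailoverSender(Arrays.asList(a, b));

        System.out.println(sender.send("m1")); // node-a
        System.out.println(sender.send("m2")); // node-b
        a.up = false;                           // simulate node-a failing
        System.out.println(sender.send("m3")); // node-b (node-a's turn, but it is down)
    }
}
```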

You can get JMS at no cost using ActiveMQ (open source), JBoss (open-source version available), or any Java EE app server, so budget isn't a concern.

And because you're coding against the JMS API, there's no vendor lock-in beyond the fact that you're using Java.

I'd recommend doing it with Spring message-driven POJOs. The community edition is open source, of course.

If that doesn't do it for you, have a look at Spring Batch and Spring Integration. Both of those might be useful, and the community editions are open source.


Have you looked into GridGain? I'm pretty sure it won't solve the scheduling problem, but it scales as if by "magic": the code to be executed is sent to a node and runs there. It works fine as long as you don't need to send a database connection (or anything else that isn't serializable).
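The serializability constraint can be shown with a minimal stdlib sketch that "ships" a task by serializing it to bytes and reading it back, as a stand-in for sending it to another node. This is not GridGain's API. `RemoteTask`, `ship` and `FakeConnection` are hypothetical, with `FakeConnection` standing in for something like a java.sql.Connection. The sketch also skips the other half of the problem, which is getting the task's class onto the remote node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ShipTaskDemo {
    /** Hypothetical task contract; extending Serializable is what lets a task travel. */
    interface RemoteTask extends Serializable {
        String run();
    }

    /** Round-trips a task through bytes, standing in for sending it to another node. */
    static RemoteTask ship(RemoteTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task); // fails if any field is not serializable
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RemoteTask) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Carries only plain data, so it can be shipped. */
    static class GenerateXml implements RemoteTask {
        private static final long serialVersionUID = 1L;
        private final String orderId;

        GenerateXml(String orderId) {
            this.orderId = orderId;
        }

        @Override
        public String run() {
            return "<order id=\"" + orderId + "\"/>";
        }
    }

    /** Hypothetical placeholder for a live resource such as a database connection. */
    static class FakeConnection {
    }

    /** Holds a live resource, so serializing it throws NotSerializableException. */
    static class GenerateFromDb implements RemoteTask {
        private static final long serialVersionUID = 1L;
        private final FakeConnection connection = new FakeConnection();

        @Override
        public String run() {
            return "<order source=\"" + connection + "\"/>";
        }
    }

    public static void main(String[] args) {
        System.out.println(ship(new GenerateXml("42")).run()); // prints <order id="42"/>
        try {
            ship(new GenerateFromDb());
        } catch (UncheckedIOException e) {
            System.out.println("cannot ship: " + e.getCause());
        }
    }
}
```

The usual workaround is to ship only plain data (IDs, parameters) and have the task open its own connection on the node where it runs.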
