Clustering doesn't work

I configured clustering for two Tomcat instances, with Apache in front and mod_jk as the connector. I tried a test application to check the configuration and it works fine: sessions are replicated and failover is detected successfully. But when I tried this with my actual application, it does not work. I modified httpd.conf accordingly and very carefully. There is no exception and no error in the logs, so I am unable to track down the problem. Initially I was getting NotSerializableException for particular classes, and I made them serializable. Now there is no exception, but I am still unable to load the application if the hosting Tomcat is shut down, even when the other Tomcat members of the cluster are alive. Can you please help me? I understand it is quite tough to propose a solution when you are not sure of the problem.


So you have 2 services, configured the same way, except that one fails over correctly and the other doesn't?

There is a general rule of thumb when you're seeing something that looks impossible. And that rule is that you're not seeing what you think you are seeing. Frequently because of what is jokingly referred to as PEBKAC (Problem Exists Between Keyboard And Chair). The really frustrating thing is that, no matter how obvious it is, you can stare at it 100 times and it won't be obvious because you see what you "know" is there rather than what is there.

In my experience there are two good ways to solve this kind of problem.

  1. Take it to someone else, and ask them to find what you're doing differently. Given that they see what is there, and not what you "know" is there, they will often see what you can't. (In the course of time you may be able to return the favor.)
  2. Start with the working configuration and the non-working one, and start "bisecting" the path between them until you get a minimal difference that separates working from non-working. Whittle that difference down, and you'll either know what to fix, or else have a test case to give someone else.

Odds are that you'll need to follow the second approach. You probably don't want to - I never do - but it usually is less painful than you imagine. You start by replicating the full application on a test system, and demonstrating that you have the same failure. (If you don't, then you start looking, carefully, for differences between production and test. In particular look at things like operating system version, library versions, and the like.)

Assuming that you have a test system, save that configuration. Then start ripping out large chunks of your actual application that you imagine have nothing to do with your configuration problems, testing periodically that you are on the right path. (And saving every time that you are.) Once you have a minimal application, start trying to walk it over towards the working test application. Somewhere you'll find a change that makes a difference. It could be anywhere. Once you have found it, you'll usually know exactly how to fix your production system. Or if you don't, you'll know your problem fairly clearly.
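The bisecting step can be sketched roughly as follows. This is only an illustration of the idea, not a tool for Tomcat: the difference labels and the `works_with` callback are hypothetical, and the sketch assumes that exactly one difference is responsible and that the differences can be applied independently of each other. In practice `works_with` would mean "apply these changes to the working test application, redeploy, shut down the hosting Tomcat and see whether the session survives", which you would most likely do by hand.

```python
from typing import Callable, List, Sequence


def bisect_differences(
    differences: Sequence[str],
    works_with: Callable[[List[str]], bool],
) -> str:
    """Return the one difference that breaks failover.

    Assumptions (hypothetical, for illustration only):
    - the working setup with none of the differences applied works,
    - the working setup with all of them applied fails,
    - exactly one difference is responsible.
    """
    candidates = list(differences)
    if not candidates:
        raise ValueError("nothing to bisect")

    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if works_with(first_half):
            # Failover still works with the first half applied,
            # so the culprit must be in the second half.
            candidates = candidates[middle:]
        else:
            # Applying the first half alone already breaks failover.
            candidates = first_half
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical labels for the ways the real application differs
    # from the working test application.
    diffs = [
        "web.xml changes",
        "session attribute classes",
        "httpd.conf edits",
        "mod_jk worker settings",
        "extra libraries",
    ]
    # Stand-in for a real test run: pretend one difference is the culprit.
    culprit = "session attribute classes"
    print(bisect_differences(diffs, lambda applied: culprit not in applied))
```

Each round halves the number of suspects, so even a few dozen differences take only a handful of test runs. If several differences interact, the single-culprit assumption fails and you have to fall back to removing them one at a time.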

Sometimes you'll have found a weird bug. If so, start simplifying everything as much as possible until you have a nice bug report to send in.
