Any advice on how to handle fail-over in an ejabberd cluster?
Context:
I have a system that will need to support 20,000 connected chat users spread over 100 chat rooms. During performance testing I've found that I can get up to 6,000 connected users on a single box before I get a crash dump, so in production I'll probably go with four servers in a cluster.
My Question:
I understand that a chatroom is bound to a server node, so that if the node dies the chatroom disappears with it and the users no longer belong to the room. Is there a way to "replicate" a chatroom over to another node so that users who are left behind are moved to the replicated room? If not, what do you do to keep continuity for the users?

What hardware are you using? 6,000 connected users seems a bit low. Also, ejabberd is not supposed to crash under load. It might slow down, but not crash.
There is something wrong in your setup.
About replicating a chatroom node, it's not easy. It's better to handle smooth reconnection on the client side.
But then again, ejabberd should not crash under this kind of load, unless something's wrong.
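
To make the client-side reconnection idea a bit more concrete, here is a rough sketch in Python. It is not tied to any XMPP library: `connect` and `join_room` are hypothetical callables standing in for whatever your client library provides, and the room JID in the usage note is made up. The jitter is there because when a node dies, all of its users will try to reconnect at about the same moment.

```python
import random
import time
from typing import Callable, Iterable, List


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def reconnect_and_rejoin(
    connect: Callable[[], object],             # hypothetical: opens a session, raises ConnectionError on failure
    join_room: Callable[[object, str], None],  # hypothetical: joins one room on the given session
    rooms: Iterable[str],
    delays: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> int:
    """Retry `connect` with backoff, then rejoin every room the user was in.

    Returns the number of connection attempts it took.
    """
    delays = list(delays)
    for attempt, delay in enumerate(delays, start=1):
        try:
            session = connect()
        except ConnectionError:
            # Wait between 50% and 100% of the scheduled delay so that
            # clients dropped by the same node do not all retry together.
            sleep(delay * (0.5 + 0.5 * jitter()))
            continue
        for room in rooms:
            join_room(session, room)
        return attempt
    raise ConnectionError("gave up after %d attempts" % len(delays))
```

You would call it from the client's disconnect handler, for example `reconnect_and_rejoin(my_connect, my_join, ["room1@conference.example.org"], backoff_delays())`, keeping the list of rooms the user had joined in client state so it survives the dropped connection.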