Leader Election Algorithm
I am exploring various architectures in cluster computing. Some of the popular ones are:
- Master-Slave.
- RPC
- ...
In Master-slave, the normal way is to set one machine as master & a bunch of machines as slaves controlled by master. One particular algo here got开发者_StackOverflow中文版 me interested. It's called Leader-Election Algo which has a certain randomness in selecting which of the machines will become master.
My question is - Why would anyone want to elect a master machine this way? What advantages does this approach have compared to manually selecting a machine as master?
There are some advantages with this algorithms:
- Selection of node as leader will be done dynamically so for example you can select node with highest performance, and arrival of new nodes may be makes better choice.
Another good approach by dynamically selecting leader is, if one of a nodes have major fault (for example PC is shutting down) you have other choices and there is no need to manually change the leader.
if you manually select node should manually configure all other nodes to use this node, and also set their time manually, ... but this algorithms will help you to handle timing issues.
- for example (not very relevant) why in most cases using DHCP? too many configs will be handeled by this algorithms.
Main idea of using such algorithms is to get rid of additional configuration, add some kind of flexibility, and stability of the whole system. But usually (in HPC/MPI applications) master node is selected manually.
Suppose your master selection algorithms is quite easy - get the list of available systems and select the one with the highest IP address. In this case you can easily start a new process on any of your nodes and it will automatically find the master node.
One nice example of such ideas is the WCCP protocol "designated proxy" selection algorithm where the number of proxies could be flexible and master node is selected in the runtime.
Considering a network of nodes, where it is vital to have one leader node at all times. If the current leader dies, then the network some how has to choose another leader. Given this scenario and requirement there are two possible ways to do it.
The central system approach, where there is a central node deciding who will be the leader. If the current leader dies, then this central node will decide on who should take over the leader role. But this is single point of failure, that is the central node who is responsible for deciding the leader, goes down then there is no one there to select leaders if the current leader dies.
Where as in the same scenario we can use distributed leader selection, as in all the nodes come to a consensus who the leader should be. So we do not need to have a central node who decides on who the leader should be, hence eliminating the single point of failure. When the leader node dies, then there will be a way to detect node failure, and then every node will start a distributed leader selection algorithm, and mutually come to a consensus of electing a leader.
So, in short when you have a system which has no central control, probably because the system is meant to be scalable without having single point of failure, in those systems to take choose some node, leader elections algorithms are used.
精彩评论