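ZooKeeper Source Code Reading (5): Leader Election
Leader election in ZooKeeper is not Paxos either. The implementing classes include FastLeaderElection and LeaderElection, with the following inheritance hierarchy: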
//Election
// +AuthFastLeaderElection
// +FastLeaderElection
// +MockFLE (tests)
// +LeaderElection
// +MockLeaderElection
FastLeaderElection is used by default. ZooKeeper itself does not actually care how leader election is implemented, as long as the result satisfies:
The leader has seen the highest zxid of all the followers.
A quorum of servers have committed to following the leader.
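If a follower's zxid is larger than any zxid the leader has seen, that follower must have connected to the leader only after the election finished (otherwise, having seen a higher zxid, it would itself have been elected). Because the quorum that elected the leader never saw those proposals, they cannot have been committed, so the leader sends a TRUNC message telling the follower to discard them.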
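The TRUNC rule can be illustrated with a simplified, zxid-only sketch. `SyncDecision`, `SyncAction` and the `oldestKeptZxid` bound are hypothetical names chosen for illustration. The DIFF and SNAP branches only reflect the general idea that a leader sends a lagging follower its missing proposals, or a full snapshot if it is missing too many; this is not ZooKeeper's exact sync logic.

```java
// Hypothetical sketch: how a leader might choose how to sync a newly
// connected follower, looking only at zxids (not ZooKeeper's actual code).
enum SyncAction { TRUNC, DIFF, SNAP }

final class SyncDecision {
    private SyncDecision() {}

    /**
     * @param followerLastZxid last zxid the follower has logged
     * @param leaderLastZxid   highest zxid the leader has seen
     * @param oldestKeptZxid   hypothetical: oldest proposal the leader still keeps in memory
     */
    static SyncAction decide(long followerLastZxid, long leaderLastZxid, long oldestKeptZxid) {
        if (followerLastZxid > leaderLastZxid) {
            // The follower holds proposals the leader never saw. A quorum elected
            // this leader without seeing them, so they cannot have been committed:
            // tell the follower to truncate back to the leader's history.
            return SyncAction.TRUNC;
        }
        if (followerLastZxid >= oldestKeptZxid) {
            // The leader can replay just the proposals the follower is missing.
            return SyncAction.DIFF;
        }
        // The follower is too far behind: send a full snapshot instead.
        return SyncAction.SNAP;
    }

    public static void main(String[] args) {
        System.out.println(decide(0x500000007L, 0x500000005L, 0x500000001L)); // TRUNC
        System.out.println(decide(0x500000003L, 0x500000005L, 0x500000001L)); // DIFF
        System.out.println(decide(0x400000009L, 0x500000005L, 0x500000001L)); // SNAP
    }
}
```

Comparing zxids as plain `long` values works here because a larger zxid means a later proposal.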
1) The FastLeaderElection implementation
// Constructor
// Two queues, used by the Messenger
sendqueue = new LinkedBlockingQueue<ToSend>();
recvqueue = new LinkedBlockingQueue<Notification>();
// The messenger object
// Messenger starts two threads: WorkerSender and WorkerReceiver
this.messenger = new Messenger(manager);
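// WorkerSender consumes sendqueue
ToSend m = sendqueue.poll(3000, TimeUnit.MILLISECONDS);
// poll returns null when the 3-second timeout expires with nothing queued
if (m != null) process(m);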
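// Messages in the queue are of type ToSend:
// {leader, zxid, elecEpoch, state, sid, peerEpoch, configData}
// each one is either a Notification or a reply to a Notification
// WorkerReceiver receives messages:
// if the sender is not a voting member, reply directly with our own vote by putting it into sendqueue;
// otherwise, parse the message into a Notification. If we are still LOOKING, put it into recvqueue;
// if we are not LOOKING but the sender is, reply with the id of the leader we are following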
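The queue plumbing described above is a plain producer/consumer pattern. Below is a minimal, hypothetical sketch: `MiniMessenger`, the in-memory `wire` queue (standing in for the real network connection layer) and the 100 ms poll timeout are all assumptions. A sender thread drains `sendqueue` with a timed poll and skips the `null` returned on timeout; a receiver thread turns incoming messages into `Notification` objects on `recvqueue`. The branching on voting membership and server state is left out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the Messenger's two worker threads.
// The "wire" queue stands in for the network; this is not ZooKeeper's code.
final class MiniMessenger {
    record ToSend(long leader, long zxid, long electionEpoch, long sid) {}
    record Notification(long leader, long zxid, long electionEpoch, long sid) {}

    final BlockingQueue<ToSend> sendqueue = new LinkedBlockingQueue<>();
    final BlockingQueue<Notification> recvqueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<ToSend> wire = new LinkedBlockingQueue<>();
    private volatile boolean stop = false;

    // WorkerSender-like loop: wake up every 100 ms so the stop flag is noticed.
    private final Thread sender = new Thread(() -> {
        try {
            while (!stop) {
                ToSend m = sendqueue.poll(100, TimeUnit.MILLISECONDS);
                if (m == null) continue;   // timed out, nothing to send
                wire.put(m);               // "process(m)": hand it to the network
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });

    // WorkerReceiver-like loop: convert incoming messages into Notifications.
    private final Thread receiver = new Thread(() -> {
        try {
            while (!stop) {
                ToSend m = wire.poll(100, TimeUnit.MILLISECONDS);
                if (m == null) continue;
                recvqueue.put(new Notification(m.leader(), m.zxid(), m.electionEpoch(), m.sid()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });

    void start() {
        sender.start();
        receiver.start();
    }

    void halt() throws InterruptedException {
        stop = true;
        sender.join();
        receiver.join();
    }

    public static void main(String[] args) throws InterruptedException {
        MiniMessenger messenger = new MiniMessenger();
        messenger.start();
        messenger.sendqueue.put(new ToSend(3, 0x100000002L, 1, 3));
        Notification n = messenger.recvqueue.poll(2, TimeUnit.SECONDS);
        System.out.println(n);
        messenger.halt();
    }
}
```

The timed poll is the design point worth copying: a blocking `take()` would leave the thread stuck forever once the stop flag is set, while a timeout lets the loop re-check it periodically.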
// lookForLeader starts a new round of election
// Increment logicalclock
synchronized(this){
logicalclock++;
updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
}
sendNotifications();
// Keep exchanging notifications until this server is no longer LOOKING
while ((self.getPeerState() == ServerState.LOOKING) && (!stop)) {
/*
 * Remove next notification from queue, times out after 2 times
 * the termination time
 */
Notification n = recvqueue.poll(notTimeout, TimeUnit.MILLISECONDS);
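// If the sender is a voter (a voting follower), act on its state:
// LOOKING:
//   if the sender's election epoch is newer, adopt it as logicalclock; if its vote is better,
//   update our proposal; then send notifications
//   put the vote into recvset, then check whether a quorum has been reached
//   if so, wait finalizeWait milliseconds; if no better vote arrives, the election ends
// FOLLOWING or LEADING:
//   if the election epoch matches logicalclock, check whether a quorum has been reached
//   outofelection only holds votes from servers in LEADING or FOLLOWING state, which have already finished their election
//   in both checks the result must also pass checkLeader(outofelection, n.leader, n.electionEpoch)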
// checkLeader
protected boolean checkLeader(HashMap<Long, Vote> votes, long leader, long electionEpoch) {
    boolean predicate = true;
    /*
     * If everyone else thinks I'm the leader, I must be the leader.
     * The other two checks are just for the case in which I'm not the
     * leader. If I'm not the leader and I haven't received a message
     * from leader stating that it is leading, then predicate is false.
     */
    if (leader != self.getId()) {
        if (votes.get(leader) == null) predicate = false;
        else if (votes.get(leader).getState() != ServerState.LEADING) predicate = false;
    } else if (logicalclock != electionEpoch) {
        predicate = false;
    }
    return predicate;
}
Incrementing logicalclock marks the start of a new election round.
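To make the LOOKING branch and the role of logicalclock concrete, here is a single-threaded sketch under stated assumptions: the class `LookingStep`, the method `onNotification` and the `ensembleSize` parameter are hypothetical, and message sending, finalizeWait and the FOLLOWING/LEADING branch are left out. It shows the three epoch cases: a newer round is joined (clearing old votes), a stale round is ignored, and within the same round the better vote wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded sketch of how a LOOKING server could handle a
// notification from another LOOKING server. Not ZooKeeper's actual code.
final class LookingStep {
    record Vote(long leader, long zxid, long peerEpoch) {}

    final long myId;
    final Vote initialVote;   // this server's vote for itself
    final int ensembleSize;
    long logicalclock;
    Vote proposal;
    final Map<Long, Vote> recvset = new HashMap<>();

    LookingStep(long myId, long lastZxid, long peerEpoch, long logicalclock, int ensembleSize) {
        this.myId = myId;
        this.initialVote = new Vote(myId, lastZxid, peerEpoch);
        this.proposal = initialVote;
        this.logicalclock = logicalclock;
        this.ensembleSize = ensembleSize;
    }

    // a beats b: higher peerEpoch, then higher zxid, then higher server id.
    static boolean better(Vote a, Vote b) {
        if (a.peerEpoch() != b.peerEpoch()) return a.peerEpoch() > b.peerEpoch();
        if (a.zxid() != b.zxid()) return a.zxid() > b.zxid();
        return a.leader() > b.leader();
    }

    /** Handles one notification; returns true if a quorum now backs our proposal. */
    boolean onNotification(long sid, long electionEpoch, Vote vote) {
        if (electionEpoch > logicalclock) {
            // The sender is in a newer round: join it and drop votes from the old round.
            logicalclock = electionEpoch;
            recvset.clear();
            proposal = better(vote, initialVote) ? vote : initialVote;
        } else if (electionEpoch < logicalclock) {
            return false;  // stale round: ignore the notification
        } else if (better(vote, proposal)) {
            proposal = vote;
        }
        // (A real implementation would send notifications here when the round or proposal changed.)
        recvset.put(sid, vote);
        recvset.put(myId, proposal);  // stands in for receiving our own notification
        long agree = recvset.values().stream().filter(proposal::equals).count();
        return agree > ensembleSize / 2;
    }

    public static void main(String[] args) {
        LookingStep s = new LookingStep(1, 0x100000005L, 1, 1, 3);
        boolean done = s.onNotification(2, 1, new Vote(2, 0x100000007L, 1));
        System.out.println(s.proposal + " quorum=" + done);
    }
}
```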
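Each server starts by voting for itself. When it receives another server's vote, it checks whether that vote is better (higher peerEpoch first, then higher zxid, then higher server id as the tiebreaker); if so, it switches its own vote to that one. Eventually the server with the highest zxid collects votes from a quorum, and once finalizeWait milliseconds pass without a better vote arriving, it switches to LEADING state.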
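The convergence argument above can be checked with a toy simulation. It is a sketch under strong assumptions: `ElectionSim` and `elect` are hypothetical, every live server hears every other live server, and the asynchronous exchange is collapsed into a single all-to-all round with no finalizeWait. It also shows why the quorum is counted against the whole ensemble rather than just the live servers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy simulation of vote convergence; not ZooKeeper's code.
final class ElectionSim {
    record Vote(long sid, long zxid, long peerEpoch) {}

    // "Better vote" order: higher peerEpoch, then higher zxid, then higher server id.
    static final Comparator<Vote> ORDER = Comparator
            .comparingLong(Vote::peerEpoch)
            .thenComparingLong(Vote::zxid)
            .thenComparingLong(Vote::sid);

    /**
     * Each live server starts by voting for itself, then adopts every better
     * vote it receives. Returns the elected sid, or -1 if the live servers
     * cannot form a quorum of the ensemble.
     */
    static long elect(List<Vote> selfVotes, int ensembleSize) {
        List<Vote> current = new ArrayList<>(selfVotes);
        // One all-to-all exchange: each server compares its vote with everyone's.
        for (int i = 0; i < current.size(); i++) {
            for (Vote received : selfVotes) {
                if (ORDER.compare(received, current.get(i)) > 0) current.set(i, received);
            }
        }
        // Quorum check over the ensemble size, not just the live servers.
        for (Vote candidate : current) {
            long agree = current.stream().filter(candidate::equals).count();
            if (agree > ensembleSize / 2) return candidate.sid();
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Vote> live = List.of(
                new Vote(1, 0x100000005L, 1),
                new Vote(2, 0x100000007L, 1),
                new Vote(3, 0x100000007L, 1));
        System.out.println(elect(live, 5));               // 3: same zxid as server 2, higher sid
        System.out.println(elect(live.subList(0, 2), 5)); // -1: 2 of 5 is not a quorum
    }
}
```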
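2) The LeaderElection implementation is even simpler
// Send a request over a socket to every server in the voting view and collect its vote
// then countVotes tallies the votes to decide whether the election is over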
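A rough sketch of the counting step, assuming countVotes simply tallies identical votes and the round ends once one candidate holds a strict majority of the voting view. `VoteCounter`, `Result` and `decided` are hypothetical names, not taken from LeaderElection, and the real method's inputs and return type may look different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a countVotes-style tally; illustrative only.
final class VoteCounter {
    record Vote(long leader, long zxid) {}
    record Result(Vote winner, int winningCount) {}

    /** Counts how many servers back each vote and returns the most-backed one. */
    static Result countVotes(List<Vote> votes) {
        Map<Vote, Integer> tally = new HashMap<>();
        for (Vote v : votes) {
            tally.merge(v, 1, Integer::sum);
        }
        Vote winner = null;
        int winningCount = 0;
        // Ties between equally backed votes are broken arbitrarily here.
        for (Map.Entry<Vote, Integer> e : tally.entrySet()) {
            if (e.getValue() > winningCount) {
                winner = e.getKey();
                winningCount = e.getValue();
            }
        }
        return new Result(winner, winningCount);
    }

    /** The election is over once the winner holds a strict majority of the voting view. */
    static boolean decided(Result r, int votingViewSize) {
        return r.winningCount() * 2 > votingViewSize;
    }

    public static void main(String[] args) {
        Result r = countVotes(List.of(new Vote(3, 7), new Vote(3, 7), new Vote(1, 5)));
        System.out.println(r + " decided=" + decided(r, 3));
    }
}
```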
3) AuthFastLeaderElection adds simple authentication on top of FastLeaderElection
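Leader election picks the server with the highest zxid. This is also an optimization: the new leader already has every committed transaction, so it does not need to fetch missing transactions from other members.
References: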
http://zookeeper.apache.org/doc/r3.2.2/zookeeperInternals.html#sc_leaderElection