[Tarantool-patches] [PATCH 1/2] replication: introduce ballot.can_be_leader

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Jul 20 01:06:15 MSK 2021


On 19.07.2021 11:12, Konstantin Osipov wrote:
> * Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [21/07/18 20:03]:
>>>> The new field during bootstrap will help to avoid selecting a
>>>> 'voter' as a master. Voters can't write, so they can neither
>>>> boot themselves nor register others.
>>>>
>>>> @TarantoolBot document
>>>> Title: New field - IPROTO_BALLOT_CAN_BE_LEADER
>>>> It is sent as a part of `IPROTO_BALLOT (0x29)`. The field is a
>>>> boolean flag which is true if the sender's config has
>>>> `election_mode` set to `'manual'` or `'candidate'`.
>>>>
>>>> During bootstrap the nodes able to become a leader are preferred
>>>> over the nodes configured as `'voter'`.
>>>
>>> Curious why you added this feature in the first place, I mean
>>> "eligibility"? Each voter has to be able to become a leader,
>>> otherwise raft liveness guarantees are violated. Raft has
>>> learners, but learners neither vote nor can become leaders.
>>
>> Voters are nodes which an admin does not want to be a leader. For
>> instance, they are too far away physically. As voters, they might
>> help to elect a leader, for example, if there are just 3 nodes one
>> of which is a voter.
>>
>> Another application is when you specifically start 1 node as a
>> voter and 2 candidates. The voter might skip all the replication
>> data and work on a slow small machine.
>>
>> It can help to form a majority. We are planning to make this
>> feature even easier to use by adding dataless nodes just for
>> voting.
>>
>> As for Raft, it should not bring any problems. In Raft you can
>> say that all nodes are candidates, but some of them are so slow,
>> that they can never vote for themselves in time. Raft still works,
>> and you essentially have 'voters'.
> 
> Imagine there are nodes A, B, C, D, E.
> A is a leader, E is a voter which can not become a leader.
> 
> Imagine A's log index is 5, B = 4, C = 3, D = 2, E = 5.
> 
> The majority's log index is 4, so entry 4 is committed. A dies, B
> is partitioned away. The cluster is stuck, because neither C nor B
> can get a quorum (3 votes).

But how is that different from real Raft? In normal Raft I can say
E is simply too slow to take any action: it is stuck or dead.
The cluster will be stuck then, yes. Not much you can do here.

You can think of a voter as an almost permanently broken node which
sometimes manages to vote but never manages to become a candidate in
time. I suppose Raft can withstand that behaviour.
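
To make the scenario above concrete, here is a minimal Python sketch of
the vote counting. It is not Tarantool code. It assumes all log entries
share one term, so the Raft "log is at least as up to date" check
reduces to comparing last log indexes. The names `last_index`,
`votes_for` and `electable` are hypothetical, chosen only for this
illustration.

```python
# Simplified model: every entry has the same term, so a node grants its
# vote when the candidate's last log index is >= its own (assumption).
last_index = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 5}
QUORUM = 3  # majority of the 5 configured nodes


def votes_for(candidate, reachable):
    """Count the votes a candidate collects from the reachable nodes."""
    return sum(
        1
        for node in reachable
        if node == candidate or last_index[node] <= last_index[candidate]
    )


def electable(reachable, may_run):
    """Return the nodes allowed to run that would also reach a quorum."""
    return [
        c for c in sorted(reachable & may_run)
        if votes_for(c, reachable) >= QUORUM
    ]


alive = {"C", "D", "E"}  # A is dead, B is partitioned away

# E is a voter: it grants votes but never starts an election itself.
print(electable(alive, may_run={"C", "D"}))       # []
# E is dead or too slow to answer at all: the outcome is the same.
print(electable({"C", "D"}, may_run={"C", "D"}))  # []
# E is a regular candidate: it collects votes from C and D and wins.
print(electable(alive, may_run={"C", "D", "E"}))  # ['E']
```

Under these assumptions the first two cases give the same result: the
cluster is stuck whether E is a voter or an unresponsive node. Only the
third case, where E is allowed to run, differs.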

> Worse yet, if E's (voter) commit index is low, not high, it can vote for a
> node which doesn't have a committed entry. In that case you can
> lose a committed entry.

Could you provide an example? Because I still do not see how it is
different from classic Raft in which one node either is always too
late to become a candidate or is turned off when there are no better
candidates.

