[Tarantool-patches] [PATCH 1/2] replication: introduce ballot.can_be_leader

Konstantin Osipov kostja.osipov at gmail.com
Wed Jul 21 02:20:57 MSK 2021


* Cyrill Gorcunov <gorcunov at gmail.com> [21/07/21 00:17]:
> > Imagine there are nodes A, B, C, D, E.
> > A is a leader, E is a voter which can not become a leader.
> > 
> > Imagine A's log index is 5, B = 4, C = 3, D = 2, E = 5.
> > 
> > The majority's log index is 4, so entry 4 is committed. A dies, B
> > is partitioned away. The cluster is stuck, because neither C nor B
> > can get a quorum (3 votes).
> > 
> > Worse yet, if E's (voter) commit index is low, not high, it can vote for a
> > node which doesn't have a committed entry. In that case you can
> > lose a committed entry.
> 
> Wait, Kostya, here is a set
> 
>      A  B  C  D  E
>     {5, 4, 3, 2, 5}
>      *  *        *
>      L  F  F  F  V
> 
> where L - leader, F - follower, V - voter, LCI is 4 (least common index),
> Q(uorum) = 3, then
> 
>      A  B  C  D  E
>     {-, -, 3, 2, 5}
>            F  F  V
> 
> The Voter E won't be able to choose C or D because its log
> is bigger and the cluster gets stuck (this is guaranteed by
> vclock comparison).

Right. That's what I am saying: the cluster is stuck even though
a quorum (3 nodes) is present. And the behaviour is not
consistent: whether such a cluster gets stuck depends purely on
the state of the voter, so sometimes it will and sometimes it won't.
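To make the dependence on the voter's state concrete, here is a
minimal sketch. It assumes a log is reduced to a single index (the
real check compares vclocks), that a node grants its vote only to a
candidate whose log is not shorter than its own, and that E can vote
but never stand as a candidate. The function names are made up for
illustration and are not Tarantool API.

```python
def votes_for(candidate, logs):
    """Nodes that would grant a vote: those whose log is not longer
    than the candidate's (the candidate itself included)."""
    return [n for n, idx in logs.items() if idx <= logs[candidate]]


def electable(logs, quorum, voters_only=frozenset()):
    """Nodes that may stand as candidates and can collect a quorum."""
    return [
        c for c in logs
        if c not in voters_only and len(votes_for(c, logs)) >= quorum
    ]


# A is dead, B is partitioned away; C, D and E can still talk.
stuck = electable({"C": 3, "D": 2, "E": 5}, quorum=3, voters_only={"E"})

# Same three nodes, but a hypothetical E that is only at index 3.
alive = electable({"C": 3, "D": 2, "E": 3}, quorum=3, voters_only={"E"})

print(stuck, alive)
```

With E at index 5 nobody can gather three votes; with E at index 3
the same three nodes elect C.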

> Let's assume E's index is low, say 3
> 
>     A  B  C  D  E
>    {5, 4, 4, 3, 3}
>     *  *
>     L  F  F  F  V
> 
> in this config the leader won't commit record 5 until one
> of {C,D,E} writes the new record(s) since otherwise the quorum
> won't be reached. Assume A and B get out of the set without
> record 4 written to C
> 
>      A  B  C  D  E
>     {-, -, 4, 3, 3}
>            F  F  V
> 
> Now node E can vote for C or D because its index is less than
> or equal to theirs. And since C's index is bigger than the
> others', it will be elected next as far as I understand, no?

You're right, assuming the voter never casts a vote for a
candidate with a shorter log, safety is not violated. I wasn't
sure that was the case, and presumed that the voter may have no
log of its own. But there are still issues with liveness. The
Raft PhD thesis has learners, so why not use them instead?
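The safety argument for the second set can be checked with a rough
sketch. Assumptions: a log is a single index rather than a vclock,
an entry counts as committed once it is stored on a quorum of nodes,
and E votes but is never a candidate. All names here are
hypothetical.

```python
def committed_index(logs, quorum):
    """Highest index that is stored on at least `quorum` nodes."""
    return sorted(logs.values(), reverse=True)[quorum - 1]


def can_win(candidate, logs, quorum):
    """A node votes only for a candidate whose log is not shorter."""
    votes = sum(1 for idx in logs.values() if idx <= logs[candidate])
    return votes >= quorum


QUORUM = 3
full = {"A": 5, "B": 4, "C": 4, "D": 3, "E": 3}
committed = committed_index(full, QUORUM)

# A and B leave the set; E may vote but not lead.
survivors = {n: idx for n, idx in full.items() if n not in ("A", "B")}
winners = [
    n for n in survivors
    if n != "E" and can_win(n, survivors, QUORUM)
]

# Every node that can win still holds the committed entry.
safe = all(survivors[w] >= committed for w in winners)
print(committed, winners, safe)
```

D cannot win because C refuses to vote for a shorter log, so the
only electable node is the one that already has entry 4.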

> The E won't be leader but will
> help C to gather the majority. So the cluster should be safe
> if only I'm not missing something obvious.

-- 
Konstantin Osipov, Moscow, Russia