[Tarantool-patches] [PATCH 3/4] raft: introduce split vote detection

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Jan 21 02:02:06 MSK 2022


>>>> diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
>>>> index 289d53fd5..5dcbc7821 100644
>>>> --- a/src/lib/raft/raft.c
>>>> +++ b/src/lib/raft/raft.c
>>>> @@ -152,20 +152,69 @@ raft_can_vote_for(const struct raft *raft, const struct vclock *v)
>>>>        return cmp == 0 || cmp == 1;
>>>>    }
>>>>    -static inline void
>>>> +static inline bool
>>>>    raft_add_vote(struct raft *raft, int src, int dst)
>>>>    {
>>>>        struct raft_vote *v = &raft->votes[src];
>>>>        if (v->did_vote)
>>>> -        return;
>>>> +        return false;
>>>>        v->did_vote = true;
>>>>        ++raft->votes[dst].count;
>>>> +    return true;
>>>> +}
>>>> +
>>> You may check split_vote right in raft_add_vote:
>>> simply track number of votes given in this term and
>>> max votes given for one instance.
>>>
>>> This way you won't have to run over all 32 nodes each time a vote
>>> is casted.
>> I did the fullscan intentionally. Otherwise I need to introduce 2
>> new members to struct raft, keep them up to date, clear on term
>> bump. Too easy to miss something and introduce a bug. While in
>> the current version all the split-vote-specific details are in a
>> single function except for 'raft.votes' member. This thing I couldn't
>> get rid of.
>>
>> As for perf, a couple of ifs or a loop over 32 structs - both would
>> take order of nanoseconds anyway. Here I wouldn't bother. Simplicity
>> matters most.
>>
>> I did the proposal to see how it looks but then discarded as more
>> complex than necessary. However if you think it is still worth doing,
>> tell me and I will re-apply the diff.
> 
> Thanks for trying this out!
> 
> TBH, this diff is exactly what I wanted and it still looks better
> (not only preformance-wise, but simpler as well) in my opinion.
> 
> I understand your point about having to update 2 extra members every
> now and then, so feel free to choose any option you like.

I applied this diff:

====================
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 5dcc5beaf..90ed01ca4 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -159,36 +159,29 @@ raft_add_vote(struct raft *raft, int src, int dst)
 	if (v->did_vote)
 		return false;
 	v->did_vote = true;
-	++raft->votes[dst].count;
+	++raft->voted_count;
+	int count = ++raft->votes[dst].count;
+	if (count > raft->max_vote)
+		raft->max_vote = count;
 	return true;
 }
 
 static bool
 raft_has_split_vote(const struct raft *raft)
 {
-	int max_vote = 0;
 	int vote_vac = raft->cluster_size;
 	int quorum = raft->election_quorum;
 	/*
 	 * Quorum > cluster is either a misconfiguration or some instances
 	 * didn't register yet. Anyway, speeding the elections up won't help.
+	 * The same when more nodes voted than there are nodes configured.
 	 */
 	if (vote_vac < quorum)
 		return false;
-	for (int i = 0; i < VCLOCK_MAX; ++i) {
-		int count = raft->votes[i].count;
-		vote_vac -= count;
-		if (count > max_vote)
-			max_vote = count;
-	}
-	/*
-	 * More nodes voted than there are nodes configured. The reason is the
-	 * the same as with quorum > cluster. The action is also the same -
-	 * faster term bumps won't help.
-	 */
+	vote_vac -= raft->voted_count;
 	if (vote_vac < 0)
 		return false;
-	return max_vote + vote_vac < quorum;
+	return raft->max_vote + vote_vac < quorum;
 }
 
 static int
@@ -743,6 +736,8 @@ raft_sm_schedule_new_term(struct raft *raft, uint64_t new_term)
 	raft->leader = 0;
 	raft->state = RAFT_STATE_FOLLOWER;
 	memset(raft->votes, 0, sizeof(raft->votes));
+	raft->voted_count = 0;
+	raft->max_vote = 0;
 	/*
 	 * The instance could be promoted for the previous term. But promotion
 	 * has no effect on following terms.
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index 817148792..05e373254 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -199,6 +199,10 @@ struct raft {
 	uint32_t vote;
 	/** Statistics which node voted for who. */
 	struct raft_vote votes[VCLOCK_MAX];
+	/** How many nodes voted in the current term. */
+	int voted_count;
+	/** Max vote count given to any node in the current term. */
+	int max_vote;
 	/** Number of votes necessary for successful election. */
 	int election_quorum;
 	/**


More information about the Tarantool-patches mailing list