[Tarantool-patches] [PATCH 1/3] raft: send state to new subscribers if Raft worked

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Thu Oct 22 00:41:52 MSK 2020


Hi! Thanks for the review!

>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 18568df3b..86daaf626 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>>   @@ -2154,13 +2155,14 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
>>            tt_uuid_str(&replica_uuid), sio_socketname(io->fd));
>>       say_info("remote vclock %s local vclock %s",
>>            vclock_to_string(&replica_clock), vclock_to_string(&vclock));
>> -    if (raft_is_enabled()) {
>> +    if (replica_version_id >= version_id(2, 6, 0)) {
> 
> Why don't you use 'raft_was_used()` here together with version check?

It stops making sense. Its purpose was not to send anything while there can be
old versions. Assuming that in this case Raft was never enabled. After we
start checking subscriber's version, we can send Raft always when it is
supported.

> I've just understood we should also check for relay version in relay_push_raft(), shouldn't we?

Perhaps, yeah.

The patch is dropped from the patchset, and has forked into a new issue:
https://github.com/tarantool/tarantool/issues/5438

Main reason is that the patch breaks the tests. The problem is that when
Raft is enabled and delivered to a subscriber, it bumps its local LSN. But
after that the subscriber may get XlogGapError from master, if some of
master logs were deleted. And the follower won't be able to auto-rejoin, because
local LSN is > 0 already - Raft wrote something to WAL.

That breaks replication/replica_rejoin.test.lua. Not sure what to do with
that yet. Auto-rejoin in a cluster with election is not so vital I think, so
we can work on it in the next release. And for now not to send anything
when Raft is off. It is bearable.


More information about the Tarantool-patches mailing list