[Tarantool-patches] [PATCH v4 13/12] replication: send accumulated Raft messages after relay start
Serge Petrenko
sergepetrenko at tarantool.org
Tue Apr 20 13:38:46 MSK 2021
20.04.2021 01:36, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> See 2 comments below.
>
>> diff --git a/src/box/relay.cc b/src/box/relay.cc
>> index 7be33ee31..85f335cd7 100644
>> --- a/src/box/relay.cc
>> +++ b/src/box/relay.cc
>> @@ -628,13 +659,38 @@ struct relay_is_raft_enabled_msg {
>> 	bool is_finished;
>> };
>>
>> +static void
>> +relay_push_raft_msg(struct relay *relay, bool do_restart_recovery)
> 1. Why is the recovery restart flag ignored if a message is already
> sent? This might lead to recovery restart loss if I am not mistaken.
I think it's okay. As long as the message is pushed from relay_push_raft()
rather than from tx_set_is_raft_enabled(), we may freely restart the
recovery.
So we only care whether do_restart_recovery is set when the message gets
pushed in the same call.
We don't care whether do_restart_recovery is set when the call exits
without pushing the message: the next call will have the correct value
for do_restart_recovery anyway.
Please see a more detailed explanation below.
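
To illustrate the point, here is a minimal standalone sketch of the same
early-return logic. It is not the patch code: struct toy_relay_tx,
toy_push_raft_msg(), pushed_restart_flag and push_count are hypothetical
names made up for this example, and the cpipe_push() call is replaced by
a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the tx-side state kept in struct relay. */
struct toy_relay_tx {
	bool is_raft_enabled;
	bool is_raft_push_sent;
	bool is_raft_push_pending;
	/* Flag carried by the most recently pushed message. */
	bool pushed_restart_flag;
	/* Number of messages actually pushed (replaces cpipe_push()). */
	int push_count;
};

/*
 * Same early-return condition as relay_push_raft_msg() in the patch.
 * Returns true only if a message was pushed in this call.
 */
static bool
toy_push_raft_msg(struct toy_relay_tx *tx, bool do_restart_recovery)
{
	if (!tx->is_raft_enabled || tx->is_raft_push_sent)
		return false;
	/* The flag is recorded only when the push really happens. */
	tx->pushed_restart_flag = do_restart_recovery;
	tx->push_count++;
	tx->is_raft_push_sent = true;
	tx->is_raft_push_pending = false;
	return true;
}
```

In this sketch a call made while a message is in flight returns false and
leaves pushed_restart_flag untouched, so whichever later call performs the
push decides the flag.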
>
>> +{
>> +	if (!relay->tx.is_raft_enabled || relay->tx.is_raft_push_sent)
>> +		return;
>> +	struct relay_raft_msg *msg =
>> +		&relay->tx.raft_msgs[relay->tx.raft_ready_msg];
>> +	msg->do_restart_recovery = do_restart_recovery;
>> +	cpipe_push(&relay->relay_pipe, &msg->base);
>> +	relay->tx.raft_ready_msg = (relay->tx.raft_ready_msg + 1) % 2;
>> +	relay->tx.is_raft_push_sent = true;
>> +	relay->tx.is_raft_push_pending = false;
>> +}
>> +
>> /** TX thread part of the Raft flag setting, first hop. */
>> static void
>> tx_set_is_raft_enabled(struct cmsg *base)
>> {
>> 	struct relay_is_raft_enabled_msg *msg =
>> 		(struct relay_is_raft_enabled_msg *)base;
>> -	msg->relay->tx.is_raft_enabled = msg->value;
>> +	struct relay *relay = msg->relay;
>> +	relay->tx.is_raft_enabled = msg->value;
>> +	/*
>> +	 * Send saved raft message as soon as relay becomes operational.
>> +	 * Do not restart recovery upon the message arrival. Recovery is
>> +	 * positioned at replica_clock initially, i.e. already "restarted" and
>> +	 * restarting it once again would position it at the oldest xlog
>> +	 * possible, because relay reader hasn't received replica vclock yet.
>> +	 */
>> +	if (relay->tx.is_raft_push_pending) {
>> +		relay_push_raft_msg(msg->relay, false);
> 2. I don't understand. Why wasn't there such a problem before? Recovery
> must be restarted when the node becomes a leader. If you do not restart
> it, the data would be ignored by the replicas. How do you know it is
> positioned right now at replica_clock? You are in tx thread, you can't
> tell. What do I miss?
This is because this `relay_push_raft_msg` is delivered before
`relay_set_is_raft_enabled`.
Both messages get processed by the cbus_process() loop that waits
for `relay_set_is_raft_enabled`.
This happens in relay_send_is_raft_enabled(), even before
the relay reader fiber is created, so recv_vclock is zero.
Restarting recovery here would reset it to the very first WAL
this instance has, which is wrong.

Such a problem might have existed before, but it was extremely
hard to catch: relay_push_raft_msg() wasn't called until
relay->tx.is_raft_enabled was set. And once tx.is_raft_enabled
was set, it most probably meant that relay_set_is_raft_enabled
had already been delivered and the relay had exited this first
cbus_process() loop, which runs before reader fiber creation.

To solve the problem some other way, I would need to make
relay_push_raft_msg() deliver the message to the second
cbus_process() loop, the main one, and I couldn't come up
with a way to do that.
The message should be pushed right in tx_set_is_raft_enabled(),
which means it gets delivered before relay_set_is_raft_enabled.
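
The ordering argument can be sketched with a toy FIFO model. This is only
an illustration under my own assumptions: enum toy_msg and toy_first_loop()
are hypothetical names, and the real code goes through cbus rather than a
plain array.

```c
#include <stddef.h>

/* Hypothetical message kinds; the real code uses cbus messages. */
enum toy_msg {
	/* Raft message pushed from tx_set_is_raft_enabled(). */
	TOY_RAFT_PUSH,
	/* Return hop, i.e. relay_set_is_raft_enabled. */
	TOY_RAFT_ENABLED_ACK,
};

/*
 * Models the first cbus_process() loop: messages are consumed in
 * FIFO order until the ack arrives. Returns how many Raft pushes
 * this loop handled, that is, before the reader fiber exists.
 */
static size_t
toy_first_loop(const enum toy_msg *queue, size_t len)
{
	size_t raft_seen = 0;
	for (size_t i = 0; i < len; i++) {
		if (queue[i] == TOY_RAFT_ENABLED_ACK)
			break;
		raft_seen++;
	}
	return raft_seen;
}
```

With the queue {TOY_RAFT_PUSH, TOY_RAFT_ENABLED_ACK} the first loop handles
one Raft push, which is the case where recovery must not be restarted. With
the reverse order it handles none, and the push is left for the main loop.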
--
Serge Petrenko