Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Date: Sat, 27 Mar 2021 19:52:26 +0300
Message-ID: <a970ecab-3f50-4688-dd0e-842524c89c1c@tarantool.org>
In-Reply-To: <2ab0844e-84b4-6701-15ab-652ab6f18075@tarantool.org>



26.03.2021 23:44, Vladislav Shpilevoy wrote:
> Hi! Thanks for working on this!
>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 5a88a013e..326cf18d2 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -566,9 +566,16 @@ applier_register(struct applier *applier)
>>   	row.type = IPROTO_REGISTER;
>>   	coio_write_xrow(coio, &row);
>>   
>> -	applier_set_state(applier, APPLIER_REGISTER);
>> +	/*
>> +	 * Register may serve as a retry for final join. Set corresponding
>> +	 * states to unblock anyone who's waiting for final join to start or
>> +	 * end.
>> +	 */
>> +	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
>> +					      APPLIER_FINAL_JOIN);
>>   	applier_wait_register(applier, 0);
>> -	applier_set_state(applier, APPLIER_REGISTERED);
>> +	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
>> +					      APPLIER_JOINED);
>>   	applier_set_state(applier, APPLIER_READY);
> Hm. I don't understand. Transition from anon to non-anon leads to
> re-creation of all appliers. It calls box_sync_replication() and
> creates new struct applier objects. How is it possible that during one
> life of a reader fiber it manages to see 2 states and is not terminated?

You're correct. It isn't possible for a single applier to see both
states, anon and non-anon.
The flag is still needed, though, for the case when a normal replica
receives a transient error during final join. In this case the applier
reconnects and we get to the next applier loop iteration. First it
checks whether REPLICASET_UUID is nil. It isn't, because initial join
succeeded. Then it checks whether instance_id is 0. It is, because final
join failed. The applier now assumes that the replica was anonymous and
tries to register.
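
To make the check order concrete, here is a minimal sketch with
hypothetical names (`replica_view`, `applier_next_step` and the `STEP_*`
values are illustrative only, not the actual applier.cc code). It only
models the two checks described above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not Tarantool's real types. */
enum next_step { STEP_JOIN, STEP_REGISTER, STEP_SUBSCRIBE };

struct replica_view {
	/* True until initial join succeeds. */
	bool replicaset_uuid_is_nil;
	/* Stays 0 until final join (or register) succeeds. */
	uint32_t instance_id;
};

/* Decide what the applier loop does on its next iteration. */
static enum next_step
applier_next_step(const struct replica_view *r)
{
	if (r->replicaset_uuid_is_nil)
		return STEP_JOIN;
	/*
	 * A failed final join leaves instance_id == 0, which looks
	 * the same as an anonymous replica, so the retry goes through
	 * register.
	 */
	if (r->instance_id == 0)
		return STEP_REGISTER;
	return STEP_SUBSCRIBE;
}
```

A replica whose initial join succeeded but whose final join failed has a
non-nil UUID and instance_id 0, so it lands in the register branch.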

The hang I'm talking about is in `bootstrap_from_master()`. It waits
until the applier enters the APPLIER_JOINED state, which never happened
on this retry path before this patch.

So, `was_anon` comes into play only when final join fails and is retried.
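
The state selection from the diff can be sketched in isolation like
this. The enum is a reduced, assumed copy of the applier states named in
the patch, and `register_states()` is a hypothetical helper, not a
function from applier.cc:

```c
#include <stdbool.h>

/* Reduced copy of the states mentioned in the patch (an assumption). */
enum applier_state {
	APPLIER_REGISTER,
	APPLIER_REGISTERED,
	APPLIER_FINAL_JOIN,
	APPLIER_JOINED,
	APPLIER_READY,
};

/* States reported while register is in progress and once it is done. */
struct register_states {
	enum applier_state in_progress;
	enum applier_state done;
};

/*
 * Hypothetical helper mirroring the two ternaries in the diff:
 * a truly anonymous replica reports REGISTER/REGISTERED, while a
 * final join retry reports FINAL_JOIN/JOINED so that waiters for
 * APPLIER_JOINED are woken up.
 */
static struct register_states
register_states(bool was_anon)
{
	struct register_states s;
	s.in_progress = was_anon ? APPLIER_REGISTER : APPLIER_FINAL_JOIN;
	s.done = was_anon ? APPLIER_REGISTERED : APPLIER_JOINED;
	return s;
}
```

With `was_anon == false` the applier passes through APPLIER_JOINED, which
is the state `bootstrap_from_master()` is waiting for.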

>
> Also could you please provide a test? Maybe it would be easier to see
> what is happening then.

OK. I'm not sure a separate test is needed, because this case is already
tested implicitly by the gh-5566-final-join-synchro test.

A test would look like this:
master:
     box.cfg{listen=3301, replication_synchro_quorum=10}
     box.space._cluster:alter{is_sync=true}
     box.schema.user.grant("guest", "replication")
replica:
     box.cfg{replication=3301}
master: wait until replica receives ER_SYNC_QUORUM_TIMEOUT, and then:
     box.cfg{replication_synchro_quorum=1}

This test passes on the branch, meaning the replica's box.cfg completes
successfully, but it would hang indefinitely without this commit.

-- 
Serge Petrenko

