From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/2] replication: fix replica disconnect upon reconfiguration
Date: Tue, 5 Oct 2021 16:09:40 +0300
Message-ID: <c49aa8ae-14cf-1d02-e53f-2d9798893839@tarantool.org>
In-Reply-To: <a7916487-3540-162f-8f1f-40de94869de3@tarantool.org>

On 05.10.2021 00:04, Vladislav Shpilevoy wrote:
> Hi! Thanks for working on this!
>
>>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>>> index 219ffa38d..89cda5599 100644
>>>> --- a/src/box/box.cc
>>>> +++ b/src/box/box.cc
>>>> @@ -1261,7 +1261,9 @@ box_sync_replication(bool connect_quorum)
>>>>  		applier_delete(appliers[i]); /* doesn't affect diag */
>>>>  	});
>>>> -	replicaset_connect(appliers, count, connect_quorum);
>>>> +	bool connect_quorum = strict;
>>>> +	bool keep_connect = !strict;
>>>> +	replicaset_connect(appliers, count, connect_quorum, keep_connect);
>>> 1. How about passing both these parameters explicitly to box_sync_replication?
>>> I don't understand the link between them so that they could be one.
>>>
>>> It seems the only case when you need to drop the old connections is when
>>> you turn anon to normal. Why should they be fully reset otherwise?
>> Yes, it's true. anon to normal is the only place where existing
>> connections should be reset.
>>
>> For both bootstrap and local recovery (first ever box.cfg) keep_connect
>> doesn't make sense at all, because there are no previous connections to
>> keep.
>>
>> So the only two (out of 5) box_sync_replication calls that need
>> keep_connect are replication reconfiguration (keep_connect = true) and
>> anon replica reconfiguration (keep_connect = false).
>>
>> Speaking of the relation between keep_connect and connect_quorum:
>> We don't care about keep_connect in 3 calls (bootstrap and recovery),
>> and when keep_connect is important, it's equal to !connect_quorum.
>> I thought it might be nice to replace them with a single parameter.
>>
>> I tried to pass both parameters to box_sync_replication() at first.
>> This looked rather ugly IMO:
>> box_sync_replication(true, false), box_sync_replication(false, true);
>> Two boolean parameters which are responsible for God knows what are
>> worse than one parameter.
>>
>> I'm not 100% happy with my solution, but it at least hides the second
>> parameter. And IMO box_sync_replication(strict) is rather easy to
>> understand: when strict = true, you want to connect to quorum, and
>> you want to reset the connections. And vice versa when strict = false.
> This can be resolved with a couple of wrappers, like in this diff:
>
> ====================
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 89cda5599..c1216172d 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
>   * don't start appliers.
>   */
>  static void
> -box_sync_replication(bool strict)
> +box_sync_replication(bool do_quorum, bool do_reuse)
>  {
>  	int count = 0;
>  	struct applier **appliers = cfg_get_replication(&count);
> @@ -1260,14 +1260,27 @@ box_sync_replication(bool strict)
>  		for (int i = 0; i < count; i++)
>  			applier_delete(appliers[i]); /* doesn't affect diag */
>  	});
> -
> -	bool connect_quorum = strict;
> -	bool keep_connect = !strict;
> -	replicaset_connect(appliers, count, connect_quorum, keep_connect);
> +	replicaset_connect(appliers, count, do_quorum, do_reuse);
>
>  	guard.is_active = false;
>  }
>
> +static inline void
> +box_reset_replication(void)
> +{
> +	const bool do_quorum = true;
> +	const bool do_reuse = false;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
> +static inline void
> +box_update_replication(void)
> +{
> +	const bool do_quorum = false;
> +	const bool do_reuse = true;
> +	box_sync_replication(do_quorum, do_reuse);
> +}
> +
>  void
>  box_set_replication(void)
>  {
> @@ -1286,7 +1299,7 @@ box_set_replication(void)
>  	 * Stay in orphan mode in case we fail to connect to at least
>  	 * 'replication_connect_quorum' remote instances.
>  	 */
> -	box_sync_replication(false);
> +	box_update_replication();
>  	/* Follow replica */
>  	replicaset_follow();
>  	/* Wait until appliers are in sync */
> @@ -1406,7 +1419,7 @@ box_set_replication_anon(void)
>  	 * them can register and others resend a
>  	 * non-anonymous subscribe.
>  	 */
> -	box_sync_replication(true);
> +	box_reset_replication();
>  	/*
>  	 * Wait until the master has registered this
>  	 * instance.
> @@ -3260,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
>  	 * with connecting to 'replication_connect_quorum' masters.
>  	 * If this also fails, throw an error.
>  	 */
> -	box_sync_replication(true);
> +	box_update_replication();
>
>  	struct replica *master = replicaset_find_join_master();
>  	assert(master == NULL || master->applier != NULL);
> @@ -3337,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  	if (wal_dir_lock >= 0) {
>  		if (box_listen() != 0)
>  			diag_raise();
> -		box_sync_replication(false);
> +		box_update_replication();
>
>  		struct replica *master;
>  		if (replicaset_needs_rejoin(&master)) {
> @@ -3416,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  	vclock_copy(&replicaset.vclock, &recovery->vclock);
>  	if (box_listen() != 0)
>  		diag_raise();
> -	box_sync_replication(false);
> +	box_update_replication();
>  	}
>  	stream_guard.is_active = false;
>  	recovery_finalize(recovery);
> ====================
>
> Feel free to discard it if you don't like it. I am fine with the
> current solution too.
>
> Now when I sent this diff, I realized box_restart_replication()
> would be a better name than reset. Up to you as well.

Your version looks better, thanks! Applied with renaming
box_reset_replication() to box_restart_replication().
Also replaced box_update_replication() with box_restart_replication()
in bootstrap().

>
>> diff --git a/test/instance_files/base_instance.lua b/test/instance_files/base_instance.lua
>> index 45bdbc7e8..e579c3843 100755
>> --- a/test/instance_files/base_instance.lua
>> +++ b/test/instance_files/base_instance.lua
>> @@ -5,7 +5,8 @@ local listen = os.getenv('TARANTOOL_LISTEN')
>>  box.cfg({
>>      work_dir = workdir,
>>      -- listen = 'localhost:3310'
>> -    listen = listen
>> +    listen = listen,
>> +    log = workdir..'/tarantool.log',
> Do you really need it in this patch?

Yep, I need it for grep_log. Looks like luatest doesn't set log to
anything by default.

>
> Other than that LGTM. You can send the next version to a next
> reviewer. I suppose it can be Yan now.
Here's the full diff:

==================================

diff --git a/src/box/box.cc b/src/box/box.cc
index 89cda5599..cc4ada47e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1249,7 +1249,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(bool strict)
+box_sync_replication(bool do_quorum, bool do_reuse)
 {
 	int count = 0;
 	struct applier **appliers = cfg_get_replication(&count);
@@ -1260,14 +1260,27 @@ box_sync_replication(bool strict)
 		for (int i = 0; i < count; i++)
 			applier_delete(appliers[i]); /* doesn't affect diag */
 	});
-
-	bool connect_quorum = strict;
-	bool keep_connect = !strict;
-	replicaset_connect(appliers, count, connect_quorum, keep_connect);
+	replicaset_connect(appliers, count, do_quorum, do_reuse);

 	guard.is_active = false;
 }

+static inline void
+box_restart_replication(void)
+{
+	const bool do_quorum = true;
+	const bool do_reuse = false;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
+static inline void
+box_update_replication(void)
+{
+	const bool do_quorum = false;
+	const bool do_reuse = true;
+	box_sync_replication(do_quorum, do_reuse);
+}
+
 void
 box_set_replication(void)
 {
@@ -1286,7 +1299,7 @@ box_set_replication(void)
 	 * Stay in orphan mode in case we fail to connect to at least
 	 * 'replication_connect_quorum' remote instances.
 	 */
-	box_sync_replication(false);
+	box_update_replication();
 	/* Follow replica */
 	replicaset_follow();
 	/* Wait until appliers are in sync */
@@ -1406,7 +1419,7 @@ box_set_replication_anon(void)
 	 * them can register and others resend a
 	 * non-anonymous subscribe.
 	 */
-	box_sync_replication(true);
+	box_restart_replication();
 	/*
 	 * Wait until the master has registered this
 	 * instance.
@@ -3260,7 +3273,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
 	 * with connecting to 'replication_connect_quorum' masters.
 	 * If this also fails, throw an error.
 	 */
-	box_sync_replication(true);
+	box_restart_replication();

 	struct replica *master = replicaset_find_join_master();
 	assert(master == NULL || master->applier != NULL);
@@ -3337,7 +3350,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	if (wal_dir_lock >= 0) {
 		if (box_listen() != 0)
 			diag_raise();
-		box_sync_replication(false);
+		box_update_replication();

 		struct replica *master;
 		if (replicaset_needs_rejoin(&master)) {
@@ -3416,7 +3429,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	vclock_copy(&replicaset.vclock, &recovery->vclock);
 	if (box_listen() != 0)
 		diag_raise();
-	box_sync_replication(false);
+	box_update_replication();
 	}
 	stream_guard.is_active = false;
 	recovery_finalize(recovery);
diff --git a/test/replication-luatest/gh-4669-applier-reconnect_test.lua b/test/replication-luatest/gh_4669_applier_reconnect_test.lua
similarity index 100%
rename from test/replication-luatest/gh-4669-applier-reconnect_test.lua
rename to test/replication-luatest/gh_4669_applier_reconnect_test.lua
==================================

-- 
Serge Petrenko
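[Editor's note] The wrapper pattern agreed on above can be sketched in isolation. This is a minimal standalone illustration, not Tarantool's actual code: box_sync_replication() is stubbed to record its flags in globals so the behavior of each wrapper is observable, whereas the real function connects appliers.

```c
#include <stdbool.h>

/* Stub standing in for the real box_sync_replication(): it only
 * records which flags were requested, so the wrappers below can
 * be exercised without a running replica set. */
static bool last_do_quorum;
static bool last_do_reuse;

static void
box_sync_replication(bool do_quorum, bool do_reuse)
{
	last_do_quorum = do_quorum;
	last_do_reuse = do_reuse;
}

/* Drop existing applier connections and require a full quorum,
 * e.g. when an anonymous replica becomes a normal one. */
static inline void
box_restart_replication(void)
{
	const bool do_quorum = true;
	const bool do_reuse = false;
	box_sync_replication(do_quorum, do_reuse);
}

/* Reuse connections that are still valid and don't insist on
 * quorum, e.g. on a box.cfg{replication = ...} change. */
static inline void
box_update_replication(void)
{
	const bool do_quorum = false;
	const bool do_reuse = true;
	box_sync_replication(do_quorum, do_reuse);
}
```

The design point being made in the thread: a call site reading box_restart_replication() is self-documenting, while box_sync_replication(true, false) forces the reader to look up which boolean is which.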
Thread overview: 8+ messages
2021-09-30 19:17 [Tarantool-patches] [PATCH 0/2] replication: fix reconnect on box.cfg.replication change Serge Petrenko via Tarantool-patches
2021-09-30 19:17 ` [Tarantool-patches] [PATCH 1/2] replicaiton: make anon replica connect to quorum upon reconfiguration Serge Petrenko via Tarantool-patches
2021-09-30 19:17 ` [Tarantool-patches] [PATCH 2/2] replication: fix replica disconnect " Serge Petrenko via Tarantool-patches
2021-09-30 22:15   ` Vladislav Shpilevoy via Tarantool-patches
2021-10-01 11:31     ` Serge Petrenko via Tarantool-patches
2021-10-04 21:04       ` Vladislav Shpilevoy via Tarantool-patches
2021-10-05 13:09         ` Serge Petrenko via Tarantool-patches [this message]
2021-10-06 21:59           ` Vladislav Shpilevoy via Tarantool-patches