From: Konstantin Osipov <kostja@tarantool.org>
To: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: tarantool-patches@freelists.org
Subject: Re: [PATCH v3 07/11] replication: rebootstrap instance on startup if it fell behind
Date: Thu, 19 Jul 2018 10:19:03 +0300
Message-ID: <20180719071903.GG11373@chai>
In-Reply-To: <64016c63c3727e5df2e4495fe1de52eb8ca5d2eb.1531598427.git.vdavydov.dev@gmail.com>

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/07/14 23:50]:
> diff --git a/src/box/box.cc b/src/box/box.cc
> index b629a4d8..baf30fce 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -1797,6 +1797,9 @@ bootstrap(const struct tt_uuid *instance_uuid,
>  /**
>   * Recover the instance from the local directory.
>   * Enter hot standby if the directory is locked.
> + * Invoke rebootstrap if the instance fell too much
> + * behind its peers in the replica set and needs
> + * to be rebootstrapped.
>   */
>  static void
>  local_recovery(const struct tt_uuid *instance_uuid,
> @@ -1832,6 +1835,12 @@ local_recovery(const struct tt_uuid *instance_uuid,
>  	if (wal_dir_lock >= 0) {
>  		box_listen();
>  		box_sync_replication(replication_connect_timeout, false);
> +
> +		struct replica *master;
> +		if (replicaset_needs_rejoin(&master)) {
> +			say_info("replica is too old, initiating rejoin");
> +			return bootstrap_from_master(master);

This is say_crit() IMHO.

> +bool
> +replicaset_needs_rejoin(struct replica **master)
> +{
> +	replicaset_foreach(replica) {
> +		/*
> +		 * Rebootstrap this instance from a master if:
> +		 * - the oldest vclock stored on the master is greater
> +		 *   than or incomparable with the instance vclock
> +		 *   (so that the instance can't follow the master) and
> +		 * - the instance is strictly behind the master (so
> +		 *   that we won't lose any data by rebootstrapping
> +		 *   this instance)
> +		 */
> +		struct applier *applier = replica->applier;
> +		if (applier != NULL &&
> +		    vclock_compare(&applier->remote_status.gc_vclock,
> +				   &replicaset.vclock) > 0 &&
> +		    vclock_compare(&replicaset.vclock,
> +				   &applier->remote_status.vclock) < 0) {
> +			*master = replica;
> +			return true;

I'd love to see a bit more clarity in the log about this decision
making process. Imagine this function returns 'false' because
vclocks are incomparable and then replication breaks - it would be
very hard to diagnose why this happened.

You could add some logging to this function, but this would change
its contract, since currently this function has no side effects.

Should it set the diagnostics area in case of error? Log the
error? Return an extra status code? Please feel free to choose the
option you think is best.

Thank you for working on this,

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
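
For illustration only, the two-part condition in the quoted hunk, combined
with the "return an extra status code" option raised in the review, could be
sketched in isolation roughly like this. Nothing below is Tarantool code:
struct toy_vclock, toy_vclock_compare(), enum rejoin_verdict and the other
toy_* names are hypothetical stand-ins. The sketch assumes that the compare
helper returns a large positive sentinel for incomparable clocks, which is
what would let a single "> 0" test cover both "greater" and "incomparable"
as the patch comment describes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vector clock: one LSN per replica id. */
enum { TOY_VCLOCK_MAX = 4 };
/* Assumed sentinel for "neither clock dominates the other". */
enum { TOY_ORDER_UNDEFINED = INT_MAX };

struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_MAX];
};

/* Return -1, 0, 1, or TOY_ORDER_UNDEFINED if a and b are incomparable. */
static int
toy_vclock_compare(const struct toy_vclock *a, const struct toy_vclock *b)
{
	assert(a != NULL && b != NULL);
	int le = 1, ge = 1;
	for (int i = 0; i < TOY_VCLOCK_MAX; i++) {
		if (a->lsn[i] < b->lsn[i])
			ge = 0;
		if (a->lsn[i] > b->lsn[i])
			le = 0;
	}
	if (le && ge)
		return 0;
	if (le)
		return -1;
	if (ge)
		return 1;
	return TOY_ORDER_UNDEFINED;
}

/* Why a rejoin was or was not chosen; the caller decides what to log. */
enum rejoin_verdict {
	REJOIN_NEEDED,
	REJOIN_CAN_FOLLOW,      /* master still has the rows we need */
	REJOIN_WOULD_LOSE_DATA, /* we are not strictly behind the master */
};

/* Side-effect free: same two checks as the quoted condition, plus a reason. */
static enum rejoin_verdict
toy_rejoin_verdict(const struct toy_vclock *master_gc,
		   const struct toy_vclock *master,
		   const struct toy_vclock *local)
{
	if (toy_vclock_compare(master_gc, local) <= 0)
		return REJOIN_CAN_FOLLOW;
	if (toy_vclock_compare(local, master) >= 0)
		return REJOIN_WOULD_LOSE_DATA;
	return REJOIN_NEEDED;
}

/* Human-readable reason, suitable for a log line at the call site. */
static const char *
toy_rejoin_verdict_str(enum rejoin_verdict v)
{
	switch (v) {
	case REJOIN_NEEDED:
		return "too old to follow master, rejoin needed";
	case REJOIN_CAN_FOLLOW:
		return "can follow master, no rejoin";
	case REJOIN_WOULD_LOSE_DATA:
		return "not strictly behind master, rejoin would lose data";
	}
	return "unknown";
}
```

With this shape the check itself keeps its no-side-effects contract, and
the caller can log toy_rejoin_verdict_str() at whatever level it sees fit,
including in the case where the clocks are incomparable and no rejoin
happens.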