From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 28 Aug 2018 19:36:12 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH v4 2/2] box: adds replication sync after cfg. update Message-ID: <20180828163612.3xxc6fxrb6glir57@esperanza> References: <20180828114328.25702-1-krishtal.olja@gmail.com> <20180828114328.25702-2-krishtal.olja@gmail.com> <20180828155804.j6uqt2gobzejpa6m@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: To: Olga Krishtal Cc: tarantool-patches@freelists.org List-ID: On Tue, Aug 28, 2018 at 07:19:55PM +0300, Olga Krishtal wrote: > > > diff --git a/src/box/box.cc b/src/box/box.cc > > > index be5077da8..aaae4219f 100644 > > > --- a/src/box/box.cc > > > +++ b/src/box/box.cc > > > @@ -634,6 +634,11 @@ box_set_replication(void) > > > box_sync_replication(true); > > > /* Follow replica */ > > > replicaset_follow(); > > > + /* Sync replica up to quorum */ > > > + if (!replicaset_sync()) { > > > + tnt_raise(ClientError, ER_CFG, "replication", > > > + "failed to connect to one or more replicas"); > > > + } > > > > Throwing ER_CFG error from box.cfg() and still applying the new > > replication configuration looks weird. We should either revert the > > configuration back to what we had before box.cfg() was called or not > > throw exceptions. > > > > Reverting configuration seems to be unreasonable, because we could've > > applied some rows from the new replicas. > > > > We discussed the matter with Georgy and Kostja and agreed that instead > > an instance should enter the orphan mode, just like it does on initial > > configuration. > > > > > > Just curious, why? How can we applied changes if box.cfg throws an error? > Or I miss smth? box.cfg{replication} successfully connected appliers and updated replica set. Then it started to wait for appliers to synchronize. Some appliers synchronized. Others not. Failure. We can't revert to old replication configuration, because the appliers applied some rows from the new configuration. That's why we should return from box.cfg{} without an error, but enter the orphan mode. > > > +a = 3000 box.begin() while a > 0 do a = a-1 box.space.test:insert{a,a} > > end box.commit() > > > > Nit: for i = 1, 100 do ... end > > > > Anyway, why 3000? When I change it to 1000 or even 100 the test still > > passes with this patch and fails without it. > > > > > I used 3000 because when there is no patch and I put replica into sleep for > replication sync lag (0.1) arrives nearly 2500 tuples. Better use error injections. > > > > > Also, I'd like to see a test case that checks that in case > > box.cfg.replication_sync_lag is big, not all records arrive > > by the time box.cfg{replication} returns. > > > > > You mean see difference in tuples count in case when replicas are synced, > however due to lag, but not due to data has arrived? Yes, box.cfg{} returns, because the lag is within the target range, even though not all rows have arrived. > So I need 3 test case > Test that we are synced. > Test with sync and big lag. > Test with failed sync - orphan mode? Correct.