From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Tue, 27 Nov 2018 12:57:00 +0300
From: Vladimir Davydov
Subject: Re: [PATCH 3/6] box: do not rotate WAL when replica subscribes
Message-ID: <20181127095700.hzsiirpb2pg6ncau@esperanza>
References: <51805fb0af1b8192c8226348bb46683b14ead224.1543152574.git.vdavydov.dev@gmail.com>
 <20181126175037.GD7839@chai>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20181126175037.GD7839@chai>
To: Konstantin Osipov
Cc: tarantool-patches@freelists.org
List-ID:

On Mon, Nov 26, 2018 at 08:50:37PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov [18/11/26 10:27]:
>
> OK to push.
>
> > Because this is pointless and confusing. This "feature" was silently
> > introduced by commit f2bccc18485d ("Use WAL vclock instead of TX vclock
> > in most places"). Let's revert this change. This will allow us to
> > clearly separate WAL checkpointing from WAL flushing, which will in turn
> > facilitate implementation of the checkpoint-on-WAL-threshold feature.
> >
> > There are two problems here, however. First, not rotating the log breaks
> > expectations of replication/gc test: an xlog file doesn't get deleted in
> > time as a consequence. This happens because we don't delete xlogs
> > relayed to a replica after join stage is complete - we only do it during
> > subscribe stage - and if we don't rotate WAL on subscribe the garbage
> > collector won't be invoked. This is actually a bug - we should advance
> > the WAL consumer associated with a replica once join stage is complete.
> > This patch fixes it, but it unveils another problem - this time in the
> > WAL garbage collection procedure.
> >
> > Turns out, when passed a vclock, the WAL garbage collection procedure
> > removes all WAL files that were created before the vclock. Apparently,
> > this isn't quite correct - if a consumer is in the middle of a WAL file,
> > we must not delete the WAL file, but we do. This works as long as
> > consumers never track vclocks inside WAL files - currently they are
> > advanced only when a WAL file is closed and naturally they are advanced
> > to the beginning of the next WAL file. However, if we want to advance
> > the consumer associated with a replica when join stage ends (this is
> > what the previous paragraph is about), it might occur that we will
> > advance it to the middle of a WAL file. If that happens the WAL garbage
> > collector might remove a file which is actually in use by a replica.
> > Fix this as well.

Pushed to 2.1
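
For readers following the garbage collection argument in the quoted commit
message, here is a minimal sketch of the difference between the two removal
rules. It is not Tarantool's code. It assumes a vclock can be reduced to a
single integer LSN, that each WAL file is identified by the LSN at which it
was created, and that a file holds the rows written after that LSN up to the
start of the next file. The names collectable_before_fix and
collectable_after_fix are hypothetical.

```python
from bisect import bisect_right


def collectable_before_fix(file_starts, consumer_lsn):
    """Rule described in the message: remove every file created before
    the consumer position, even if the consumer is still inside it."""
    return [start for start in sorted(file_starts) if start < consumer_lsn]


def collectable_after_fix(file_starts, consumer_lsn):
    """Keep the file that contains the consumer position.

    A file is removable only if the next file starts at or before the
    consumer position, i.e. the consumer has read the file to its end.
    """
    starts = sorted(file_starts)
    # Index of the last file created at or before the consumer position:
    # this is the file the consumer still needs to read from.
    in_use = bisect_right(starts, consumer_lsn) - 1
    return starts[:max(in_use, 0)]


# Three WAL files, created at LSN 0, 100 and 200 (illustrative numbers).
files = [0, 100, 200]

# Consumer at a file boundary: both rules agree.
print(collectable_before_fix(files, 200))  # [0, 100]
print(collectable_after_fix(files, 200))   # [0, 100]

# Consumer in the middle of the file created at LSN 100.
print(collectable_before_fix(files, 150))  # [0, 100]
print(collectable_after_fix(files, 150))   # [0]
```

With the consumer at LSN 150 the first rule also removes the file created at
LSN 100, which the consumer still needs. While consumers only ever sit on file
boundaries, the two rules give the same answer, which matches the remark that
the old behaviour works as long as consumers never track positions inside WAL
files.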