From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Mon, 26 Nov 2018 20:50:37 +0300
From: Konstantin Osipov
Subject: Re: [PATCH 3/6] box: do not rotate WAL when replica subscribes
Message-ID: <20181126175037.GD7839@chai>
References: <51805fb0af1b8192c8226348bb46683b14ead224.1543152574.git.vdavydov.dev@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51805fb0af1b8192c8226348bb46683b14ead224.1543152574.git.vdavydov.dev@gmail.com>
To: Vladimir Davydov
Cc: tarantool-patches@freelists.org
List-ID:

* Vladimir Davydov [18/11/26 10:27]:

OK to push.

> Because this is pointless and confusing. This "feature" was silently
> introduced by commit f2bccc18485d ("Use WAL vclock instead of TX vclock
> in most places"). Let's revert this change. This will allow us to
> clearly separate WAL checkpointing from WAL flushing, which will in turn
> facilitate implementation of the checkpoint-on-WAL-threshold feature.
>
> There are two problems here, however. First, not rotating the log breaks
> expectations of replication/gc test: an xlog file doesn't get deleted in
> time as a consequence. This happens, because we don't delete xlogs
> relayed to a replica after join stage is complete - we only do it during
> subscribe stage - and if we don't rotate WAL on subscribe the garbage
> collector won't be invoked. This is actually a bug - we should advance
> the WAL consumer associated with a replica once join stage is complete.
> This patch fixes it, but it unveils another problem - this time in the
> WAL garbage collection procedure.
>
> Turns out, when passed a vclock, the WAL garbage collection procedure
> removes all WAL files that were created before the vclock. Apparently,
> this isn't quite correct - if a consumer is in the middle of a WAL file,
> we must not delete the WAL file, but we do. This works as long as
> consumers never track vclocks inside WAL files - currently they are
> advanced only when a WAL file is closed and naturally they are advanced
> to the beginning of the next WAL file. However, if we want to advance
> the consumer associated with a replica when join stage ends (this is
> what the previous paragraph is about), it might occur that we will
> advance it to the middle of a WAL file. If that happens the WAL garbage
> collector might remove a file which is actually in use by a replica.
> Fix this as well.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
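
For readers following the garbage collection problem described in the quoted commit message, here is a rough sketch of the difference between the two deletion rules. It is a toy model, not Tarantool code: it assumes each WAL file is identified by a single integer (the position at which the file was opened) and that a consumer position is also a single integer, whereas real vclocks are vectors. The function names are made up for illustration.

```python
from bisect import bisect_right

def files_to_delete_naive(file_starts, consumer_pos):
    """Rule described as incorrect: drop every file created before consumer_pos."""
    return [start for start in file_starts if start < consumer_pos]

def files_to_delete_safe(file_starts, consumer_pos):
    """Keep the file that contains consumer_pos; drop only the files before it.

    file_starts must be sorted in ascending order.
    """
    # Index of the first file opened strictly after the consumer position.
    idx = bisect_right(file_starts, consumer_pos)
    # The file at idx - 1 is the one the consumer may still be reading.
    return file_starts[:max(idx - 1, 0)]

# Hypothetical WAL directory: files opened at positions 0, 100 and 200.
wal_files = [0, 100, 200]

# Consumer sits exactly on a file boundary: both rules agree.
boundary_naive = files_to_delete_naive(wal_files, 200)
boundary_safe = files_to_delete_safe(wal_files, 200)

# Consumer sits in the middle of the file opened at 100: the naive rule
# deletes a file that is still in use, the safe rule keeps it.
middle_naive = files_to_delete_naive(wal_files, 150)
middle_safe = files_to_delete_safe(wal_files, 150)
```

In this model the two rules differ only when the consumer position falls strictly inside a file, which matches the observation that the old behaviour was harmless while consumers were advanced only to file boundaries.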