[PATCH 3/6] box: do not rotate WAL when replica subscribes

Vladimir Davydov vdavydov.dev at gmail.com
Tue Nov 27 12:57:00 MSK 2018


On Mon, Nov 26, 2018 at 08:50:37PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/11/26 10:27]:
> 
> OK to push.
> > Because this is pointless and confusing. This "feature" was silently
> > introduced by commit f2bccc18485d ("Use WAL vclock instead of TX vclock
> > in most places"). Let's revert this change. This will allow us to
> > clearly separate WAL checkpointing from WAL flushing, which will in turn
> > facilitate implementation of the checkpoint-on-WAL-threshold feature.
> > 
> > There are two problems here, however. First, not rotating the log breaks
> > expectations of replication/gc test: an xlog file doesn't get deleted in
> > time as a consequence. This happens because we don't delete xlogs
> > relayed to a replica after join stage is complete - we only do it during
> > subscribe stage - and if we don't rotate WAL on subscribe the garbage
> > collector won't be invoked. This is actually a bug - we should advance
> > the WAL consumer associated with a replica once join stage is complete.
> > This patch fixes it, but it unveils another problem - this time in the
> > WAL garbage collection procedure.
> > 
> > Turns out, when passed a vclock, the WAL garbage collection procedure
> > removes all WAL files that were created before the vclock. Apparently,
> > this isn't quite correct - if a consumer is in the middle of a WAL file,
> > we must not delete the WAL file, but we do. This works as long as
> > consumers never track vclocks inside WAL files - currently they are
> > advanced only when a WAL file is closed and naturally they are advanced
> > to the beginning of the next WAL file. However, if we want to advance
> > the consumer associated with a replica when join stage ends (this is
> > what the previous paragraph is about), it might occur that we will
> > advance it to the middle of a WAL file. If that happens, the WAL garbage
> > collector might remove a file which is actually in use by a replica.
> > Fix this as well.
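
To illustrate the garbage collection problem described above, here is a
minimal sketch, not the actual Tarantool code. It assumes a vclock can be
reduced to a single scalar position and that each WAL file is identified
by the position at which it starts, so a file covers everything up to the
start of the next one. The names gc_before_position and gc_keep_in_use
are hypothetical.

```python
from bisect import bisect_right


def gc_before_position(file_starts, consumer_pos):
    """Old behaviour (sketch): drop every file created before the position.

    This also drops the file the consumer is still reading when the
    position falls in the middle of that file.
    """
    return [start for start in sorted(file_starts) if start < consumer_pos]


def gc_keep_in_use(file_starts, consumer_pos):
    """Fixed behaviour (sketch): keep the file containing the position.

    The file in use is the last one whose start is <= consumer_pos;
    only files strictly older than it are returned for removal.
    """
    starts = sorted(file_starts)
    in_use = bisect_right(starts, consumer_pos) - 1
    return starts[:max(in_use, 0)]


# Three WAL files starting at positions 0, 100 and 200.
wal_files = [0, 100, 200]

# A consumer in the middle of the second file still needs it.
removed_old = gc_before_position(wal_files, 150)
removed_new = gc_keep_in_use(wal_files, 150)
```

With a consumer at position 150, the first function returns both the file
starting at 0 and the one starting at 100, while the second returns only
the file starting at 0. When the consumer sits exactly on a file boundary
(for example 200), both return the same result, which is why the problem
stays hidden as long as consumers are advanced only when a file is closed.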

Pushed to 2.1



More information about the Tarantool-patches mailing list