From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kostja.osipov@gmail.com>
Date: Mon, 26 Nov 2018 20:50:37 +0300
From: Konstantin Osipov <kostja.osipov@gmail.com>
Subject: Re: [PATCH 3/6] box: do not rotate WAL when replica subscribes
Message-ID: <20181126175037.GD7839@chai>
References: <cover.1543152574.git.vdavydov.dev@gmail.com>
 <51805fb0af1b8192c8226348bb46683b14ead224.1543152574.git.vdavydov.dev@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <51805fb0af1b8192c8226348bb46683b14ead224.1543152574.git.vdavydov.dev@gmail.com>
To: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: tarantool-patches@freelists.org
List-ID: <tarantool-patches.dev.tarantool.org>

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/11/26 10:27]:

OK to push.
> Because this is pointless and confusing. This "feature" was silently
> introduced by commit f2bccc18485d ("Use WAL vclock instead of TX vclock
> in most places"). Let's revert this change. This will allow us to
> clearly separate WAL checkpointing from WAL flushing, which will in turn
> facilitate implementation of the checkpoint-on-WAL-threshold feature.
> 
> There are two problems here, however. First, not rotating the log breaks
> expectations of the replication/gc test: as a consequence, an xlog file
> doesn't get deleted in time. This happens because we don't delete xlogs
> relayed to a replica after the join stage is complete - we only do it
> during the subscribe stage - and if we don't rotate the WAL on subscribe,
> the garbage collector won't be invoked. This is actually a bug - we
> should advance the WAL consumer associated with a replica once the join
> stage is complete.
> This patch fixes it, but it unveils another problem - this time in the
> WAL garbage collection procedure.
> 
> It turns out that, when passed a vclock, the WAL garbage collection
> procedure removes all WAL files that were created before the vclock.
> Apparently, this isn't quite correct - if a consumer is in the middle of
> a WAL file, we must not delete that file, but we do. This works as long
> as consumers never track vclocks inside WAL files - currently they are
> advanced only when a WAL file is closed, and naturally they are advanced
> to the beginning of the next WAL file. However, if we want to advance
> the consumer associated with a replica when the join stage ends (this is
> what the previous paragraph is about), we might advance it to the middle
> of a WAL file. If that happens, the WAL garbage collector might remove a
> file which is actually in use by a replica. Fix this as well.
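The difference between the old and the fixed garbage collection rule can be illustrated with a minimal sketch. It assumes each WAL file is identified by the signature (sum of vclock components) at which it starts, that the start signatures are sorted, and that a consumer's position is reduced to a single signature rather than a full vclock. The function names are hypothetical and do not correspond to Tarantool's actual wal/gc code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: starts[i] is the signature at which WAL file i
 * begins; starts[] is sorted in ascending order. Both functions return
 * how many of the oldest files may be deleted.
 */

/* Old rule: delete every file created before the consumer's signature. */
static size_t
wal_gc_count_naive(const int64_t *starts, size_t n, int64_t consumer)
{
	size_t count = 0;
	while (count < n && starts[count] < consumer)
		count++;
	return count;
}

/*
 * Fixed rule: the last file starting at or before the consumer's
 * signature is the one the consumer is reading, so it must be kept;
 * only the files preceding it may be deleted.
 */
static size_t
wal_gc_count_safe(const int64_t *starts, size_t n, int64_t consumer)
{
	size_t k = 0;
	while (k < n && starts[k] <= consumer)
		k++;
	/* starts[k - 1] is the file containing the consumer; keep it. */
	return k > 0 ? k - 1 : 0;
}
```

With files starting at signatures 0, 100 and 200 and a consumer at 150 (in the middle of the second file), the naive rule allows deleting two files, including the one the consumer is reading, while the safe rule allows deleting only one. When the consumer sits exactly at a file boundary (e.g. 200), both rules agree.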

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov