From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <vdavydov.dev@gmail.com>
Date: Tue, 4 Dec 2018 14:25:20 +0300
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [tarantool-patches] Re: [PATCH 9/9] wal: trigger checkpoint if
 there are too many WALs
Message-ID: <20181204112520.2di4acmhts24oj32@esperanza>
References: <cover.1543419109.git.vdavydov.dev@gmail.com>
 <f31fcd1048a64eecac0d979dbfeee248672d217d.1543419109.git.vdavydov.dev@gmail.com>
 <20181203203417.GI2890@chai>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20181203203417.GI2890@chai>
To: Konstantin Osipov <kostja@tarantool.org>
Cc: tarantool-patches@freelists.org
List-ID: <tarantool-patches.dev.tarantool.org>

On Mon, Dec 03, 2018 at 11:34:17PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/11/28 19:16]:
> 
> Please avoid using 0 for infinity: Tarantool doesn't use 0 to mean
> anything special.

As a matter of fact, we do - setting checkpoint_interval/count to 0
results in infinite checkpoint interval/count. I want to make
checkpoint_wal_threshold consistent with those configuration options.

Anyway, if 0 doesn't mean infinity, what value should one set
checkpoint_wal_threshold to in order to disable this feature? A very
large one? How large: 100 GB, 100 TB? That would look weird in box.cfg IMO.
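For illustration, here is a minimal C sketch of the convention I'm
arguing for, where a non-positive threshold simply means "no limit". The
helper name wal_threshold_exceeded is hypothetical and not part of the
patch. It only mirrors the doc comment in the patch ("If greater than 0,
this variable sets a limit ... Exceeding it will trigger auto
checkpointing"), so "exceeded" here means strictly greater.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): does the amount of WAL
 * written since the last checkpoint warrant a new checkpoint?
 * A threshold <= 0 disables the check, the same way 0 disables
 * checkpoint_interval/count.
 */
static bool
wal_threshold_exceeded(int64_t wal_size, int64_t threshold)
{
	assert(wal_size >= 0);
	if (threshold <= 0)
		return false; /* feature disabled: no limit */
	return wal_size > threshold;
}
```

With this convention, box.cfg{checkpoint_wal_threshold = 0} reads
naturally as "off", and no magic "very large" value is needed.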

> 
> > Closes #1082
> > 
> > @TarantoolBot document
> > Title: Document box.cfg.checkpoint_wal_threshold
> 
> Please document the default value of the new variable. 

OK.

> 
> Please add checks for the range of valid values of the new
> variable, as well as tests for these.

We don't check checkpoint_interval either: setting it to a value <= 0
means an infinite timeout. So I thought, why bother checking
checkpoint_wal_threshold?
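To show why no range check is needed, a rough C sketch of how a
checkpoint daemon might turn checkpoint_interval into a sleep timeout,
assuming the "<= 0 means infinite" rule stated above. The function
checkpoint_timeout and its since_last parameter are hypothetical; the
real daemon lives in Lua/C code not shown here. Every input value maps
to a well-defined timeout, so there is nothing invalid to reject.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical sketch: seconds to wait before the next scheduled
 * checkpoint, given the configured interval and the time elapsed
 * since the last checkpoint. A non-positive interval means
 * "never checkpoint on a timer", i.e. an infinite timeout.
 */
static double
checkpoint_timeout(double interval, double since_last)
{
	assert(since_last >= 0);
	if (interval <= 0)
		return INFINITY; /* periodic checkpointing disabled */
	double left = interval - since_last;
	return left > 0 ? left : 0; /* overdue: checkpoint right away */
}
```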

> 
> > +	int64_t checkpoint_wal_size;
> > +	/**
> > +	 * If greater than 0
> 
> Ugh.
> > + , this variable sets a limit on the
> > +	 * total size of WAL files written since the last checkpoint.
> > +	 * Exceeding it will trigger auto checkpointing in tx.
> > +	 */
> > +	int64_t checkpoint_threshold;
> 
> 
> > +	bool checkpoint_threshold_signalled;
> > +	bool checkpoint_threshold_exceeded;
> 
> If you had the checkpoint object with all the messages in
> the wal writer singleton, then the entire checkpoint state,
> including this variable, could be easily observed in a single
> place. Now that I see this flag I'm more inclined to insist
> on having a singleton wal_checkpoint object, inside struct
> wal_writer or standalone.

I'll remove checkpoint_threshold_exceeded and will use a separate
message for this kind of notifications instead of piggybacking on
a WAL request, as we agreed.

Regarding checkpoint_threshold_signalled, quite frankly, I don't think
that introducing a new checkpoint state struct and putting it in there
would make the code look any better. This flag isn't really tied to
checkpointing - it merely indicates whether we've already triggered a
checkpoint while checkpointing may or may not be in progress.
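A simplified C sketch of what the flag is for, under my own assumptions:
struct wal_state, wal_account_write and wal_checkpoint_done are
hypothetical names standing in for the WAL-thread logic, not the code in
the patch. The flag only suppresses duplicate notifications to tx once
the threshold has been exceeded; it says nothing about whether a
checkpoint is actually running, which is why it doesn't obviously belong
in a checkpoint state object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified WAL-thread state. */
struct wal_state {
	/* Bytes written to WAL files since the last checkpoint. */
	int64_t checkpoint_wal_size;
	/* Limit on checkpoint_wal_size; <= 0 disables the feature. */
	int64_t checkpoint_threshold;
	/* True once tx has been asked to checkpoint. */
	bool checkpoint_threshold_signalled;
};

/*
 * Account a WAL write. Returns true if tx should be notified now:
 * the threshold is exceeded and no notification has been sent yet.
 */
static bool
wal_account_write(struct wal_state *s, int64_t bytes)
{
	assert(bytes >= 0);
	s->checkpoint_wal_size += bytes;
	if (s->checkpoint_threshold <= 0 ||
	    s->checkpoint_wal_size <= s->checkpoint_threshold ||
	    s->checkpoint_threshold_signalled)
		return false;
	s->checkpoint_threshold_signalled = true;
	return true;
}

/*
 * Called when a checkpoint completes (assumption: it covers the first
 * checkpointed_size bytes). WAL written after the checkpoint started
 * still counts; the next threshold crossing may notify tx again.
 */
static void
wal_checkpoint_done(struct wal_state *s, int64_t checkpointed_size)
{
	assert(checkpointed_size <= s->checkpoint_wal_size);
	s->checkpoint_wal_size -= checkpointed_size;
	s->checkpoint_threshold_signalled = false;
}
```

In this sketch, the tx side would receive the notification as a
dedicated message (as agreed above) and simply start a checkpoint if
none is running.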

> 
> > +void
> > +wal_set_checkpoint_threshold(int64_t checkpoint_threshold)
> > +{
> > +	struct wal_writer *writer = &wal_writer_singleton;
> > +	if (writer->wal_mode == WAL_NONE)
> > +		return;
> > +	struct wal_set_checkpoint_threshold_msg msg;
> > +	msg.checkpoint_threshold = checkpoint_threshold;
> > +	bool cancellable = fiber_set_cancellable(false);
> > +	cbus_call(&wal_thread.wal_pipe, &wal_thread.tx_prio_pipe,
> > +		  &msg.base, wal_set_checkpoint_threshold_f, NULL,
> > +		  TIMEOUT_INFINITY);
> > +	fiber_set_cancellable(cancellable);
> > +}
> 
> Please add a comment explaining that WAL_NONE is also set when wal
> is not yet initialized.

OK.

> 
> I don't see where you calculate the value of the variable upon
> server start. Did I miss this hunk?

No, it is set by load_cfg.lua, just like box.cfg.checkpoint_interval.