[tarantool-patches] Re: [PATCH 7/9] box: rewrite checkpoint daemon in C

Vladimir Davydov vdavydov.dev at gmail.com
Wed Dec 5 19:21:19 MSK 2018


Discussed verbally. For the record, see below.

On Fri, Nov 30, 2018 at 12:41:41PM +0300, Vladimir Davydov wrote:
> On Fri, Nov 30, 2018 at 11:58:10AM +0300, Konstantin Osipov wrote:
> > * Vladimir Davydov <vdavydov.dev at gmail.com> [18/11/28 19:16]:
> > > Long time ago, when the checkpoint daemon was added to Tarantool, it was
> > > responsible not only for making periodic checkpoints, but also for
> > > maintaining the configured number of checkpoints and removing old snap
> > > and xlog times, so it was much easier to implement it in Lua than in C.
> > files 
> > > However, over time, all its responsibilities have been reimplemented in
> > > C and moved to the server code so that now it just calls box.snapshot()
> > > periodically. Let's rewrite this simple procedure in C as well - this
> > > will allow us to easily add more complex logic there, e.g. triggering
> > > checkpoint when WAL files exceed a configured threshold.
> > > ---
> > >  src/box/CMakeLists.txt               |   1 -
> > >  src/box/box.cc                       | 102 ++++++++++++++++++++++++
> > >  src/box/box.h                        |   1 +
> > >  src/box/lua/cfg.cc                   |  12 +++
> > >  src/box/lua/checkpoint_daemon.lua    | 136 --------------------------------
> > >  src/box/lua/init.c                   |   2 -
> > >  src/box/lua/load_cfg.lua             |   2 +-
> > >  test/xlog/checkpoint_daemon.result   | 145 -----------------------------------
> > 
> > Could we please move this fiber to gc.c at least? Let's not
> > pollute box?
> 
> If we moved it to gc.c, there would be a cross-dependency between gc.c
> and box.cc - the daemon needs box_checkpoint defined in box.cc while
> box.cc needs gc methods to register consumers. One way to solve this
> problem is move box_checkpoint to gc.c, however, it would look weird
> IMO as checkpointing isn't really a part of garbage collection. I guess
> we have to introduce a separate file for the checkpoint daemon and
> box_checkpoint after all.

Agreed to move checkpoint daemon to gc.c. In order not to introduce
cross-reference between box.cc and gc.c, gc_make_checkpoint() will be
introduced. The new method will do the work currently done by
box_checkpoint(), namely call engine and WAL checkpoint callbacks.

> 
> > 
> > > +		if (box_checkpoint_is_in_progress) {
> > > +			/*
> > > +			 * The next checkpoint will be scheduled
> > > +			 * by the concurrent box_checkpoint().
> > > +			 */
> > 
> > This is rather fragile. Can we make the interaction between
> > box-checkpoint and checkpoint daemon less obvious? Imagine in
> > future we can perform checkpoints according to a cron. 
> > 
> > I think it would be better if the logic of next checkpoint time
> > calculation is consolidated in the checkpoint daemon alone. If the
> > daemon sees that a checkpoint is in progress it can skip the
> > current checkpoint, but not delay the next checkpoint till
> > infinity.
> 
> But then box_checkpoint triggered by exceeding checkpoint_wal_threshold
> wouldn't reset the timer set by the checkpoint daemon and the time
> interval between checkpoints could be less than configured at times,
> which would probably raise questions. Would it be OK? Or we'd better
> notify the checkpoint daemon about checkpoints done out of schedule (by
> the user or on exceeding checkpoin_wal_threshold)?

We must notify the checkpoint daemon to reschedule the next checkpoint
whenever the user makes a checkpoint with box.snapshot(). That is, there
should be gc_schedule_checkpoint() helper that will schedule the next
checkpoint. This method will be used both by the checkpoint daemon and
by box.snapshot() in order to schedule the next checkpoint according to
box.cfg.checkpoint_interval.

> 
> > 
> > > +			next_checkpoint_time = now + TIMEOUT_INFINITY;
> > > +			continue;
> > > +		}
> > > +		box_checkpoint();
> > > +	}
> > > +	checkpoint_daemon = NULL;
> > > +	return 0;
> > > +}
> > > +
> > > --- a/test/xlog/checkpoint_daemon.test.lua
> > > +++ b/test/xlog/checkpoint_daemon.test.lua
> > 
> > Please keep the test. I don't understand what's wrong with it,
> > neither it is obvious from changeset comments.
> 
> The test accesses checkpoint_daemon fiber directly to check the time of
> the next scheduled checkpoint. To keep the test we would have to export
> the time of the next scheduled checkpoint to box.info. Do we want it?

Checkpoint scheduling logic should be separated to a separate
independent module. We should write a unit test for this module.



More information about the Tarantool-patches mailing list