[tarantool-patches] Re: [PATCH 7/9] box: rewrite checkpoint daemon in C

Vladimir Davydov vdavydov.dev at gmail.com
Fri Nov 30 12:41:41 MSK 2018


On Fri, Nov 30, 2018 at 11:58:10AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/11/28 19:16]:
> > Long time ago, when the checkpoint daemon was added to Tarantool, it was
> > responsible not only for making periodic checkpoints, but also for
> > maintaining the configured number of checkpoints and removing old snap
> > and xlog times, so it was much easier to implement it in Lua than in C.
> files 
> > However, over time, all its responsibilities have been reimplemented in
> > C and moved to the server code so that now it just calls box.snapshot()
> > periodically. Let's rewrite this simple procedure in C as well - this
> > will allow us to easily add more complex logic there, e.g. triggering
> > checkpoint when WAL files exceed a configured threshold.
> > ---
> >  src/box/CMakeLists.txt               |   1 -
> >  src/box/box.cc                       | 102 ++++++++++++++++++++++++
> >  src/box/box.h                        |   1 +
> >  src/box/lua/cfg.cc                   |  12 +++
> >  src/box/lua/checkpoint_daemon.lua    | 136 --------------------------------
> >  src/box/lua/init.c                   |   2 -
> >  src/box/lua/load_cfg.lua             |   2 +-
> >  test/xlog/checkpoint_daemon.result   | 145 -----------------------------------
> 
> Could we please move this fiber to gc.c at least? Let's not
> pollute box?

If we moved it to gc.c, there would be a cross-dependency between gc.c
and box.cc - the daemon needs box_checkpoint defined in box.cc while
box.cc needs gc methods to register consumers. One way to solve this
problem is move box_checkpoint to gc.c, however, it would look weird
IMO as checkpointing isn't really a part of garbage collection. I guess
we have to introduce a separate file for the checkpoint daemon and
box_checkpoint after all.

> 
> > +		if (box_checkpoint_is_in_progress) {
> > +			/*
> > +			 * The next checkpoint will be scheduled
> > +			 * by the concurrent box_checkpoint().
> > +			 */
> 
> This is rather fragile. Can we make the interaction between
> box-checkpoint and checkpoint daemon less obvious? Imagine in
> future we can perform checkpoints according to a cron. 
> 
> I think it would be better if the logic of next checkpoint time
> calculation is consolidated in the checkpoint daemon alone. If the
> daemon sees that a checkpoint is in progress it can skip the
> current checkpoint, but not delay the next checkpoint till
> infinity.

But then box_checkpoint triggered by exceeding checkpoint_wal_threshold
wouldn't reset the timer set by the checkpoint daemon and the time
interval between checkpoints could be less than configured at times,
which would probably raise questions. Would it be OK? Or we'd better
notify the checkpoint daemon about checkpoints done out of schedule (by
the user or on exceeding checkpoin_wal_threshold)?

> 
> > +			next_checkpoint_time = now + TIMEOUT_INFINITY;
> > +			continue;
> > +		}
> > +		box_checkpoint();
> > +	}
> > +	checkpoint_daemon = NULL;
> > +	return 0;
> > +}
> > +
> > --- a/test/xlog/checkpoint_daemon.test.lua
> > +++ b/test/xlog/checkpoint_daemon.test.lua
> 
> Please keep the test. I don't understand what's wrong with it,
> neither it is obvious from changeset comments.

The test accesses checkpoint_daemon fiber directly to check the time of
the next scheduled checkpoint. To keep the test we would have to export
the time of the next scheduled checkpoint to box.info. Do we want it?



More information about the Tarantool-patches mailing list