From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Wed, 5 Dec 2018 19:21:19 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] Re: [PATCH 7/9] box: rewrite checkpoint daemon in C Message-ID: <20181205162119.62o4nwoqdfr4afj6@esperanza> References: <3a95ae5e66ebfd72125eb53afb914efb5ab9cc0a.1543419109.git.vdavydov.dev@gmail.com> <20181130085810.GG5760@chai> <20181130094141.nukl64wxtxrduvdp@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181130094141.nukl64wxtxrduvdp@esperanza> To: Konstantin Osipov Cc: tarantool-patches@freelists.org List-ID: Discussed verbally. For the record, see below. On Fri, Nov 30, 2018 at 12:41:41PM +0300, Vladimir Davydov wrote: > On Fri, Nov 30, 2018 at 11:58:10AM +0300, Konstantin Osipov wrote: > > * Vladimir Davydov [18/11/28 19:16]: > > > Long time ago, when the checkpoint daemon was added to Tarantool, it was > > > responsible not only for making periodic checkpoints, but also for > > > maintaining the configured number of checkpoints and removing old snap > > > and xlog times, so it was much easier to implement it in Lua than in C. > > files > > > However, over time, all its responsibilities have been reimplemented in > > > C and moved to the server code so that now it just calls box.snapshot() > > > periodically. Let's rewrite this simple procedure in C as well - this > > > will allow us to easily add more complex logic there, e.g. triggering > > > checkpoint when WAL files exceed a configured threshold. > > > --- > > > src/box/CMakeLists.txt | 1 - > > > src/box/box.cc | 102 ++++++++++++++++++++++++ > > > src/box/box.h | 1 + > > > src/box/lua/cfg.cc | 12 +++ > > > src/box/lua/checkpoint_daemon.lua | 136 -------------------------------- > > > src/box/lua/init.c | 2 - > > > src/box/lua/load_cfg.lua | 2 +- > > > test/xlog/checkpoint_daemon.result | 145 ----------------------------------- > > > > Could we please move this fiber to gc.c at least? Let's not > > pollute box? > > If we moved it to gc.c, there would be a cross-dependency between gc.c > and box.cc - the daemon needs box_checkpoint defined in box.cc while > box.cc needs gc methods to register consumers. One way to solve this > problem is move box_checkpoint to gc.c, however, it would look weird > IMO as checkpointing isn't really a part of garbage collection. I guess > we have to introduce a separate file for the checkpoint daemon and > box_checkpoint after all. Agreed to move checkpoint daemon to gc.c. In order not to introduce cross-reference between box.cc and gc.c, gc_make_checkpoint() will be introduced. The new method will do the work currently done by box_checkpoint(), namely call engine and WAL checkpoint callbacks. > > > > > > + if (box_checkpoint_is_in_progress) { > > > + /* > > > + * The next checkpoint will be scheduled > > > + * by the concurrent box_checkpoint(). > > > + */ > > > > This is rather fragile. Can we make the interaction between > > box-checkpoint and checkpoint daemon less obvious? Imagine in > > future we can perform checkpoints according to a cron. > > > > I think it would be better if the logic of next checkpoint time > > calculation is consolidated in the checkpoint daemon alone. If the > > daemon sees that a checkpoint is in progress it can skip the > > current checkpoint, but not delay the next checkpoint till > > infinity. > > But then box_checkpoint triggered by exceeding checkpoint_wal_threshold > wouldn't reset the timer set by the checkpoint daemon and the time > interval between checkpoints could be less than configured at times, > which would probably raise questions. Would it be OK? Or we'd better > notify the checkpoint daemon about checkpoints done out of schedule (by > the user or on exceeding checkpoin_wal_threshold)? We must notify the checkpoint daemon to reschedule the next checkpoint whenever the user makes a checkpoint with box.snapshot(). That is, there should be gc_schedule_checkpoint() helper that will schedule the next checkpoint. This method will be used both by the checkpoint daemon and by box.snapshot() in order to schedule the next checkpoint according to box.cfg.checkpoint_interval. > > > > > > + next_checkpoint_time = now + TIMEOUT_INFINITY; > > > + continue; > > > + } > > > + box_checkpoint(); > > > + } > > > + checkpoint_daemon = NULL; > > > + return 0; > > > +} > > > + > > > --- a/test/xlog/checkpoint_daemon.test.lua > > > +++ b/test/xlog/checkpoint_daemon.test.lua > > > > Please keep the test. I don't understand what's wrong with it, > > neither it is obvious from changeset comments. > > The test accesses checkpoint_daemon fiber directly to check the time of > the next scheduled checkpoint. To keep the test we would have to export > the time of the next scheduled checkpoint to box.info. Do we want it? Checkpoint scheduling logic should be separated to a separate independent module. We should write a unit test for this module.