From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Fri, 30 Nov 2018 12:41:41 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] Re: [PATCH 7/9] box: rewrite checkpoint daemon in C Message-ID: <20181130094141.nukl64wxtxrduvdp@esperanza> References: <3a95ae5e66ebfd72125eb53afb914efb5ab9cc0a.1543419109.git.vdavydov.dev@gmail.com> <20181130085810.GG5760@chai> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181130085810.GG5760@chai> To: Konstantin Osipov Cc: tarantool-patches@freelists.org List-ID: On Fri, Nov 30, 2018 at 11:58:10AM +0300, Konstantin Osipov wrote: > * Vladimir Davydov [18/11/28 19:16]: > > Long time ago, when the checkpoint daemon was added to Tarantool, it was > > responsible not only for making periodic checkpoints, but also for > > maintaining the configured number of checkpoints and removing old snap > > and xlog times, so it was much easier to implement it in Lua than in C. > files > > However, over time, all its responsibilities have been reimplemented in > > C and moved to the server code so that now it just calls box.snapshot() > > periodically. Let's rewrite this simple procedure in C as well - this > > will allow us to easily add more complex logic there, e.g. triggering > > checkpoint when WAL files exceed a configured threshold. > > --- > > src/box/CMakeLists.txt | 1 - > > src/box/box.cc | 102 ++++++++++++++++++++++++ > > src/box/box.h | 1 + > > src/box/lua/cfg.cc | 12 +++ > > src/box/lua/checkpoint_daemon.lua | 136 -------------------------------- > > src/box/lua/init.c | 2 - > > src/box/lua/load_cfg.lua | 2 +- > > test/xlog/checkpoint_daemon.result | 145 ----------------------------------- > > Could we please move this fiber to gc.c at least? Let's not > pollute box? If we moved it to gc.c, there would be a cross-dependency between gc.c and box.cc - the daemon needs box_checkpoint defined in box.cc while box.cc needs gc methods to register consumers. One way to solve this problem is move box_checkpoint to gc.c, however, it would look weird IMO as checkpointing isn't really a part of garbage collection. I guess we have to introduce a separate file for the checkpoint daemon and box_checkpoint after all. > > > + if (box_checkpoint_is_in_progress) { > > + /* > > + * The next checkpoint will be scheduled > > + * by the concurrent box_checkpoint(). > > + */ > > This is rather fragile. Can we make the interaction between > box-checkpoint and checkpoint daemon less obvious? Imagine in > future we can perform checkpoints according to a cron. > > I think it would be better if the logic of next checkpoint time > calculation is consolidated in the checkpoint daemon alone. If the > daemon sees that a checkpoint is in progress it can skip the > current checkpoint, but not delay the next checkpoint till > infinity. But then box_checkpoint triggered by exceeding checkpoint_wal_threshold wouldn't reset the timer set by the checkpoint daemon and the time interval between checkpoints could be less than configured at times, which would probably raise questions. Would it be OK? Or we'd better notify the checkpoint daemon about checkpoints done out of schedule (by the user or on exceeding checkpoin_wal_threshold)? > > > + next_checkpoint_time = now + TIMEOUT_INFINITY; > > + continue; > > + } > > + box_checkpoint(); > > + } > > + checkpoint_daemon = NULL; > > + return 0; > > +} > > + > > --- a/test/xlog/checkpoint_daemon.test.lua > > +++ b/test/xlog/checkpoint_daemon.test.lua > > Please keep the test. I don't understand what's wrong with it, > neither it is obvious from changeset comments. The test accesses checkpoint_daemon fiber directly to check the time of the next scheduled checkpoint. To keep the test we would have to export the time of the next scheduled checkpoint to box.info. Do we want it?