From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 18 Apr 2019 20:15:51 +0300 From: Vladimir Davydov Subject: Re: [PATCH] memtx: cancel checkpoint thread at exit Message-ID: <20190418171551.5zfqurxnyrdkwxjj@esperanza> References: <8278bbd0a73bbd0e18a4e8633eb0ad71f2ffa3f7.1555601315.git.vdavydov.dev@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8278bbd0a73bbd0e18a4e8633eb0ad71f2ffa3f7.1555601315.git.vdavydov.dev@gmail.com> To: tarantool-patches@freelists.org List-ID: On Thu, Apr 18, 2019 at 06:46:54PM +0300, Vladimir Davydov wrote: > +static void > +checkpoint_cancel(struct checkpoint *ckpt) > +{ > + /* > + * Cancel the checkpoint thread if it's running and wait > + * for it to terminate so as to eliminate the possibility > + * of use-after-free. > + */ > + if (ckpt->cord.id != 0 && > + tt_pthread_cancel(ckpt->cord.id) != ESRCH) > + tt_pthread_join(ckpt->cord.id, NULL); > + checkpoint_delete(ckpt); > +} Oops, turned out this check isn't quite correct as we don't reset cord.id after joining the thread. As a result, we might call pthread_cancel on an already joined thread, which leads to a crash (caught by Travis CI): | #0 0x557f6332e9a5 in print_backtrace+1b | #1 0x557f6310614f in _ZL12sig_fatal_cbiP9siginfo_tPv+451 | #2 0x7f98504e10e0 in __restore_rt+0 | #3 0x7f98504de499 in pthread_cancel+9 | #4 0x557f6314aa48 in checkpoint_cancel+52 | #5 0x557f6314866c in memtx_engine_shutdown+55 | #6 0x557f63144fb1 in engine_shutdown+6f | #7 0x557f6326bb52 in box_free+98 | #8 0x557f631083cf in _Z14tarantool_freev+a9 | #9 0x557f63109ae2 in main+10cf | #10 0x7f984f24e2e1 in __libc_start_main+f1 | #11 0x557f63104f6a in _start+2a | #12 (nil) in +2a Actually, we don't need to check cord.id at all - we can use already existing waiting_for_snap_thread flag instead. Fixed this as follows (this is an incremental diff; for the full diff see below): diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 2b9bef24..4d99910c 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -595,11 +595,6 @@ checkpoint_new(const char *snap_dirname, uint64_t snap_io_rate_limit) xdir_create(&ckpt->dir, snap_dirname, SNAP, &INSTANCE_UUID, &opts); vclock_create(&ckpt->vclock); ckpt->touch = false; - /* - * Reset the checkpoint thread id so that checkpoint_cancel() - * doesn't attempt to cancel the thread if it isn't running. - */ - ckpt->cord.id = 0; return ckpt; } @@ -623,9 +618,10 @@ checkpoint_cancel(struct checkpoint *ckpt) * for it to terminate so as to eliminate the possibility * of use-after-free. */ - if (ckpt->cord.id != 0 && - tt_pthread_cancel(ckpt->cord.id) != ESRCH) + if (ckpt->waiting_for_snap_thread) { + tt_pthread_cancel(ckpt->cord.id); tt_pthread_join(ckpt->cord.id, NULL); + } checkpoint_delete(ckpt); } >From 66ff448a89864c3de7d34563a515295d62f7f944 Mon Sep 17 00:00:00 2001 From: Vladimir Davydov Date: Thu, 18 Apr 2019 18:03:26 +0300 Subject: [PATCH] memtx: cancel checkpoint thread at exit If a tarantool instance exits while checkpointing is in progress, the memtx checkpoint thread, which writes the snap file, can access already freed data resulting in a crash. Let's fix this the same way we did for relay and vinyl threads - simply cancel the thread forcefully and wait for it to terminate. Closes #4170 diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 48f700a0..4d99910c 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -52,6 +52,9 @@ /* sync snapshot every 16MB */ #define SNAP_SYNC_INTERVAL (1 << 24) +static void +checkpoint_cancel(struct checkpoint *ckpt); + /* * Memtx yield-in-transaction trigger: roll back the effects * of the transaction and mark the transaction as aborted. @@ -179,6 +182,8 @@ static void memtx_engine_shutdown(struct engine *engine) { struct memtx_engine *memtx = (struct memtx_engine *)engine; + if (memtx->checkpoint != NULL) + checkpoint_cancel(memtx->checkpoint); mempool_destroy(&memtx->iterator_pool); if (mempool_is_initialized(&memtx->rtree_iterator_pool)) mempool_destroy(&memtx->rtree_iterator_pool); @@ -605,6 +610,20 @@ checkpoint_delete(struct checkpoint *ckpt) free(ckpt); } +static void +checkpoint_cancel(struct checkpoint *ckpt) +{ + /* + * Cancel the checkpoint thread if it's running and wait + * for it to terminate so as to eliminate the possibility + * of use-after-free. + */ + if (ckpt->waiting_for_snap_thread) { + tt_pthread_cancel(ckpt->cord.id); + tt_pthread_join(ckpt->cord.id, NULL); + } + checkpoint_delete(ckpt); +} static int checkpoint_add_space(struct space *sp, void *data)