From: Vladislav Shpilevoy
To: Ilya Kosarev, tarantool-patches@dev.tarantool.org
Date: Wed, 23 Oct 2019 22:55:05 +0200
Subject: Re: [Tarantool-patches] [PATCH] replication: cancel replica joining thread at exit
Message-ID: <6942253b-137d-1ea3-092c-da3d979465cd@tarantool.org>
In-Reply-To: <20191023125903.30261-1-i.kosarev@tarantool.org>
References: <649183f1-0852-734d-fe75-5d6ed1e956eb@tarantool.org> <20191023125903.30261-1-i.kosarev@tarantool.org>

LGTM.

On 23/10/2019 14:59, Ilya Kosarev wrote:
> If a tarantool instance exits while joining replica is in progress,
> the replica joining thread can access already freed data resulting
> in a crash. Let's fix this the same way we did for checkpoint thread
> - simply cancel the thread forcefully and wait for it to terminate.
>
> Closes #4528
> ---
> https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4528-fix-shutdown-on-replica-join
> https://github.com/tarantool/tarantool/issues/4528
>
>  src/box/memtx_engine.c | 25 ++++++++++++++++++++++++-
>  src/box/memtx_engine.h |  5 +++++
>  2 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index ecce3b1b6..23ccc4703 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -55,6 +55,9 @@
>  static void
>  checkpoint_cancel(struct checkpoint *ckpt);
>
> +static void
> +replica_join_cancel(struct cord *replica_join_cord);
> +
>  struct PACKED memtx_tuple {
>  	/*
>  	 * sic: the header of the tuple is used
> @@ -129,6 +132,8 @@ memtx_engine_shutdown(struct engine *engine)
>  	struct memtx_engine *memtx = (struct memtx_engine *)engine;
>  	if (memtx->checkpoint != NULL)
>  		checkpoint_cancel(memtx->checkpoint);
> +	if (memtx->replica_join_cord != NULL)
> +		replica_join_cancel(memtx->replica_join_cord);
>  	mempool_destroy(&memtx->iterator_pool);
>  	if (mempool_is_initialized(&memtx->rtree_iterator_pool))
>  		mempool_destroy(&memtx->rtree_iterator_pool);
> @@ -527,6 +532,18 @@ checkpoint_cancel(struct checkpoint *ckpt)
>  	checkpoint_delete(ckpt);
>  }
>
> +static void
> +replica_join_cancel(struct cord *replica_join_cord)
> +{
> +	/*
> +	 * Cancel the thread being used to join replica if it's
> +	 * running and wait for it to terminate so as to
> +	 * eliminate the possibility of use-after-free.
> +	 */
> +	tt_pthread_cancel(replica_join_cord->id);
> +	tt_pthread_join(replica_join_cord->id, NULL);
> +}
> +
>  static int
>  checkpoint_add_space(struct space *sp, void *data)
>  {
> @@ -848,7 +865,11 @@ memtx_engine_join(struct engine *engine, void *arg, struct xstream *stream)
>  	struct cord cord;
>  	if (cord_costart(&cord, "initial_join", memtx_join_f, ctx) != 0)
>  		return -1;
> -	return cord_cojoin(&cord);
> +	struct memtx_engine *memtx = (struct memtx_engine *)engine;
> +	memtx->replica_join_cord = &cord;
> +	int res = cord_cojoin(&cord);
> +	memtx->replica_join_cord = NULL;
> +	return res;
>  }
>
>  static void
> @@ -1030,6 +1051,8 @@ memtx_engine_new(const char *snap_dirname, bool force_recovery,
>  	memtx->max_tuple_size = MAX_TUPLE_SIZE;
>  	memtx->force_recovery = force_recovery;
>
> +	memtx->replica_join_cord = NULL;
> +
>  	memtx->base.vtab = &memtx_engine_vtab;
>  	memtx->base.name = "memtx";
>
> diff --git a/src/box/memtx_engine.h b/src/box/memtx_engine.h
> index c092f5d8e..f562c66df 100644
> --- a/src/box/memtx_engine.h
> +++ b/src/box/memtx_engine.h
> @@ -107,6 +107,11 @@ struct memtx_engine {
>  	uint64_t snap_io_rate_limit;
>  	/** Skip invalid snapshot records if this flag is set. */
>  	bool force_recovery;
> +	/**
> +	 * Cord being currently used to join replica. It is only
> +	 * needed to be able to cancel it on shutdown.
> +	 */
> +	struct cord *replica_join_cord;
>  	/** Common quota for tuples and indexes. */
>  	struct quota quota;
>  	/**
>
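
For readers outside the Tarantool tree, the cancel-then-join pattern used by the patch can be sketched with plain POSIX threads. This is only an illustration, not the patched code: it assumes the tt_pthread_* calls behave like their pthread_* counterparts, and struct engine_sketch, join_worker(), engine_shutdown() and run_sketch() are hypothetical names invented here. The engine keeps a pointer to the worker's thread handle while the worker runs, and shutdown cancels and joins the worker before freeing anything it reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct memtx_engine. */
struct engine_sketch {
	/* Non-NULL only while a join worker is running. */
	pthread_t *join_thread;
	/* Heap data the worker reads; freed at shutdown. */
	int *shared;
};

/* Hypothetical worker: keeps reading engine data until cancelled. */
static void *
join_worker(void *arg)
{
	struct engine_sketch *e = arg;
	struct timespec pause = { 0, 1000000 }; /* 1 ms */
	volatile int sink = 0;
	for (;;) {
		sink += *e->shared;
		/* nanosleep() is a cancellation point. */
		nanosleep(&pause, NULL);
	}
}

/*
 * Cancel the worker, wait for it to terminate, and only then free
 * the data it was reading. Returns 1 if the worker ended because
 * of the cancellation request, 0 otherwise.
 */
static int
engine_shutdown(struct engine_sketch *e)
{
	void *ret = NULL;
	if (e->join_thread != NULL) {
		pthread_cancel(*e->join_thread);
		pthread_join(*e->join_thread, &ret);
		e->join_thread = NULL;
	}
	free(e->shared);
	e->shared = NULL;
	return ret == PTHREAD_CANCELED;
}

/* Start a worker, publish its handle, then shut down under it. */
int
run_sketch(void)
{
	struct engine_sketch e = { NULL, malloc(sizeof(int)) };
	assert(e.shared != NULL);
	*e.shared = 42;
	pthread_t tid;
	int rc = pthread_create(&tid, NULL, join_worker, &e);
	assert(rc == 0);
	e.join_thread = &tid;
	return engine_shutdown(&e);
}
```

With the default deferred cancellation type, pthread_cancel() takes effect only when the target thread reaches a cancellation point, so the join is what guarantees the worker has stopped touching shared data before it is freed. Calling run_sketch() from main() is enough to exercise the sketch.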