From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp15.mail.ru (smtp15.mail.ru [94.100.176.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 45AA74696C3 for ; Tue, 7 Apr 2020 13:36:46 +0300 (MSK) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) From: Serge Petrenko In-Reply-To: <20200404161524.7466-8-gorcunov@gmail.com> Date: Tue, 7 Apr 2020 13:36:45 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <68D09775-9F5A-4E4D-ABC0-4A811C4AF43F@tarantool.org> References: <20200404161524.7466-1-gorcunov@gmail.com> <20200404161524.7466-8-gorcunov@gmail.com> Subject: Re: [Tarantool-patches] [PATCH v11 7/8] box/applier: prevent nil dereference on applier rollback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Cyrill Gorcunov Cc: tml Hi! Thanks for the patch. > 4 =D0=B0=D0=BF=D1=80. 2020 =D0=B3., =D0=B2 19:15, Cyrill Gorcunov = =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0=B0): >=20 > Currently when transaction rollback happens we just drop an existing > error setting ClientError to the replicaset.applier.diag. This action > leaves current fiber with diag=3Dnil, which in turn leads to sigsegv = once > diag_raise() called right after applier_apply_tx(): >=20 > | applier_f > | try { > | applier_subscribe > | applier_apply_tx > | // error happens > | txn_rollback > | diag_set(ClientError, ER_WAL_IO) > | diag_move(&fiber()->diag, &replicaset.applier.diag) > | // fiber->diag =3D nil > | applier_on_rollback > | diag_add_error(&applier->diag, = diag_last_error(&replicaset.applier.diag) > | fiber_cancel(applier->reader); > | diag_raise() -> NULL dereference > | } catch { ... } >=20 > Thus: > - use diag_set_error() instead of diag_move() to not drop error > from a current fiber() preventing a nil dereference; > - put fixme mark into the code: we need to rework it in a > more sense way. >=20 > Fixes #4730 >=20 > Signed-off-by: Cyrill Gorcunov > --- > src/box/applier.cc | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) >=20 > diff --git a/src/box/applier.cc b/src/box/applier.cc > index 2f9c9c797..68de3c08c 100644 > --- a/src/box/applier.cc > +++ b/src/box/applier.cc > @@ -692,9 +692,22 @@ static int > applier_txn_rollback_cb(struct trigger *trigger, void *event) > { > (void) trigger; > - /* Setup shared applier diagnostic area. */ > + > + /* > + * Setup shared applier diagnostic area. > + * > + * FIXME: We should consider redesign this > + * moment and instead of carrying one shared > + * diag use per-applier diag instead all the time > + * (which actually already present in the structure). > + * > + * But remember that transactions are asynchronous > + * and rollback may happen a way latter after it > + * passed to the journal engine. > + */ > diag_set(ClientError, ER_WAL_IO); > - diag_move(&fiber()->diag, &replicaset.applier.diag); > + diag_set_error(&replicaset.applier.diag, > + diag_last_error(diag_get())); >=20 > /* Broadcast the rollback event across all appliers. */ > trigger_run(&replicaset.applier.on_rollback, event); > =E2=80=94=20 > 2.20.1 >=20 LGTM. -- Serge Petrenko sergepetrenko@tarantool.org