From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com [209.85.208.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 11BD54696C4 for ; Sat, 4 Apr 2020 19:16:51 +0300 (MSK) Received: by mail-lj1-f175.google.com with SMTP id k21so10179420ljh.2 for ; Sat, 04 Apr 2020 09:16:51 -0700 (PDT) From: Cyrill Gorcunov Date: Sat, 4 Apr 2020 19:15:23 +0300 Message-Id: <20200404161524.7466-8-gorcunov@gmail.com> In-Reply-To: <20200404161524.7466-1-gorcunov@gmail.com> References: <20200404161524.7466-1-gorcunov@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH v11 7/8] box/applier: prevent nil dereference on applier rollback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: tml Currently when transaction rollback happens we just drop an existing error setting ClientError to the replicaset.applier.diag. This action leaves current fiber with diag=nil, which in turn leads to sigsegv once diag_raise() called right after applier_apply_tx(): | applier_f | try { | applier_subscribe | applier_apply_tx | // error happens | txn_rollback | diag_set(ClientError, ER_WAL_IO) | diag_move(&fiber()->diag, &replicaset.applier.diag) | // fiber->diag = nil | applier_on_rollback | diag_add_error(&applier->diag, diag_last_error(&replicaset.applier.diag) | fiber_cancel(applier->reader); | diag_raise() -> NULL dereference | } catch { ... } Thus: - use diag_set_error() instead of diag_move() to not drop error from a current fiber() preventing a nil dereference; - put fixme mark into the code: we need to rework it in a more sense way. Fixes #4730 Signed-off-by: Cyrill Gorcunov --- src/box/applier.cc | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/src/box/applier.cc b/src/box/applier.cc index 2f9c9c797..68de3c08c 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -692,9 +692,22 @@ static int applier_txn_rollback_cb(struct trigger *trigger, void *event) { (void) trigger; - /* Setup shared applier diagnostic area. */ + + /* + * Setup shared applier diagnostic area. + * + * FIXME: We should consider redesign this + * moment and instead of carrying one shared + * diag use per-applier diag instead all the time + * (which actually already present in the structure). + * + * But remember that transactions are asynchronous + * and rollback may happen a way latter after it + * passed to the journal engine. + */ diag_set(ClientError, ER_WAL_IO); - diag_move(&fiber()->diag, &replicaset.applier.diag); + diag_set_error(&replicaset.applier.diag, + diag_last_error(diag_get())); /* Broadcast the rollback event across all appliers. */ trigger_run(&replicaset.applier.on_rollback, event); -- 2.20.1