From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from mail-lj1-f196.google.com (mail-lj1-f196.google.com [209.85.208.196])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id F1213469719
	for ; Mon, 17 Feb 2020 22:05:11 +0300 (MSK)
Received: by mail-lj1-f196.google.com with SMTP id e18so19999434ljn.12
	for ; Mon, 17 Feb 2020 11:05:11 -0800 (PST)
Date: Mon, 17 Feb 2020 22:05:08 +0300
From: Cyrill Gorcunov
Message-ID: <20200217190508.GA2482@uranus>
References: <20200217155953.25803-1-gorcunov@gmail.com>
	<20200217155953.25803-5-gorcunov@gmail.com>
	<20200217171940.GD11553@atlas>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200217171940.GD11553@atlas>
Subject: Re: [Tarantool-patches] [PATCH 4/4] box/txn: fix nil dereference in txn_rollback
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Konstantin Osipov
Cc: tml

On Mon, Feb 17, 2020 at 08:19:40PM +0300, Konstantin Osipov wrote:
> > (remove journal_entry_complete() from here).
>
> I think once you remove journal_entry_complete (which basically
> calls rollback), you will be able to remove this ugly
> "clear-then-restore" of txn as well:

IOW, like below? Grigory pointed this out about my previous attempt.
Here is my call-graph view (I may well be wrong):

1) Good path without error (with the patch below applied)

txn_write
  txn_write_to_wal
    journal_write
    fiber_set_txn(fiber(), NULL);

now it is NULL and txn_entry_complete_cb proceeds just fine

txn_entry_complete_cb
  assert(in_txn() == NULL);
  fiber_set_txn(fiber(), txn);
  txn_complete(txn);
  fiber_set_txn(fiber(), NULL);

2) Error paths

a) error during journal_entry_new allocation

txn_write
  txn_write_to_wal
    req = journal_entry_new = NULL;
    return -1;
  txn_rollback (with fiber.txn != NULL)
  return -1

fiber.txn is not NULL, so txn_rollback reverts the transaction and
sets it to NULL; looks fine

b) error during journal_write

txn_write
  txn_write_to_wal
    journal_write
      wal_write
        ...
        entry->res = -1;
        return -1;
  txn_rollback (with fiber.txn != NULL), which is fine.

Guys, am I missing something obvious? The key point is dropping the
journal_entry_complete call from the error path in wal.c and deferring
fiber.txn = NULL until journal_write has passed without errors.
---
Subject: [PATCH] prelim

Signed-off-by: Cyrill Gorcunov
---
 src/box/txn.c | 15 ++++++++++-----
 src/box/wal.c |  1 -
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index a4ca48224..a75f8dc0b 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -495,10 +495,8 @@ txn_write_to_wal(struct txn *txn)
 				      &txn->region,
 				      txn_entry_complete_cb,
 				      txn);
-	if (req == NULL) {
-		txn_rollback(txn);
+	if (req == NULL)
 		return -1;
-	}
 
 	struct txn_stmt *stmt;
 	struct xrow_header **remote_row = req->rows;
@@ -525,6 +523,8 @@ txn_write_to_wal(struct txn *txn)
 		diag_log();
 		return -1;
 	}
+
+	fiber_set_txn(fiber(), NULL);
 	return 0;
 }
 
@@ -583,8 +583,13 @@ txn_write(struct txn *txn)
 		fiber_set_txn(fiber(), NULL);
 		return 0;
 	}
-	fiber_set_txn(fiber(), NULL);
-	return txn_write_to_wal(txn);
+
+	if (txn_write_to_wal(txn) < 0) {
+		txn_rollback(txn);
+		return -1;
+	}
+
+	return 0;
 }
 
 int
diff --git a/src/box/wal.c b/src/box/wal.c
index 0ae66ff32..a8bab4f34 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -1209,7 +1209,6 @@ wal_write(struct journal *journal, struct journal_entry *entry)
 
 fail:
 	entry->res = -1;
-	journal_entry_complete(entry);
 	return -1;
 }
 
-- 
2.20.1
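
For illustration only, the ordering argued for in the call graph can be
modelled with a toy sketch in C. Everything here is a stand-in and an
assumption, not Tarantool code: struct txn is reduced to a single flag,
fiber_set_txn() takes only the txn pointer, the global fiber_txn models
fiber.txn, and journal_should_fail is a hypothetical switch used to force
the error path. The point it tries to show is that fiber.txn stays set
until the journal write has succeeded, so txn_rollback always sees the
transaction it is asked to roll back.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the real transaction object (assumption). */
struct txn {
	int rolled_back;
};

/* Models fiber.txn; the real code keeps this per fiber. */
static struct txn *fiber_txn;
/* Hypothetical switch to force the journal error path. */
static int journal_should_fail;

static struct txn *
in_txn(void)
{
	return fiber_txn;
}

static void
fiber_set_txn(struct txn *txn)
{
	fiber_txn = txn;
}

/* On error only report failure; no completion callback is invoked. */
static int
journal_write(struct txn *txn)
{
	(void)txn;
	return journal_should_fail ? -1 : 0;
}

/* Rollback expects the transaction to still be attached. */
static void
txn_rollback(struct txn *txn)
{
	assert(in_txn() == txn);
	txn->rolled_back = 1;
	fiber_set_txn(NULL);
}

/* Detach the transaction only after the write went through. */
static int
txn_write_to_wal(struct txn *txn)
{
	if (journal_write(txn) < 0)
		return -1;
	fiber_set_txn(NULL);
	return 0;
}

/* The single place where a failed write turns into a rollback. */
static int
txn_write(struct txn *txn)
{
	if (txn_write_to_wal(txn) < 0) {
		txn_rollback(txn);
		return -1;
	}
	return 0;
}
```

In this model both error paths from the call graph collapse into the one
rollback call in txn_write, and the good path leaves in_txn() == NULL
without ever touching txn_rollback.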