From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp56.i.mail.ru (smtp56.i.mail.ru [217.69.128.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 0B479469719 for ; Mon, 17 Feb 2020 20:25:04 +0300 (MSK) From: Georgy Kirichenko Date: Mon, 17 Feb 2020 20:25:03 +0300 Message-ID: <2154704.ElGaqSPkdT@localhost> In-Reply-To: <20200217155953.25803-5-gorcunov@gmail.com> References: <20200217155953.25803-1-gorcunov@gmail.com> <20200217155953.25803-5-gorcunov@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart4521036.GXAFRqVoOG"; micalg="pgp-sha256"; protocol="application/pgp-signature" Subject: Re: [Tarantool-patches] [PATCH 4/4] box/txn: fix nil dereference in txn_rollback List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: tml --nextPart4521036.GXAFRqVoOG Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Hi! I afraid the approach is not working. Please take a look into wal_write function inside `wal.c' file. There is a journal_entry_complete invocation which will call a txn_rollback function as an effect. So before your patch there was an invariant: if txn engine failed to send a transaction to a journal using txn_write_to_wal then this transaction should be rolled backed immediately (despite the fact was there an allocation error, cascading rollback or something else). Basing on the foregoing I would suggest you to switch to another invariant: let transaction to be existent if there was a writing error and to be rolled it back explicitly, but you should do corresponding journal changes then. WBR On Monday, February 17, 2020 6:59:53 PM MSK Cyrill Gorcunov wrote: > In case if we get failed in allocation of new > journal entry the txn_rollback will try to > derefernce nil pointer > > | txn_write > | > | fiber_set_txn(fiber(), NULL); // zap fiber's storage.txn > | txn_write_to_wal(txn); > | > | journal_entry_new(..., txn_entry_done_cb, ...) > | if (req == NULL) > | > | txn_rollback(txn); > | > | assert(txn == in_txn()); // in_txn()=nil, triggers > > This is because there are two call site: > > - when transaction is complete the wal engine will > call txn_entry_complete_cb completion handler and > since it is async in terms of threads (wal is a > separate thread) it setup the txn it processes > into a fiber's storage thus it expects the current > storage is nil > > - in turn error may happen and we need to run a rollback > procedure, there fiber's storage keeps current txn; > > Taking this into account we clean txn inside txn_write_to_wal > right before journal_write is called and restore it back > on error. This allows the caller code to run txn_rollback > in a more consistent way. > > https://github.com/tarantool/tarantool/issues/4776 > > Signed-off-by: Cyrill Gorcunov > --- > src/box/txn.c | 27 +++++++++++++++++++++------ > 1 file changed, 21 insertions(+), 6 deletions(-) > > diff --git a/src/box/txn.c b/src/box/txn.c > index a4ca48224..68365ea0b 100644 > --- a/src/box/txn.c > +++ b/src/box/txn.c > @@ -495,10 +495,8 @@ txn_write_to_wal(struct txn *txn) > &txn->region, > txn_entry_complete_cb, > txn); > - if (req == NULL) { > - txn_rollback(txn); > + if (req == NULL) > return -1; > - } > > struct txn_stmt *stmt; > struct xrow_header **remote_row = req->rows; > @@ -519,8 +517,20 @@ txn_write_to_wal(struct txn *txn) > assert(remote_row == req->rows + txn->n_applier_rows); > assert(local_row == remote_row + txn->n_new_rows); > > - /* Send the entry to the journal. */ > + /* > + * Queue the entry for processing in journal > + * engine. The semantics of complete_cb implies > + * that fiber's txn (kept in storage) is nil > + * becase WAL is a separet thread, for this > + * sake we zap it here. > + * > + * Still this is messy since the caller runs > + * txn_rollback if something bad happened. Thus > + * restore the former txn on error path. > + */ > + fiber_set_txn(fiber(), NULL); > if (journal_write(req) < 0) { > + fiber_set_txn(fiber(), txn); > diag_set(ClientError, ER_WAL_IO); > diag_log(); > return -1; > @@ -583,8 +593,13 @@ txn_write(struct txn *txn) > fiber_set_txn(fiber(), NULL); > return 0; > } > - fiber_set_txn(fiber(), NULL); > - return txn_write_to_wal(txn); > + > + if (txn_write_to_wal(txn) != 0) { > + txn_rollback(txn); > + return -1; > + } > + > + return 0; > } > > int --nextPart4521036.GXAFRqVoOG Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEni2K8y45nFEfidm2g0fcwO+Ma8AFAl5KzG8ACgkQg0fcwO+M a8AyQwf+LhaEbtfltWUEb5t1URyri6VoYY47CIWs5Gt2IKISmzz0FmoM2qj5SNnL DOiBpFUCRsuEDZtHtpkMyJniHN2QGm2FaGL+pXzojCM/lZcHMAJqvkgX0sI+lLBW FHpLUjCzBZmFbzYCqwe7Pi4tQerNABTqjQkSWztNQ7lSf0lapFBLHISiH1oEh67I Q+enZBGjGbyr17RG/7e6AH+Lnkq4yh/NP851DRIIRK+6XEqaRdDqbVlA/5QvacML r/SYaTVYV1RNLPz+FATQuVWxaJmU4eqfc1dAwK4Leb6tOMeRATk/VPktq4pOAGCm xzC00Ypxn0jqyzlX9m5OJj/SDTnrMw== =L71b -----END PGP SIGNATURE----- --nextPart4521036.GXAFRqVoOG--