Hi! I afraid the approach is not working. Please take a look into wal_write function inside `wal.c' file. There is a journal_entry_complete invocation which will call a txn_rollback function as an effect. So before your patch there was an invariant: if txn engine failed to send a transaction to a journal using txn_write_to_wal then this transaction should be rolled backed immediately (despite the fact was there an allocation error, cascading rollback or something else). Basing on the foregoing I would suggest you to switch to another invariant: let transaction to be existent if there was a writing error and to be rolled it back explicitly, but you should do corresponding journal changes then. WBR On Monday, February 17, 2020 6:59:53 PM MSK Cyrill Gorcunov wrote: > In case if we get failed in allocation of new > journal entry the txn_rollback will try to > derefernce nil pointer > > | txn_write > | > | fiber_set_txn(fiber(), NULL); // zap fiber's storage.txn > | txn_write_to_wal(txn); > | > | journal_entry_new(..., txn_entry_done_cb, ...) > | if (req == NULL) > | > | txn_rollback(txn); > | > | assert(txn == in_txn()); // in_txn()=nil, triggers > > This is because there are two call site: > > - when transaction is complete the wal engine will > call txn_entry_complete_cb completion handler and > since it is async in terms of threads (wal is a > separate thread) it setup the txn it processes > into a fiber's storage thus it expects the current > storage is nil > > - in turn error may happen and we need to run a rollback > procedure, there fiber's storage keeps current txn; > > Taking this into account we clean txn inside txn_write_to_wal > right before journal_write is called and restore it back > on error. This allows the caller code to run txn_rollback > in a more consistent way. > > https://github.com/tarantool/tarantool/issues/4776 > > Signed-off-by: Cyrill Gorcunov > --- > src/box/txn.c | 27 +++++++++++++++++++++------ > 1 file changed, 21 insertions(+), 6 deletions(-) > > diff --git a/src/box/txn.c b/src/box/txn.c > index a4ca48224..68365ea0b 100644 > --- a/src/box/txn.c > +++ b/src/box/txn.c > @@ -495,10 +495,8 @@ txn_write_to_wal(struct txn *txn) > &txn->region, > txn_entry_complete_cb, > txn); > - if (req == NULL) { > - txn_rollback(txn); > + if (req == NULL) > return -1; > - } > > struct txn_stmt *stmt; > struct xrow_header **remote_row = req->rows; > @@ -519,8 +517,20 @@ txn_write_to_wal(struct txn *txn) > assert(remote_row == req->rows + txn->n_applier_rows); > assert(local_row == remote_row + txn->n_new_rows); > > - /* Send the entry to the journal. */ > + /* > + * Queue the entry for processing in journal > + * engine. The semantics of complete_cb implies > + * that fiber's txn (kept in storage) is nil > + * becase WAL is a separet thread, for this > + * sake we zap it here. > + * > + * Still this is messy since the caller runs > + * txn_rollback if something bad happened. Thus > + * restore the former txn on error path. > + */ > + fiber_set_txn(fiber(), NULL); > if (journal_write(req) < 0) { > + fiber_set_txn(fiber(), txn); > diag_set(ClientError, ER_WAL_IO); > diag_log(); > return -1; > @@ -583,8 +593,13 @@ txn_write(struct txn *txn) > fiber_set_txn(fiber(), NULL); > return 0; > } > - fiber_set_txn(fiber(), NULL); > - return txn_write_to_wal(txn); > + > + if (txn_write_to_wal(txn) != 0) { > + txn_rollback(txn); > + return -1; > + } > + > + return 0; > } > > int