[Tarantool-patches] [PATCH 3/3] box/applier: fix nil dereference in applier rollback

Cyrill Gorcunov gorcunov at gmail.com
Wed Feb 5 11:18:14 MSK 2020


On Wed, Feb 05, 2020 at 01:04:30AM +0300, Konstantin Osipov wrote:
> * Cyrill Gorcunov <gorcunov at gmail.com> [20/01/27 10:14]:
> > Currently when transaction rollback happens we just drop an existing
> > error setting ClientError to the replicaset.applier.diag. This action
> > leaves current fiber with diag=nil, which in turn leads to sigsegv once
> > diag_raise() called right after applier_apply_tx():
> > 
> >  | applier_f
> >  |   try {
> >  |   applier_subscribe
> >  |     applier_apply_tx
> >  |       // error happens
> >  |       txn_rollback
> >  |         diag_set(ClientError, ER_WAL_IO)
> >  |         diag_move(&fiber()->diag, &replicaset.applier.diag)
> >  |         // fiber->diag = nil
> 
> >  |       applier_on_rollback
> >  |         diag_add_error(&applier->diag, diag_last_error(&replicaset.applier.diag)
> >  |         fiber_cancel(applier->reader);
> >  |     diag_raise() -> NULL dereference
> >  |   } catch { ... }
> 
> Where exactly does the error happen in applier_apply_tx?

The reporter pointed somwhere into a deep dive into vynil, the
problem is that its been runnin release build first time it
triggered. Actually it doesn't matter where exactly it failed,
the only important thing is that it failed the way we need
to run a rollback procedure.

> 
> Looks like this:
> 
> >  |         diag_set(ClientError, ER_WAL_IO)
> >  |         diag_move(&fiber()->diag, &replicaset.applier.diag)
> 
> 
> overwrites the original error. 

True

> 
> Instead, the original error should be preserved and copied to the
> shared diagnostics area (replicaset.applier.error).

Sounds reasonable


More information about the Tarantool-patches mailing list