[Tarantool-patches] [PATCH v2 2/2] box: fix an assertion failure in box.ctl.promote()
Cyrill Gorcunov
gorcunov at gmail.com
Wed May 26 09:14:24 MSK 2021
On Tue, May 25, 2021 at 01:39:29PM +0300, Serge Petrenko wrote:
> box.ctl.promote() used to assume that the last synchronous entry is
> already written to WAL by the time it's called. This is not the case
> when promote is executed on the limbo owner. The last synchronous entry
> might still be en route to WAL.
Typo "en" -> "in".
> @@ -1618,14 +1618,29 @@ box_promote(void)
> txn_limbo.owner_id);
> return -1;
> }
> + if (txn_limbo_is_empty(&txn_limbo)) {
> + wait_lsn = txn_limbo.confirmed_lsn;
> + goto promote;
> + }
> }
>
> - /*
> - * promote() is a no-op on the limbo owner, so all the rows
> - * in the limbo must've come through the applier meaning they already
> - * have an lsn assigned, even if their WAL write hasn't finished yet.
> - */
> - wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn;
> + struct txn_limbo_entry *last_entry;
> + last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
> + /* Wait for the last entries WAL write. */
> + if (last_entry->lsn < 0) {
> + if (wal_sync(NULL) < 0)
> + return -1;
> + if (txn_limbo_is_empty(&txn_limbo)) {
> + wait_lsn = txn_limbo.confirmed_lsn;
> + goto promote;
> + }
> + if (last_entry != txn_limbo_last_synchro_entry(&txn_limbo)) {
This is a bit dangerous. We cache a pointer and then go to fiber_yield,
which switches context, at this moment the pointer become dangling one
and we simply can't be sure if it _were_ reused. IOW, Serge are we
100% sure that the same pointer with same address but with new data
won't appear here as last entry in limbo?
> + diag_set(ClientError, ER_QUORUM_WAIT, quorum,
> + "new synchronous transactions appeared");
> + return -1;
> + }
> + }
> + wait_lsn = last_entry->lsn;
> assert(wait_lsn > 0);
More information about the Tarantool-patches
mailing list