[Tarantool-patches] [PATCH v2 2/2] box: fix an assertion failure in box.ctl.promote()
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Wed May 26 21:46:20 MSK 2021
>>> @@ -1618,14 +1618,29 @@ box_promote(void)
>>> txn_limbo.owner_id);
>>> return -1;
>>> }
>>> + if (txn_limbo_is_empty(&txn_limbo)) {
>>> + wait_lsn = txn_limbo.confirmed_lsn;
>>> + goto promote;
>>> + }
>>> }
>>> - /*
>>> - * promote() is a no-op on the limbo owner, so all the rows
>>> - * in the limbo must've come through the applier meaning they already
>>> - * have an lsn assigned, even if their WAL write hasn't finished yet.
>>> - */
>>> - wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn;
>>> + struct txn_limbo_entry *last_entry;
>>> + last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
>>> + /* Wait for the last entries WAL write. */
>>> + if (last_entry->lsn < 0) {
>>> + if (wal_sync(NULL) < 0)
>>> + return -1;
>>> + if (txn_limbo_is_empty(&txn_limbo)) {
>>> + wait_lsn = txn_limbo.confirmed_lsn;
>>> + goto promote;
>>> + }
>>> + if (last_entry != txn_limbo_last_synchro_entry(&txn_limbo)) {
>> This is a bit dangerous. We cache a pointer and then go to fiber_yield,
>> which switches context; at that moment the pointer may become dangling,
>> and we simply can't be sure whether it was reused. IOW, Serge, are we
>> 100% sure that the same pointer, with the same address but new data,
>> won't appear here as the last entry in the limbo?
>
> I agree this solution is not perfect.
>
> An alternative would be to do the following:
> 1) Check that the limbo owner hasn't changed
> 2) Check that the last entry has a positive lsn (i.e. it's not a new entry which
> wasn't yet written to WAL), and that this lsn is equal to the lsn of our entry.
>
> But what if our entry was confirmed and destroyed during wal_sync()? Then we can't
> compare other entries' lsns with its lsn.
As decided in the chat, you can use txn->id. It is unique until
restart, so comparing it before and after wal_sync() should help to
detect whether the last transaction has changed.