From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Serge Petrenko <sergepetrenko@tarantool.org>, gorcunov@gmail.com Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 3/3] box: fix an assertion failure in box.ctl.promote() Date: Sun, 23 May 2021 14:18:26 +0200 [thread overview] Message-ID: <1b0f0a3d-59ba-38ba-7f7e-f214664c8976@tarantool.org> (raw) In-Reply-To: <c1dba732dda390f60f1b1aa83ade6e73d0b89803.1621501007.git.sergepetrenko@tarantool.org> Hi! Thanks for the patch! I see some of CI jobs have failed the new test: https://github.com/tarantool/tarantool/runs/2620153809 See 4 comments below. > diff --git a/src/box/box.cc b/src/box/box.cc > index c10e0d8bf..1b1e7eec0 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -1442,17 +1442,22 @@ box_quorum_on_ack_f(struct trigger *trigger, void *event) > } > > /** > - * Wait until at least @a quorum of nodes confirm @a target_lsn from the node > - * with id @a lead_id. > + * Wait until at least @a quorum of nodes confirm the last available synchronous > + * entry from the node with id @a lead_id. > */ > static int > -box_wait_quorum(uint32_t lead_id, int64_t target_lsn, int quorum, > +box_wait_quorum(uint32_t lead_id, struct txn_limbo_entry **entry, int quorum, > double timeout) 1. Maybe try to leave this function not depending on the limbo and its entries? It was supposed to wait for replication of just LSN, not necessarily a synchronous transaction. > { > struct box_quorum_trigger t; > memset(&t, 0, sizeof(t)); > vclock_create(&t.vclock); > > + *entry = txn_limbo_wait_lsn_assigned(&txn_limbo); > + if (*entry == NULL) > + return -1; > + int64_t target_lsn = (*entry)->lsn; > + > /* Take this node into account immediately. */ > int ack_count = vclock_get(box_vclock, lead_id) >= target_lsn; > replicaset_foreach(replica) { > @@ -1622,22 +1627,17 @@ box_promote(void) > } > } > > - /* > - * promote() is a no-op on the limbo owner, so all the rows > - * in the limbo must've come through the applier meaning they already > - * have an lsn assigned, even if their WAL write hasn't finished yet. > - */ > - wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn; > - assert(wait_lsn > 0); > - > - rc = box_wait_quorum(former_leader_id, wait_lsn, quorum, > + struct txn_limbo_entry *last_entry; > + rc = box_wait_quorum(former_leader_id,&last_entry, quorum, 2. Missing whitespace after the first argument. > replication_synchro_timeout); > if (rc == 0) { > + wait_lsn = last_entry->lsn; > if (quorum < replication_synchro_quorum) { > diag_set(ClientError, ER_QUORUM_WAIT, quorum, > "quorum was increased while waiting"); > rc = -1; > - } else if (wait_lsn < txn_limbo_last_synchro_entry(&txn_limbo)->lsn) { > + } else if (last_entry != > + txn_limbo_last_synchro_entry(&txn_limbo)) { > diag_set(ClientError, ER_QUORUM_WAIT, quorum, > "new synchronous transactions appeared"); > rc = -1; 3. Could all the 3 commits be replaced with calling wal_sync() in the beginning of the promote() if we see the last LSN is unknown? After wal_sync() several outcomes are possible: - All was rolled back, and the limbo is empty; - The last transaction is different after sync - it means it was added during promote() which is an error like in the code above; - The transaction in the end of the limbo is the same. In the last case you work like before - box_wait_quorum() with the known LSN. Will it work? > diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c > index f287369a2..406f2de89 100644 > --- a/src/box/txn_limbo.c > +++ b/src/box/txn_limbo.c > @@ -69,6 +69,48 @@ txn_limbo_last_synchro_entry(struct txn_limbo *limbo) > return NULL; > } > > +static int > +txn_limbo_wait_lsn_assigned_f(struct trigger *trig, void *event) > +{ > + (void)event; > + struct fiber *fiber = trig->data; > + fiber_wakeup(fiber); > + return 0; > +} > + > +struct txn_limbo_entry * > +txn_limbo_wait_lsn_assigned(struct txn_limbo *limbo) > +{ > + assert(!txn_limbo_is_empty(limbo)); > + struct txn_limbo_entry *entry = txn_limbo_last_synchro_entry(limbo); > + if (entry->lsn >= 0) > + return entry; > + > + struct trigger write_trigger, rollback_trigger; > + trigger_create(&write_trigger, txn_limbo_wait_lsn_assigned_f, fiber(), > + NULL); > + trigger_create(&rollback_trigger, txn_limbo_wait_lsn_assigned_f, > + fiber(), NULL); > + txn_on_wal_write(entry->txn, &write_trigger); > + txn_on_rollback(entry->txn, &rollback_trigger); > + do { > + fiber_yield(); > + if (fiber_is_cancelled()) { > + diag_set(FiberIsCancelled); > + entry = NULL; > + break; > + } > + if (entry->txn->signature < 0) { > + diag_set(ClientError, ER_SYNC_ROLLBACK); > + entry = NULL; > + break; > + } > + } while (entry->lsn == -1); > + trigger_clear(&write_trigger); > + trigger_clear(&rollback_trigger); 4. Why do you need the LSN assigned in the on_wal_write trigger in the previous commit? I can't see where do you use it here.
next prev parent reply other threads:[~2021-05-23 12:18 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-20 9:02 [Tarantool-patches] [PATCH 0/3] fix " Serge Petrenko via Tarantool-patches 2021-05-20 9:02 ` [Tarantool-patches] [PATCH 1/3] box: make txn reference the limbo entry Serge Petrenko via Tarantool-patches 2021-05-20 9:02 ` [Tarantool-patches] [PATCH 2/3] txn_limbo: move lsn assignment to journal completion callback Serge Petrenko via Tarantool-patches 2021-05-20 9:02 ` [Tarantool-patches] [PATCH 3/3] box: fix an assertion failure in box.ctl.promote() Serge Petrenko via Tarantool-patches 2021-05-23 12:18 ` Vladislav Shpilevoy via Tarantool-patches [this message] 2021-05-25 10:39 ` Serge Petrenko via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1b0f0a3d-59ba-38ba-7f7e-f214664c8976@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 3/3] box: fix an assertion failure in box.ctl.promote()' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox