From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Sergey Ostanevich <sergos@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 1/1] wal: simplify rollback
Date: Wed, 6 May 2020 23:43:54 +0200
Message-ID: <9a29772d-0ade-5735-adc1-536d4e76a1f1@tarantool.org>
In-Reply-To: <20200506153843.GG112@tarantool.org>

Hi! Thanks for the review!

> On 01 May 00:50, Vladislav Shpilevoy wrote:
>> From: Georgy Kirichenko <georgy@tarantool.org>
>>
>> Here is a summary on how and when rollback works in WAL.
>>
>> Rollback happens, when disk write fails. In that case the failed
>  ^^^
> Disk write failure can cause rollback.

Is it better? To me both are equal, so whatever. Applied your version.

>> and all next transactions, sent to WAL, should be rolled back.
>> Together. Following transactions should be rolled back too,
>> because they could make their statements based on what they saw in
>> the failed transaction. Also rollback of the failed transaction
>> without rollback of the next ones can actually rewrite what they
>> committed.
>>
>> So when rollback is started, *all* pending transactions should be
>> rolled back. However if they would keep coming, the rollback would
>> be infinite.
>
> Not quite - you start rolling of txn4...txn1 (in reverse order) and at
> some moment the txn5 appears. It will just ruin the consistency of the
> data, just as you mentioned before - read of a yet-to-be rolled back,
> writing of a will-be-affected by next roll back.

Well, it will not. The appearance of txn5 means it comes after txn4. And
it will be rolled back before txn4, in case it tries to commit before
the whole rollback procedure ends. So the reversed order is fine.

In case you worry that txn5 can appear right during rollback (between
rolling back txn4 and txn3, for example) - that is not possible.
Rollback of the whole batch does not yield. So while the TX thread
waits for all rolled back transactions from the WAL thread, it is legal
to roll back all newer transactions immediately. When all rolled back
transactions finally return to the TX thread, they are rolled back
without yields.

Therefore my statement still seems to be correct. We roll back all
transactions in reversed order, and if rollback is started, new
transactions are rolled back immediately, without even trying to go to
WAL, until all already sent transactions are rolled back.
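To make the ordering argument concrete, here is a tiny standalone
sketch. It is not wal.c code - the array, the names, and the flow are
invented for illustration only, loosely mirroring what the rollback
flag plus stailq_reverse() achieve in the patch. It shows that a
newcomer like txn5 is always undone before the already collected
txn4...txn1, so the reverse order is never violated (my longer example
at the end of this mail shows the same thing with the real thread
states):

#include <stdio.h>
#include <stdbool.h>

/* Toy model only: not Tarantool code, all names are made up. */
static bool is_in_rollback = false;
static int wal_queue[16];	/* ids of transactions already sent to WAL */
static int wal_queue_len = 0;

static void
commit(int txn_id)
{
	if (is_in_rollback) {
		/*
		 * A newcomer like txn5: undone at once, before any of
		 * the already collected (older) transactions.
		 */
		printf("txn%d: rolled back immediately\n", txn_id);
		return;
	}
	wal_queue[wal_queue_len++] = txn_id;
}

static void
rollback_all(void)
{
	/* The collected queue is undone in reverse order: newest first. */
	while (wal_queue_len > 0)
		printf("txn%d: rolled back\n", wal_queue[--wal_queue_len]);
	is_in_rollback = false;	/* new transactions are allowed again */
}

int
main(void)
{
	for (int id = 1; id <= 4; ++id)
		commit(id);		/* txn1..txn4 reach WAL */
	is_in_rollback = true;		/* disk write failed, rollback begins */
	commit(5);			/* txn5 tries to commit during rollback */
	rollback_all();			/* undoes txn4, txn3, txn2, txn1 */
	return 0;
}

It prints txn5 first (refused right away), then txn4, txn3, txn2, txn1,
which is exactly the order described above.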
>> This means to complete a rollback it is necessary to
>> stop sending new transactions to WAL, then rollback all already
>> sent. In the end allow new transactions again.
>>
>> Step-by-step:
>>
>> 1) stop accepting all new transactions in WAL thread, where
>> rollback is started. All new transactions don't even try to go to
>> disk. They are added to the rollback queue immediately after
>> arriving to WAL thread.
>>
>> 2) tell TX thread to stop sending new transactions to WAL, so
>> that the rollback queue would stop growing.
>>
>> 3) rollback all transactions in reverse order.
>>
>> 4) allow transactions again in WAL thread and TX thread.
>>
>> The algorithm is long, but simple and understandable. However the
>> implementation wasn't so easy. It was done using a 4-hop cbus
>> route, 2 hops of which were supposed to clear the cbus channel
>> from all other cbus messages. The next two hops implemented steps
>> 3 and 4. Rollback state of the WAL was signaled by checking
>> internals of a preallocated cbus message.
>>
>> The patch makes it simpler and more straightforward. Rollback
>> state is now signaled by a simple flag, there is no hack about
>> clearing the cbus channel, and no touching of cbus message
>> attributes. The moment when all transactions are stopped and the
>> last one has returned from WAL is visible explicitly, because the
>> last journal entry sent to WAL is saved.
>>
>> Also there is now a single route for commit and rollback cbus
>                  ^^^ move it
>> messages, called tx_complete_batch(). This change will come in
>                 ^^^ here

Nope. Beforehand there was no single route, and *now* there is a single
route. Which happened to be called tx_complete_batch(). The emphasis is
on *now there is a single route*. Not on *now it is called*.

>> handy in scope of synchronous replication, when a WAL write won't
>> be enough for commit. And therefore 'commit' as a concept should be
>> washed away from WAL's code gradually. Migrate to solely the txn
>> module.
>> ---
>> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4842-simplify-wal-rollback
>> Issue: https://github.com/tarantool/tarantool/issues/4842
>>
>> While working on 4842 I managed to extract this patch from
>> Georgy's branch and make it not depend on anything else. This is
>> supposed to make some things in WAL simpler before they get more
>> complex because of sync replication.
>>
>>  src/box/wal.c | 178 +++++++++++++++++++++++++++-----------------------
>>  1 file changed, 95 insertions(+), 83 deletions(-)
>>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 1eb20272c..b979244e3 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -97,6 +97,13 @@ struct wal_writer
>>  	struct cpipe wal_pipe;
>>  	/** A memory pool for messages. */
>>  	struct mempool msg_pool;
>> +	/**
>> +	 * A last journal entry submitted to write. This is a
>> +	 * 'rollback border'. When rollback starts, all
>> +	 * transactions keep being rolled back until this one is
>> +	 * rolled back too.
>> +	 */
>> +	struct journal_entry *last_entry;
>>  	/* ----------------- wal ------------------- */
>>  	/** A setting from instance configuration - wal_max_size */
>>  	int64_t wal_max_size;
>> @@ -153,7 +160,7 @@ struct wal_writer
>>  	 * keep adding all incoming requests to the rollback
>>  	 * queue, until the tx thread has recovered.
>>  	 */
>> -	struct cmsg in_rollback;
>> +	bool is_in_rollback;
>>  	/**
>>  	 * WAL watchers, i.e. threads that should be alerted
>>  	 * whenever there are new records appended to the journal.
>> @@ -198,11 +205,11 @@ static void
>>  wal_write_to_disk(struct cmsg *msg);
>>
>>  static void
>> -tx_schedule_commit(struct cmsg *msg);
>> +tx_complete_batch(struct cmsg *msg);
>>
>>  static struct cmsg_hop wal_request_route[] = {
>>  	{wal_write_to_disk, &wal_writer_singleton.tx_prio_pipe},
>> -	{tx_schedule_commit, NULL},
>> +	{tx_complete_batch, NULL},
>>  };
>>
>>  static void
>> @@ -265,14 +272,83 @@ tx_schedule_queue(struct stailq *queue)
>>  		journal_async_complete(&writer->base, req);
>>  }
>>
>> +/**
>> + * Rollback happens, when disk write fails. In that case all next
>> + * transactions, sent to WAL, also should be rolled back. Because
>> + * they could make their statements based on what they saw in the
>> + * failed transaction. Also rollback of the failed transaction
>> + * without rollback of the next ones can actually rewrite what
>> + * they committed.
>> + * So when rollback is started, *all* pending transactions should
>> + * be rolled back. However if they would keep coming, the rollback
>> + * would be infinite. This means to complete a rollback it is
>> + * necessary to stop sending new transactions to WAL, then
>> + * rollback all already sent. In the end allow new transactions
>> + * again.
>> + *
>> + * First step is stop accepting all new transactions. For that WAL
>> + * thread sets a global flag. No rocket science here. All new
>> + * transactions, if see the flag set, are added to the rollback
>> + * queue immediately.
>> + *
>> + * Second step - tell TX thread to stop sending new transactions
>> + * to WAL. So as the rollback queue would stop growing.
>> + *
>> + * Third step - rollback all transactions in reverse order.
>> + *
>> + * Fourth step - allow transactions again. Unset the global flag
>> + * in WAL thread.
>> + */
>> +static inline void
>> +wal_begin_rollback(void)
>> +{
>> +	/* Signal WAL-thread stop accepting new transactions. */
>> +	wal_writer_singleton.is_in_rollback = true;
>> +}
>> +
>> +static void
>> +wal_complete_rollback(struct cmsg *base)
>> +{
>> +	(void) base;
>> +	/* WAL-thread can try writing transactions again. */
>> +	wal_writer_singleton.is_in_rollback = false;
>> +}
>> +
>> +static void
>> +tx_complete_rollback(void)
>> +{
>> +	struct wal_writer *writer = &wal_writer_singleton;
>> +	/*
>> +	 * Despite records are sent in batches, the last entry to
>> +	 * commit can't be in the middle of a batch. After all
>> +	 * transactions to rollback are collected, the last entry
>> +	 * will be exactly, well, the last entry.
>> +	 */
>> +	if (stailq_last_entry(&writer->rollback, struct journal_entry,
>> +			      fifo) != writer->last_entry)
>> +		return;
>
> I didn't get it: can there be a batch whose last entry is not
> the final one?

Nope. The last entry is exactly the last entry. If there is a rollback
in progress, and there is a batch of transactions returned from the WAL
thread, it means the last transaction which was sent to WAL is in the
end of the batch. If it is not in the end, then this batch is not the
last one, and there will be more.

Since the TX thread enters rollback state, it won't send other
transactions to the WAL thread, and therefore 'last_entry' stops
changing. Eventually the batch which contains the last entry will
return to the TX thread from the WAL thread, and the last entry will
match the last transaction in the batch. Because if something else were
sent to WAL afterwards, the last_entry member would change again.

> You prematurely quit the rollback - is there a guarantee
> you'll appear here again?

If the batch does not end with the last_entry, it is not the last
batch. So I can't start the rollback now. Not all transactions to roll
back have returned from the WAL thread. There is a guarantee that if
last_entry didn't arrive back from WAL in the current batch, there will
be more batches.

>> +	stailq_reverse(&writer->rollback);
>> +	tx_schedule_queue(&writer->rollback);
>> +	/* TX-thread can try sending transactions to WAL again. */
>> +	stailq_create(&writer->rollback);
>> +	static struct cmsg_hop route[] = {
>> +		{wal_complete_rollback, NULL}
>> +	};
>> +	static struct cmsg msg;
>> +	cmsg_init(&msg, route);
>> +	cpipe_push(&writer->wal_pipe, &msg);
>> +}

I decided to help you with an example of how a typical rollback may
look. There are the TX thread and the WAL thread. Their states in the
beginning of the example:

    TX thread                 WAL thread
    mode: normal              mode: normal
    rollback_queue: {}        cbus_queue: {}
    last_entry: null

Assume there is a transaction txn1. txn_commit(txn1) is called. The TX
thread sends it to WAL.
    TX thread                 WAL thread
    mode: normal              mode: normal
    rollback_queue: {}        cbus_queue: {txn1}
    last_entry: txn1

Then txn2 and txn3 are committed; they go to WAL in a second batch.

    TX thread                 WAL thread
    mode: normal              mode: normal
    rollback_queue: {}        cbus_queue: {txn2, txn3} -> {txn1}
    last_entry: txn3

Now the WAL thread pops {txn1} (a batch of a single transaction) and
tries to write it. But it fails. Then WAL enters rollback mode and
sends {txn1} back as rolled back.

    TX thread                 WAL thread
    mode: rollback            mode: rollback
    rollback_queue: {txn1}    cbus_queue: {txn2, txn3}
    last_entry: txn3

The TX thread receives {txn1} as rolled back, so it enters 'rollback'
state too. Also the TX thread sees that {txn1} does not end with
last_entry, which is txn3, so there is at least one more batch to wait
for from WAL before the rollback can be done. It waits.

Assume now transaction txn4 arrives. The TX thread is in rollback mode,
so an attempt to commit txn4 makes it fail immediately. It is totally
legal. Rollback order is respected. txn4 is just rolled back and
removed. No need to add it to any queue.

Now the WAL thread processes the next batch. The WAL thread is still in
rollback state, so it returns {txn2, txn3} back to the TX thread right
away.

    TX thread                                WAL thread
    mode: rollback                           mode: rollback
    rollback_queue: {txn1} -> {txn2, txn3}   cbus_queue: {}
    last_entry: txn3

The TX thread sees that this batch ({txn2, txn3}) ends with last_entry,
and now it is sure there are no more batches in the WAL thread. Indeed,
all transactions after last_entry were rolled back immediately, without
going to WAL. So it rolls back the transactions in the queue in the
order txn3, txn2, txn1. Without yields. And enters normal state.

    TX thread                 WAL thread
    mode: normal              mode: rollback
    rollback_queue: {}        cbus_queue: {}
    last_entry: txn3

Note, the WAL thread is still in rollback state. But this is ok,
because right after rolling back the queue, the TX thread sends a
message to the WAL thread saying "all is ok now, you can try writing to
disk again". Newer transactions won't be able to arrive to the WAL
thread earlier than that message, because cbus preserves a strict order
of messages.

    TX thread                 WAL thread
    mode: normal              mode: normal
    rollback_queue: {}        cbus_queue: {}
    last_entry: txn3

Now the rollback is finished. I hope this example helps.
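And if it is easier to read as code, below is a small compilable toy
model of the same flow. It is not the real wal.c - every name and type
here is invented for the illustration - it only mimics the 'rollback
border' idea: collect the batches returned from WAL, wait until a batch
ends with last_entry, then undo everything newest-first and let WAL
write again.

#include <stdio.h>
#include <stdbool.h>

/* Toy journal entry: just an id and an intrusive list link. */
struct entry {
	int id;
	struct entry *next;
};

/* TX-thread side state of the toy model. */
static bool tx_is_in_rollback = false;
static struct entry *rollback_queue = NULL;	/* collected batches, FIFO */
static struct entry *rollback_queue_tail = NULL;
static struct entry *last_entry = NULL;		/* the 'rollback border' */

/* TX submits an entry to "WAL". Returns false if it is refused. */
static bool
tx_submit(struct entry *e)
{
	if (tx_is_in_rollback) {
		printf("txn%d: refused, rolled back immediately\n", e->id);
		return false;
	}
	last_entry = e;	/* remember the border: the newest entry sent */
	return true;
}

/* A batch comes back from "WAL" as rolled back. */
static void
tx_on_rolled_back_batch(struct entry *batch_head, struct entry *batch_tail)
{
	tx_is_in_rollback = true;
	/* Append the batch to the collected rollback queue. */
	if (rollback_queue == NULL)
		rollback_queue = batch_head;
	else
		rollback_queue_tail->next = batch_head;
	rollback_queue_tail = batch_tail;
	/*
	 * The border check: if this batch does not end with the last
	 * entry ever sent to WAL, more batches are on the way - wait.
	 */
	if (batch_tail != last_entry)
		return;
	/* The border arrived: undo everything, newest first. */
	struct entry *reversed = NULL;
	while (rollback_queue != NULL) {
		struct entry *e = rollback_queue;
		rollback_queue = e->next;
		e->next = reversed;
		reversed = e;
	}
	for (struct entry *e = reversed; e != NULL; e = e->next)
		printf("txn%d: rolled back\n", e->id);
	rollback_queue_tail = NULL;
	tx_is_in_rollback = false;	/* "WAL may try writing again" */
}

int
main(void)
{
	struct entry txn1 = {1, NULL}, txn2 = {2, NULL}, txn3 = {3, NULL};
	tx_submit(&txn1);			/* batch 1: {txn1} */
	tx_submit(&txn2);			/* batch 2: {txn2, txn3} */
	tx_submit(&txn3);
	txn2.next = &txn3;

	/* The disk write of batch 1 failed: it returns first, alone. */
	tx_on_rolled_back_batch(&txn1, &txn1);	/* not the border yet */

	struct entry txn4 = {4, NULL};
	tx_submit(&txn4);			/* refused immediately */

	/*
	 * Batch 2 returns and ends with last_entry, so the rollback
	 * finally runs: it prints txn3, txn2, txn1 in that order.
	 */
	tx_on_rolled_back_batch(&txn2, &txn3);
	return 0;
}

The output matches the example above: txn4 is refused right away, and
txn3, txn2, txn1 are rolled back only after the batch ending with
last_entry (txn3) comes back.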