[Tarantool-patches] [PATCH 1/1] wal: simplify rollback

Cyrill Gorcunov gorcunov at gmail.com
Wed May 6 10:55:22 MSK 2020


On Fri, May 01, 2020 at 12:50:57AM +0200, Vladislav Shpilevoy wrote:
> From: Georgy Kirichenko <georgy at tarantool.org>
> 
> Here is a summary on how and when rollback works in WAL.
> 
> Rollback happens when a disk write fails. In that case the failed
> transaction and all subsequent transactions sent to WAL must be
> rolled back together. The subsequent transactions must be rolled
> back too, because they could have built their statements on what
> they saw in the failed transaction. Also, rolling back the failed
> transaction without rolling back the next ones could overwrite
> what they committed.
> 
> So when a rollback starts, *all* pending transactions must be
> rolled back. However, if new ones kept coming, the rollback would
> never end. This means that to complete a rollback it is necessary
> to stop sending new transactions to WAL, then roll back all those
> already sent, and finally allow new transactions again.
> 
> Step-by-step:
> 
> 1) stop accepting new transactions in the WAL thread, where the
> rollback is started. New transactions don't even try to go to
> disk. They are added to the rollback queue immediately after
> arriving at the WAL thread.
> 
> 2) tell the TX thread to stop sending new transactions to WAL, so
> that the rollback queue stops growing.
> 
> 3) roll back all transactions in reverse order.
> 
> 4) allow transactions again in WAL thread and TX thread.
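
To make the steps above concrete, here is a minimal single-threaded
sketch of the WAL side (steps 1, 3 and 4). All names in it
(wal_model, entry, wal_model_write, wal_model_finish_rollback) are
hypothetical and are not taken from the patch. Step 2 is not modeled,
since there is no separate TX thread here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the actual Tarantool structures. */
enum { MAX_ENTRIES = 16 };

struct entry {
	int id;
	bool rolled_back;
};

struct wal_model {
	/* Set when a disk write fails (step 1). */
	bool is_in_rollback;
	/* Entries waiting for rollback, in arrival order. */
	struct entry *rollback_queue[MAX_ENTRIES];
	size_t queue_len;
};

/*
 * Try to write an entry. While a rollback is in progress, new
 * entries do not go to disk and are queued for rollback right away.
 * Returns true if the entry was written, false if it was queued.
 */
static bool
wal_model_write(struct wal_model *wal, struct entry *e, bool disk_ok)
{
	if (wal->is_in_rollback || !disk_ok) {
		wal->is_in_rollback = true;
		assert(wal->queue_len < MAX_ENTRIES);
		wal->rollback_queue[wal->queue_len++] = e;
		return false;
	}
	return true;
}

/*
 * Roll back all queued entries in reverse order (step 3), then
 * accept new entries again (step 4). The ids of the rolled back
 * entries are stored in 'order', which must have room for
 * MAX_ENTRIES items.
 */
static void
wal_model_finish_rollback(struct wal_model *wal, int *order, size_t *n)
{
	*n = 0;
	while (wal->queue_len > 0) {
		struct entry *e = wal->rollback_queue[--wal->queue_len];
		e->rolled_back = true;
		order[(*n)++] = e->id;
	}
	wal->is_in_rollback = false;
}
```

In this model an entry that arrives after a failed write is queued
even if the disk is healthy again, which matches the requirement that
the failed transaction and everything after it are rolled back
together.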
> 
> The algorithm is long, but simple and understandable. However,
> the implementation wasn't so easy. It was done using a 4-hop cbus
> route. Two of the hops were supposed to clear the cbus channel of
> all other cbus messages. The other two hops implemented steps 3
> and 4. The rollback state of the WAL was signaled by checking the
> internals of a preallocated cbus message.
> 
> The patch makes it simpler and more straightforward. The rollback
> state is now signaled by a simple flag, and there is no hack for
> clearing the cbus channel and no touching of cbus message
> attributes. The moment when all transactions are stopped and the
> last one has returned from WAL is explicitly visible, because the
> last journal entry sent to WAL is saved.
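
A rough sketch of how a flag plus a saved "last sent" entry can make
that moment explicit on the TX side. This is only an illustration
under my own assumptions: the names (tx_writer_model,
journal_entry_model, tx_model_submit, tx_model_complete,
tx_model_resume) are hypothetical and the real patch may be organized
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a journal entry returned from WAL. */
struct journal_entry_model {
	int id;
	/* True if WAL failed to write the entry. */
	bool failed;
};

struct tx_writer_model {
	/* Simple flag: a rollback is in progress. */
	bool is_in_rollback;
	/* The last entry that was sent to WAL. */
	const struct journal_entry_model *last_entry;
};

/*
 * Send an entry to WAL. During a rollback nothing new is sent.
 * Returns true if the entry was sent.
 */
static bool
tx_model_submit(struct tx_writer_model *w,
		const struct journal_entry_model *e)
{
	if (w->is_in_rollback)
		return false;
	w->last_entry = e;
	return true;
}

/*
 * Handle an entry returned from WAL. Returns true when a rollback
 * is in progress and the returned entry is the last one that was
 * sent, that is, nothing else is in flight anymore.
 */
static bool
tx_model_complete(struct tx_writer_model *w,
		  const struct journal_entry_model *e)
{
	if (e->failed)
		w->is_in_rollback = true;
	return w->is_in_rollback && e == w->last_entry;
}

/* Allow new transactions again once the rollback is finished. */
static void
tx_model_resume(struct tx_writer_model *w)
{
	w->is_in_rollback = false;
	w->last_entry = NULL;
}
```

The comparison with last_entry replaces any need to flush the channel:
once the saved entry comes back, every earlier entry has already been
seen, assuming entries return in the order they were sent.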
> 
> Also there is now a single route for commit and rollback cbus
> messages, called tx_complete_batch(). This change will come in
> handy for synchronous replication, where a WAL write won't be
> enough for a commit. Therefore 'commit' as a concept should be
> gradually removed from the WAL code and moved solely to the txn
> module.
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4842-simplify-wal-rollback
> Issue: https://github.com/tarantool/tarantool/issues/4842
> 
> While working on 4842 I managed to extract this patch from
> Georgy's branch and make it independent of anything else. It
> is supposed to make some things in WAL simpler before they
> get more complex because of sync replication.

I don't see any obvious errors in the patch itself. Still, I'm far
from being a WAL expert, so I might miss something in the big picture, thus
Reviewed-by: Cyrill Gorcunov <gorcunov at gmail.com>
