From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com [209.85.208.194])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by dev.tarantool.org (Postfix) with ESMTPS id 089BF469710
    for ; Wed, 6 May 2020 10:55:25 +0300 (MSK)
Received: by mail-lj1-f194.google.com with SMTP id e25so1317514ljg.5
    for ; Wed, 06 May 2020 00:55:25 -0700 (PDT)
Date: Wed, 6 May 2020 10:55:22 +0300
From: Cyrill Gorcunov
Message-ID: <20200506075522.GB51428@grain>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Subject: Re: [Tarantool-patches] [PATCH 1/1] wal: simplify rollback
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

On Fri, May 01, 2020 at 12:50:57AM +0200, Vladislav Shpilevoy wrote:
> From: Georgy Kirichenko
>
> Here is a summary of how and when rollback works in WAL.
>
> Rollback happens when a disk write fails. In that case the failed
> transaction and all subsequent transactions sent to WAL should be
> rolled back together. Subsequent transactions must be rolled back
> too, because they could have made their statements based on what
> they saw in the failed transaction. Also, rolling back the failed
> transaction without rolling back the next ones can actually
> overwrite what they committed.
>
> So when rollback is started, *all* pending transactions should be
> rolled back. However, if new ones kept coming, the rollback would
> never finish. This means that to complete a rollback it is
> necessary to stop sending new transactions to WAL, then roll back
> all those already sent, and finally allow new transactions again.
>
> Step-by-step:
>
> 1) stop accepting all new transactions in the WAL thread, where
> rollback is started. All new transactions don't even try to go to
> disk. They are added to the rollback queue immediately after
> arriving at the WAL thread.
>
> 2) tell the TX thread to stop sending new transactions to WAL, so
> that the rollback queue stops growing.
>
> 3) roll back all transactions in reverse order.
>
> 4) allow transactions again in the WAL thread and the TX thread.
>
> The algorithm is long, but simple and understandable. However, the
> implementation wasn't so easy. It was done using a 4-hop cbus
> route, 2 hops of which were supposed to clear the cbus channel of
> all other cbus messages. The next two hops implemented steps 3 and
> 4. The rollback state of the WAL was signaled by checking internals
> of a preallocated cbus message.
>
> The patch makes it simpler and more straightforward. The rollback
> state is now signaled by a simple flag, and there is no hack for
> clearing the cbus channel and no touching of cbus message
> attributes. The moment when all transactions are stopped and the
> last one has returned from WAL is visible explicitly, because the
> last journal entry sent to WAL is saved.
>
> Also there is now a single route for commit and rollback cbus
> messages, called tx_complete_batch(). This change will come in
> handy in the scope of synchronous replication, when a WAL write
> won't be enough for commit. Therefore 'commit' as a concept should
> gradually be washed out of WAL's code and migrate solely to the
> txn module.
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-4842-simplify-wal-rollback
> Issue: https://github.com/tarantool/tarantool/issues/4842
>
> While working on 4842 I managed to extract this patch from
> Georgy's branch and make it independent of anything else. This
> is supposed to make some things in WAL simpler before they
> get more complex because of sync replication.

I don't see any obvious errors in the patch itself. Still, I'm far
from being a WAL expert, so I might be missing something in the big
picture, thus

Reviewed-by: Cyrill Gorcunov
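
For anyone reading along, the four steps from the quoted commit message can be modeled roughly as below. This is only a single-threaded toy in Python, not the actual C code: ToyWal, ToyTx, send(), complete_batch() and the fail_on set are names made up for illustration, and cbus is not modeled at all. It only shows the flag, the rollback queue and the reverse-order rollback.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyWal:
    """Stand-in for the WAL thread (hypothetical, not Tarantool code)."""
    fail_on: Set[int] = field(default_factory=set)  # ids whose disk write fails
    in_rollback: bool = False                       # the "simple flag"
    rollback_queue: List[int] = field(default_factory=list)
    written: List[int] = field(default_factory=list)

    def write(self, txn_id: int) -> None:
        # Step 1: after a failed write, later entries do not touch the disk;
        # they are appended to the rollback queue on arrival.
        if self.in_rollback or txn_id in self.fail_on:
            self.in_rollback = True
            self.rollback_queue.append(txn_id)
        else:
            self.written.append(txn_id)


@dataclass
class ToyTx:
    """Stand-in for the TX thread (hypothetical)."""
    wal: ToyWal
    accepting: bool = True
    last_sent: Optional[int] = None  # last entry handed to the WAL
    rolled_back: List[int] = field(default_factory=list)

    def send(self, txn_id: int) -> None:
        if not self.accepting:
            raise RuntimeError("rollback in progress")
        self.last_sent = txn_id
        self.wal.write(txn_id)

    def complete_batch(self) -> None:
        if not self.wal.in_rollback:
            return
        # Step 2: stop sending, so the rollback queue stops growing.
        self.accepting = False
        # Step 3: roll back in reverse order, newest first.
        while self.wal.rollback_queue:
            self.rolled_back.append(self.wal.rollback_queue.pop())
        # Step 4: allow transactions again on both sides.
        self.wal.in_rollback = False
        self.accepting = True


wal = ToyWal(fail_on={3})
tx = ToyTx(wal)
for txn_id in range(1, 6):
    tx.send(txn_id)
tx.complete_batch()
print(wal.written, tx.rolled_back)  # [1, 2] [5, 4, 3]
```

Transactions 4 and 5 never reach the disk in this model because they arrive after the failed write of 3, which matches step 1. In the real code the two sides run in different threads, so step 2 is an actual message rather than a field assignment.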