From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com
Date: Fri, 28 May 2021 12:01:17 +0300
Message-ID: <3dc7734c-a572-7f69-a6b8-ad5aa7536873@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 1/1] qsync: handle async txns right during CONFIRM
List-Id: Tarantool development patches

28.05.2021 00:28, Vladislav Shpilevoy wrote:
> It is possible that a new async transaction is added to the limbo
> when there is an in-progress CONFIRM WAL write for all the pending
> sync transactions.
>
> Then when CONFIRM WAL write is done, it might see that the limbo
> now in the first place contains an async transaction not yet
> written to WAL. A suspicious situation - on one hand the async
> transaction does not have any blocking sync txns before it and
> can be considered complete, on the other hand its WAL write is not
> done and it is not complete.
>
> Before this patch it resulted into a crash - limbo didn't consider
> the situation possible at all.
>
> Now when CONFIRM covers a not yet written async transactions, they
> are removed from the limbo and are turned to plain transactions.
>
> When their WAL write is done, they see they no more have
> TXN_WAIT_SYNC flag and don't even need to interact with the limbo.
>
> It is important to remove them from the limbo right when the
> CONFIRM is done. Because otherwise their limbo entry may be not
> removed at all when it is done on a replica. On a replica the
> limbo entries are removed only by CONFIRM/ROLLBACK/PROMOTE. If
> there would be an async transaction in the first position in the
> limbo queue, it wouldn't be deleted until next sync transaction
> appears.
>
> This replica case is not possible now though. Because all synchro
> entries on the applier are written in a blocking way. Nonetheless
> if it ever becomes non-blocking, the code should handle it ok.
>
> Closes #6057

Hi! Thanks for working on this and for the fast fix!

Please find one comment below.
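To make the scenario from the commit message concrete, here is a minimal standalone sketch of why a flag value read before the WAL write can go stale. It is only an illustration under stated assumptions: `demo_txn`, the `DEMO_*` flags and every `demo_*` function are hypothetical stand-ins, not Tarantool's real structures or API, and the "WAL write" is modelled as a plain function call that clears the flag, the way a finished CONFIRM would while the committing fiber is yielding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real txn flags; values are arbitrary. */
enum {
	DEMO_WAIT_SYNC = 1 << 0,
	DEMO_WAIT_ACK = 1 << 1,
};

/* Hypothetical, heavily reduced model of a transaction. */
struct demo_txn {
	uint32_t flags;
};

static bool
demo_has_flag(const struct demo_txn *txn, uint32_t flag)
{
	return (txn->flags & flag) != 0;
}

/*
 * Models the yield inside the WAL write: while the committing fiber
 * sleeps, a completed CONFIRM turns the async txn into a plain one
 * by dropping its WAIT_SYNC flag.
 */
static void
demo_wal_write_with_confirm(struct demo_txn *txn)
{
	assert(txn != NULL);
	txn->flags &= ~(uint32_t)DEMO_WAIT_SYNC;
}

/*
 * Old pattern: the flag is read once, before the write. Returns
 * whether the code after the write would still take the limbo path.
 */
bool
demo_commit_cached(struct demo_txn *txn)
{
	bool is_sync = demo_has_flag(txn, DEMO_WAIT_SYNC);
	demo_wal_write_with_confirm(txn);
	/* Stale: the variable still says "sync", the flag is gone. */
	return is_sync;
}

/* New pattern: the flag is re-read after the write. */
bool
demo_commit_rechecked(struct demo_txn *txn)
{
	demo_wal_write_with_confirm(txn);
	return demo_has_flag(txn, DEMO_WAIT_SYNC);
}
```

With a transaction that starts with only `DEMO_WAIT_SYNC` set, `demo_commit_cached()` still reports the limbo path after the write, while `demo_commit_rechecked()` does not.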
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6057-confirm-async-no-wal
> Issue: https://github.com/tarantool/tarantool/issues/6057
>
>  .../gh-6057-qsync-confirm-async-no-wal.md     |   5 +
>  src/box/txn.c                                 |  14 +-
>  src/box/txn_limbo.c                           |  21 +++
>  .../gh-6057-qsync-confirm-async-no-wal.result | 163 ++++++++++++++++++
>  ...h-6057-qsync-confirm-async-no-wal.test.lua |  88 ++++++++++
>  test/replication/suite.cfg                    |   1 +
>  test/replication/suite.ini                    |   2 +-
>  7 files changed, 289 insertions(+), 5 deletions(-)
>  create mode 100644 changelogs/unreleased/gh-6057-qsync-confirm-async-no-wal.md
>  create mode 100644 test/replication/gh-6057-qsync-confirm-async-no-wal.result
>  create mode 100644 test/replication/gh-6057-qsync-confirm-async-no-wal.test.lua
>
> diff --git a/changelogs/unreleased/gh-6057-qsync-confirm-async-no-wal.md b/changelogs/unreleased/gh-6057-qsync-confirm-async-no-wal.md
> new file mode 100644
> index 000000000..1005389d8
> --- /dev/null
> +++ b/changelogs/unreleased/gh-6057-qsync-confirm-async-no-wal.md
> @@ -0,0 +1,5 @@
> +## bugfix/replication
> +
> +* Fixed a possible crash when a synchronous transaction was followed by an
> +  asynchronous transaction right when its confirmation was being written
> +  (gh-6057).
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 1d42c9113..3d4d5c397 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -880,8 +880,14 @@ txn_commit(struct txn *txn)
>  	if (req == NULL)
>  		goto rollback;
>  
> -	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
> -	if (is_sync) {
> +	/*
> +	 * Do not cache the flag value in a variable. The flag might be deleted
> +	 * during WAL write. This can happen for async transactions created
> +	 * during CONFIRM write, whose all blocking sync transactions get
> +	 * confirmed. Then they turn the async transaction into just a plain
> +	 * txn not waiting for anything.
> +	 */
> +	if (txn_has_flag(txn, TXN_WAIT_SYNC)) {
>  		/*
>  		 * Remote rows, if any, come before local rows, so
>  		 * check for originating instance id here.
> @@ -900,13 +906,13 @@ txn_commit(struct txn *txn)
>  
>  	fiber_set_txn(fiber(), NULL);
>  	if (journal_write(req) != 0 || req->res < 0) {
> -		if (is_sync)
> +		if (txn_has_flag(txn, TXN_WAIT_SYNC))
>  			txn_limbo_abort(&txn_limbo, limbo_entry);
>  		diag_set(ClientError, ER_WAL_IO);
>  		diag_log();
>  		goto rollback;
>  	}
> -	if (is_sync) {
> +	if (txn_has_flag(txn, TXN_WAIT_SYNC)) {
>  		if (txn_has_flag(txn, TXN_WAIT_ACK)) {
>  			int64_t lsn = req->rows[req->n_rows - 1]->lsn;
>  			/*
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index f287369a2..05f0bf30a 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -389,6 +389,27 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>  			 */
>  			if (e->lsn == -1)
>  				break;
> +		} else if (e->txn->signature < 0) {
> +			/*
> +			 * A transaction might be covered by the CONFIRM even if
> +			 * it is not written to WAL yet when it is an async
> +			 * transaction. It could be created just when the
> +			 * CONFIRM was being written to WAL.
> +			 */
> +			assert(e->txn->status == TXN_PREPARED);
> +			/*
> +			 * Let it complete normally as a plain transaction.
> +			 */
> +			txn_clear_flags(e->txn, TXN_WAIT_SYNC | TXN_WAIT_ACK);

AFAICS it's enough to clear WAIT_SYNC here.
Asynchronous transactions never have WAIT_ACK set, do they?

> +			txn_limbo_remove(limbo, e);
> +			/*
> +			 * The limbo entry now should not be used by the owner
> +			 * transaction since it just became a plain one. Nullify
> +			 * the txn to get a crash on any usage attempt instead
> +			 * of potential undefined behaviour.
> +			 */
> +			e->txn = NULL;
> +			continue;
>  		}
>  		e->is_commit = true;
>  		txn_limbo_remove(limbo, e);

-- 
Serge Petrenko
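
A small standalone illustration of the question above, as a sketch only: if an async transaction in the limbo never carries the ACK flag, then clearing `SYNC | ACK` and clearing just `SYNC` leave identical flags, so the extra bit in the mask is harmless but redundant. The `REVIEW_*` constants and `review_clear_flags()` are hypothetical stand-ins with arbitrary values, not the real `TXN_*` flags or `txn_clear_flags()`, and the premise about async transactions is exactly the assumption being asked about, not something this snippet proves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the real TXN_* values are not assumed here. */
enum {
	REVIEW_WAIT_SYNC = 1 << 0,
	REVIEW_WAIT_ACK = 1 << 1,
	REVIEW_OTHER = 1 << 2,
};

/* Return `flags` with every bit of `mask` cleared. */
uint32_t
review_clear_flags(uint32_t flags, uint32_t mask)
{
	return flags & ~mask;
}

/*
 * Flags of an async txn waiting in the limbo, under the assumption
 * from the review comment: WAIT_SYNC is set, WAIT_ACK never is.
 */
uint32_t
review_async_flags(void)
{
	return REVIEW_WAIT_SYNC | REVIEW_OTHER;
}
```

For `review_async_flags()`, both masks produce the same result. For a transaction that does have `REVIEW_WAIT_ACK` set, the two masks differ, which is why the premise matters.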