From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Date: Tue, 25 May 2021 13:39:29 +0300
Message-Id: <8011f87bb9b5e1f53f5bee3124f3a8e9dbe1917c.1621935783.git.sergepetrenko@tarantool.org>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH v2 2/2] box: fix an assertion failure in box.ctl.promote()
X-BeenThere: tarantool-patches@dev.tarantool.org
Precedence: list
List-Id: Tarantool development patches
From: Serge Petrenko via
 Tarantool-patches
Reply-To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

box.ctl.promote() used to assume that the last synchronous entry is
already written to WAL by the time it's called. This is not the case
when promote is executed on the limbo owner. The last synchronous entry
might still be en route to WAL.

In order to fix the issue, wait until all the limbo entries are written
to disk via wal_sync(). After this happens, it's safe to proceed to
gathering quorum in promote.

Closes #6032
---
 src/box/box.cc                                | 27 ++++++--
 .../gh-6032-promote-wal-write.result          | 69 +++++++++++++++++++
 .../gh-6032-promote-wal-write.test.lua        | 28 ++++++++
 test/replication/suite.cfg                    |  1 +
 test/replication/suite.ini                    |  2 +-
 5 files changed, 120 insertions(+), 7 deletions(-)
 create mode 100644 test/replication/gh-6032-promote-wal-write.result
 create mode 100644 test/replication/gh-6032-promote-wal-write.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index 894e3d0f4..3d9cd0e57 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1618,14 +1618,29 @@ box_promote(void)
 				 txn_limbo.owner_id);
 			return -1;
 		}
+		if (txn_limbo_is_empty(&txn_limbo)) {
+			wait_lsn = txn_limbo.confirmed_lsn;
+			goto promote;
+		}
 	}
 
-	/*
-	 * promote() is a no-op on the limbo owner, so all the rows
-	 * in the limbo must've come through the applier meaning they already
-	 * have an lsn assigned, even if their WAL write hasn't finished yet.
-	 */
-	wait_lsn = txn_limbo_last_synchro_entry(&txn_limbo)->lsn;
+	struct txn_limbo_entry *last_entry;
+	last_entry = txn_limbo_last_synchro_entry(&txn_limbo);
+	/* Wait for the last entries WAL write. */
+	if (last_entry->lsn < 0) {
+		if (wal_sync(NULL) < 0)
+			return -1;
+		if (txn_limbo_is_empty(&txn_limbo)) {
+			wait_lsn = txn_limbo.confirmed_lsn;
+			goto promote;
+		}
+		if (last_entry != txn_limbo_last_synchro_entry(&txn_limbo)) {
+			diag_set(ClientError, ER_QUORUM_WAIT, quorum,
+				 "new synchronous transactions appeared");
+			return -1;
+		}
+	}
+	wait_lsn = last_entry->lsn;
 	assert(wait_lsn > 0);
 
 	rc = box_wait_quorum(former_leader_id, wait_lsn, quorum,
diff --git a/test/replication/gh-6032-promote-wal-write.result b/test/replication/gh-6032-promote-wal-write.result
new file mode 100644
index 000000000..246c7974f
--- /dev/null
+++ b/test/replication/gh-6032-promote-wal-write.result
@@ -0,0 +1,69 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+replication_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{\
+    replication_synchro_timeout = 0.001,\
+}
+ | ---
+ | ...
+
+_ = box.schema.create_space('sync', {is_sync = true}):create_index('pk')
+ | ---
+ | ...
+
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+ | ---
+ | - ok
+ | ...
+_ = fiber.create(function() box.space.sync:replace{1} end)
+ | ---
+ | ...
+ok, err = nil, nil
+ | ---
+ | ...
+
+-- Test that the fiber actually waits for a WAL write to happen.
+f = fiber.create(function() ok, err = pcall(box.ctl.promote) end)
+ | ---
+ | ...
+fiber.sleep(0.1)
+ | ---
+ | ...
+f:status()
+ | ---
+ | - suspended
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ | ---
+ | - true
+ | ...
+ok
+ | ---
+ | - true
+ | ...
+err
+ | ---
+ | - null
+ | ...
+
+-- Cleanup.
+box.cfg{\
+    replication_synchro_timeout = replication_synchro_timeout,\
+}
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-6032-promote-wal-write.test.lua b/test/replication/gh-6032-promote-wal-write.test.lua
new file mode 100644
index 000000000..8c1859083
--- /dev/null
+++ b/test/replication/gh-6032-promote-wal-write.test.lua
@@ -0,0 +1,28 @@
+test_run = require('test_run').new()
+fiber = require('fiber')
+
+replication_synchro_timeout = box.cfg.replication_synchro_timeout
+box.cfg{\
+    replication_synchro_timeout = 0.001,\
+}
+
+_ = box.schema.create_space('sync', {is_sync = true}):create_index('pk')
+
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+_ = fiber.create(function() box.space.sync:replace{1} end)
+ok, err = nil, nil
+
+-- Test that the fiber actually waits for a WAL write to happen.
+f = fiber.create(function() ok, err = pcall(box.ctl.promote) end)
+fiber.sleep(0.1)
+f:status()
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ok
+err
+
+-- Cleanup.
+box.cfg{\
+    replication_synchro_timeout = replication_synchro_timeout,\
+}
+box.space.sync:drop()
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index dc39e2f74..dfe4be9ae 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -45,6 +45,7 @@
     "gh-5435-qsync-clear-synchro-queue-commit-all.test.lua": {},
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
+    "gh-6032-promote-wal-write.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 1d9c0a4ae..2625c5eea 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script = master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6032-promote-wal-write.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.30.1 (Apple Git-130)
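
For readers following the thread, the control flow of the box.cc hunk can be modelled in isolation. This is a toy sketch, not Tarantool code: ToyEntry, ToyLimbo, Decision, Result, pick_wait_lsn() and the wal_sync callback are all hypothetical stand-ins. It compares a made-up entry id where the patch compares entry pointers. It also checks for an empty limbo unconditionally, whereas the patch does so inside a branch whose condition is not visible in the hunk.

```cpp
// Toy model only: these types are NOT Tarantool's real structures.
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>

struct ToyEntry {
	uint64_t id;  // hypothetical identity, stands in for the entry pointer
	int64_t lsn;  // negative until the WAL write has finished
};

struct ToyLimbo {
	std::deque<ToyEntry> queue;
	int64_t confirmed_lsn = 0;
};

enum class Decision { NothingPending, WaitQuorum, Error };

struct Result {
	Decision decision;
	int64_t wait_lsn;
};

// wal_sync is a caller-supplied stand-in: it returns false on failure and
// may change the limbo, the way a real flush lets other fibers run.
Result pick_wait_lsn(ToyLimbo &limbo,
		     const std::function<bool(ToyLimbo &)> &wal_sync)
{
	if (limbo.queue.empty())
		return {Decision::NothingPending, limbo.confirmed_lsn};
	const uint64_t last_id = limbo.queue.back().id;
	if (limbo.queue.back().lsn < 0) {
		// The last entry has no lsn yet: wait for its WAL write.
		if (!wal_sync(limbo))
			return {Decision::Error, -1};
		// Everything got confirmed while we were waiting.
		if (limbo.queue.empty())
			return {Decision::NothingPending, limbo.confirmed_lsn};
		// A different entry is last now: new transactions appeared.
		if (limbo.queue.back().id != last_id)
			return {Decision::Error, -1};
	}
	const int64_t wait_lsn = limbo.queue.back().lsn;
	assert(wait_lsn > 0);
	return {Decision::WaitQuorum, wait_lsn};
}
```

In this model, a caller would gather quorum on wait_lsn only for Decision::WaitQuorum, and would skip the wait for Decision::NothingPending.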