From mboxrd@z Thu Jan 1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, sergepetrenko@tarantool.org
Date: Fri, 11 Jun 2021 23:56:09 +0200
Message-Id: <6ef0d84375576f94283e2bf1fea56a55daf9523e.1623448465.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 13/13] txn: stop TXN_SIGNATURE_ABORT override
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

When txn_commit() or txn_commit_try_async() failed before reaching the
WAL thread, they installed the TXN_SIGNATURE_ABORT signature, meaning
that the caller and the rollback triggers must look at the global diag.

But they called txn_rollback() before returning and before the rollback
triggers ran, which overrode the signature with TXN_SIGNATURE_ROLLBACK
and lost the original error.

The patch installs TXN_SIGNATURE_ROLLBACK only when a real rollback
happens: in box_txn_rollback() and in the txn_on_stop() trigger. The
original commit errors, such as a conflict in the transaction manager
or OOM, are no longer lost.

Besides, ERRINJ_TXN_COMMIT_ASYNC does not need its own diag_log()
anymore, because since this commit the applier logs the correct error
instead of ER_WAL_IO/ER_TXN_ROLLBACK.

Closes #6027
---
 .../unreleased/gh-6027-applier-lost-error.md  |  7 ++
 src/box/txn.c                                 | 10 ++-
 .../gh-6027-applier-error-show.result         | 82 +++++++++++++++++++
 .../gh-6027-applier-error-show.test.lua       | 31 +++++++
 test/replication/suite.cfg                    |  1 +
 test/replication/suite.ini                    |  2 +-
 6 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 changelogs/unreleased/gh-6027-applier-lost-error.md
 create mode 100644 test/replication/gh-6027-applier-error-show.result
 create mode 100644 test/replication/gh-6027-applier-error-show.test.lua

diff --git a/changelogs/unreleased/gh-6027-applier-lost-error.md b/changelogs/unreleased/gh-6027-applier-lost-error.md
new file mode 100644
index 000000000..9c765b8e2
--- /dev/null
+++ b/changelogs/unreleased/gh-6027-applier-lost-error.md
@@ -0,0 +1,7 @@
+## bugfix/replication
+
+* When an error happened during appliance of a transaction received from a
+  remote instance via replication, it was always reported as "Failed to write
+  to disk" regardless of what really happened. Now the correct error is shown.
+  For example, "Out of memory", or "Transaction has been aborted by conflict",
+  and so on (gh-6027).
diff --git a/src/box/txn.c b/src/box/txn.c
index 5cae7b41d..c2734c237 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -805,7 +805,6 @@ txn_commit_try_async(struct txn *txn)
 		 * Log it for the testing sake: we grep
 		 * output to mark this event.
 		 */
-		diag_log();
 		goto rollback;
 	});
 
@@ -983,11 +982,11 @@ void
 txn_rollback(struct txn *txn)
 {
 	assert(txn == in_txn());
+	assert(txn->signature != TXN_SIGNATURE_UNKNOWN);
 	txn->status = TXN_ABORTED;
 	trigger_clear(&txn->fiber_on_stop);
 	if (!txn_has_flag(txn, TXN_CAN_YIELD))
 		trigger_clear(&txn->fiber_on_yield);
-	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn_complete_fail(txn);
 	fiber_set_txn(fiber(), NULL);
 }
@@ -1086,6 +1085,8 @@ box_txn_rollback(void)
 		diag_set(ClientError, ER_ROLLBACK_IN_SUB_STMT);
 		return -1;
 	}
+	assert(txn->signature == TXN_SIGNATURE_UNKNOWN);
+	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn_rollback(txn); /* doesn't throw */
 	fiber_gc();
 	return 0;
@@ -1221,7 +1222,10 @@ txn_on_stop(struct trigger *trigger, void *event)
 {
 	(void) trigger;
 	(void) event;
-	txn_rollback(in_txn()); /* doesn't yield or fail */
+	struct txn *txn = in_txn();
+	assert(txn->signature == TXN_SIGNATURE_UNKNOWN);
+	txn->signature = TXN_SIGNATURE_ROLLBACK;
+	txn_rollback(txn);
 	fiber_gc();
 	return 0;
 }
diff --git a/test/replication/gh-6027-applier-error-show.result b/test/replication/gh-6027-applier-error-show.result
new file mode 100644
index 000000000..c3a01ab50
--- /dev/null
+++ b/test/replication/gh-6027-applier-error-show.result
@@ -0,0 +1,82 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+--
+-- gh-6027: on attempt to a commit transaction its original error was lost.
+--
+
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+s = box.schema.create_space('test')
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default, '..             \
+             'script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_TXN_COMMIT_ASYNC', true)
+ | ---
+ | - ok
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+_ = s:replace{1}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_upstream(1, {status = 'stopped'})
+ | ---
+ | - true
+ | ...
+-- Should be something about error injection.
+box.info.replication[1].upstream.message
+ | ---
+ | - Error injection 'txn commit async injection'
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_TXN_COMMIT_ASYNC', false)
+ | ---
+ | - ok
+ | ...
+s:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
diff --git a/test/replication/gh-6027-applier-error-show.test.lua b/test/replication/gh-6027-applier-error-show.test.lua
new file mode 100644
index 000000000..8e17cdfa9
--- /dev/null
+++ b/test/replication/gh-6027-applier-error-show.test.lua
@@ -0,0 +1,31 @@
+test_run = require('test_run').new()
+
+--
+-- gh-6027: on attempt to a commit transaction its original error was lost.
+--
+
+box.schema.user.grant('guest', 'super')
+s = box.schema.create_space('test')
+_ = s:create_index('pk')
+
+test_run:cmd('create server replica with rpl_master=default, '..             \
+             'script="replication/replica.lua"')
+test_run:cmd('start server replica')
+
+test_run:switch('replica')
+box.error.injection.set('ERRINJ_TXN_COMMIT_ASYNC', true)
+
+test_run:switch('default')
+_ = s:replace{1}
+
+test_run:switch('replica')
+test_run:wait_upstream(1, {status = 'stopped'})
+-- Should be something about error injection.
+box.info.replication[1].upstream.message
+
+test_run:switch('default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.error.injection.set('ERRINJ_TXN_COMMIT_ASYNC', false)
+s:drop()
+box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 9eccf9e19..3a0a8649f 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -45,6 +45,7 @@
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
     "gh-5613-bootstrap-prefer-booted.test.lua": {},
+    "gh-6027-applier-error-show.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
     "gh-6094-rs-uuid-mismatch.test.lua": {},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 80e968d56..ccf3559df 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script = master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.24.3 (Apple Git-128)
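
To make the signature override from the commit message concrete, here is a
minimal standalone model of it. It is a sketch, not Tarantool code: the
sketch_* names and the numeric values of the constants are hypothetical,
and only the ordering of the assignments mirrors what the patch changes in
src/box/txn.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the TXN_SIGNATURE_* constants.
 * The numeric values are illustrative only.
 */
enum sketch_signature {
	SKETCH_SIGNATURE_UNKNOWN = 0,
	SKETCH_SIGNATURE_ROLLBACK = -1,
	SKETCH_SIGNATURE_ABORT = -2,
};

/* Hypothetical, heavily reduced model of a transaction. */
struct sketch_txn {
	int signature;
};

/* Rollback as before the patch: it always overrides the signature. */
void
sketch_rollback_old(struct sketch_txn *txn)
{
	txn->signature = SKETCH_SIGNATURE_ROLLBACK;
}

/*
 * Rollback as after the patch: it leaves the signature alone and only
 * checks that the caller has already installed one.
 */
void
sketch_rollback_new(struct sketch_txn *txn)
{
	assert(txn->signature != SKETCH_SIGNATURE_UNKNOWN);
	(void)txn;
}

/*
 * A commit that fails before reaching the WAL: it installs the abort
 * signature and then rolls back. Returns the signature that rollback
 * triggers would observe afterwards.
 */
int
sketch_failed_commit(bool patched)
{
	struct sketch_txn txn = { SKETCH_SIGNATURE_UNKNOWN };
	txn.signature = SKETCH_SIGNATURE_ABORT;
	if (patched)
		sketch_rollback_new(&txn);
	else
		sketch_rollback_old(&txn);
	return txn.signature;
}

/*
 * An explicit rollback after the patch: the caller installs the
 * rollback signature itself before calling the rollback routine.
 */
int
sketch_user_rollback(void)
{
	struct sketch_txn txn = { SKETCH_SIGNATURE_UNKNOWN };
	assert(txn.signature == SKETCH_SIGNATURE_UNKNOWN);
	txn.signature = SKETCH_SIGNATURE_ROLLBACK;
	sketch_rollback_new(&txn);
	return txn.signature;
}
```

In this model sketch_failed_commit(false) ends with the rollback signature,
so the abort reason is lost, while sketch_failed_commit(true) keeps the
abort signature and a caller would know to read the diag. An explicit
rollback still reports the rollback signature in both designs.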