From: Cyrill Gorcunov via Tarantool-patches
Reply-To: Cyrill Gorcunov
To: tml
Cc: Vladislav Shpilevoy
Date: Mon, 11 Oct 2021 22:16:32 +0300
Message-Id: <20211011191635.573685-1-gorcunov@gmail.com>
Subject: [Tarantool-patches] [PATCH v22 0/3] qsync: implement packet filtering (part 1)

Guys, please take a look when time permits. Any comments are highly appreciated!
v19 (by Vlad):
 - do not modify box_issue_promote and box_issue_demote (even though
   they are still plain code duplication)
 - make txn_limbo_process return void
 - make txn_limbo_process_begin/commit/rollback return void
 - the actual request processing under the lock is now named
   txn_limbo_process_core
 - testcase completely reworked (kudos to SergeP)
 - note that if the test is imported into the master branch without
   the ordering patch, it triggers an assertion
 - drop the debug info from the box.info interface

v20 (by SergeP):
 - use a guard for ACK processing and parameter changes
 - rework the test

v21 (by SergeP):
 - drop the warning from txn_limbo_ack
 - rework the test to use cluster helpers and the
   ERRINJ_WAL_WRITE_COUNT error injection; at the same time, drop
   the modification of the election_replica script

v22 (by SergeP):
 - test limbo emptiness _after_ the owner_id test
 - drop the redundant assert in limbo commit/rollback: we unlock
   the latch there anyway, and the unlock has its own assertion
 - in the test: drop an excessive wait_cond and set up the WAL
   delay earlier

branch gorcunov/gh-6036-rollback-confirm-22
issue https://github.com/tarantool/tarantool/issues/6036
previous series https://lists.tarantool.org/tarantool-patches/20211008175809.349501-1-gorcunov@gmail.com/

Cyrill Gorcunov (3):
  latch: add latch_is_locked helper
  qsync: order access to the limbo terms
  test: add gh-6036-qsync-order test

 src/box/applier.cc                            |  12 +-
 src/box/box.cc                                |  15 +-
 src/box/relay.cc                              |  11 +-
 src/box/txn.c                                 |   2 +-
 src/box/txn_limbo.c                           |  49 ++++-
 src/box/txn_limbo.h                           |  78 ++++++-
 src/lib/core/latch.h                          |  11 +
 test/replication/gh-6036-qsync-order.result   | 200 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua |  96 +++++++++
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 test/unit/snap_quorum_delay.cc                |   5 +-
 12 files changed, 451 insertions(+), 31 deletions(-)
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua

base-commit: ce5752ce235324fcefd5a3d0503fd3f8a1800d38
-- 
2.31.1

--
Summary patch against v21
--
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 176a83cb5..8f9bc11c7 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -546,8 +546,6 @@ void
 txn_limbo_ack(struct txn_limbo *limbo, uint32_t owner_id,
 	      uint32_t replica_id, int64_t lsn)
 {
-	if (rlist_empty(&limbo->queue))
-		return;
 	/*
 	 * ACKs are collected only by the transactions originator
 	 * (which is the single master in 100% so far). Other instances
@@ -560,6 +558,15 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t owner_id,
 	 */
 	if (!txn_limbo_is_owner(limbo, owner_id))
 		return;
+
+	/*
+	 * Test for empty queue is done _after_ txn_limbo_is_owner
+	 * call because we need to be sure that limbo is not been
+	 * changed under our feets while we're reading it.
+	 */
+	if (rlist_empty(&limbo->queue))
+		return;
+
 	/*
 	 * If limbo is currently writing a rollback, it means that the whole
 	 * queue will be rolled back. Because rollback is written only for
@@ -815,8 +822,6 @@ txn_limbo_process(struct txn_limbo *limbo,
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 {
-	if (rlist_empty(&limbo->queue))
-		return;
 	/*
 	 * In case if we're not current leader (ie not owning the
 	 * limbo) then we should not confirm anything, otherwise
@@ -826,6 +831,9 @@ txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 	if (!txn_limbo_is_owner(limbo, instance_id))
 		return;
 
+	if (rlist_empty(&limbo->queue))
+		return;
+
 	struct txn_limbo_entry *e;
 	int64_t confirm_lsn = -1;
 	rlist_foreach_entry(e, &limbo->queue, in_queue) {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index aaff444e4..33cacef8f 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -349,7 +349,6 @@ txn_limbo_process_begin(struct txn_limbo *limbo)
 static inline void
 txn_limbo_process_commit(struct txn_limbo *limbo)
 {
-	assert(latch_is_locked(&limbo->promote_latch));
 	latch_unlock(&limbo->promote_latch);
 }
 
@@ -357,7 +356,6 @@ txn_limbo_process_commit(struct txn_limbo *limbo)
 static inline void
 txn_limbo_process_rollback(struct txn_limbo *limbo)
 {
-	assert(latch_is_locked(&limbo->promote_latch));
 	latch_unlock(&limbo->promote_latch);
 }
 
diff --git a/test/replication/gh-6036-qsync-order.result b/test/replication/gh-6036-qsync-order.result
index eb3e808cb..464a131a4 100644
--- a/test/replication/gh-6036-qsync-order.result
+++ b/test/replication/gh-6036-qsync-order.result
@@ -76,10 +76,6 @@ test_run:switch("election_replica2")
  | ---
  | - true
  | ...
-test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
- | ---
- | - true
- | ...
 box.cfg({ \
     replication = { \
         "unix/:./election_replica2.sock", \
@@ -106,6 +102,10 @@ test_run:switch("election_replica3")
 write_cnt = box.error.injection.get("ERRINJ_WAL_WRITE_COUNT")
  | ---
  | ...
+box.error.injection.set("ERRINJ_WAL_DELAY", true)
+ | ---
+ | - ok
+ | ...
 --
 -- Make election_replica2 been a leader and start writting data,
 -- the PROMOTE request get queued on election_replica3 and not
@@ -128,10 +128,6 @@ test_run:wait_cond(function() return box.error.injection.get("ERRINJ_WAL_WRITE_C
  | ---
  | - true
  | ...
-box.error.injection.set("ERRINJ_WAL_DELAY", true)
- | ---
- | - ok
- | ...
 test_run:switch("election_replica2")
  | ---
  | - true
diff --git a/test/replication/gh-6036-qsync-order.test.lua b/test/replication/gh-6036-qsync-order.test.lua
index b8df170b8..6350e9303 100644
--- a/test/replication/gh-6036-qsync-order.test.lua
+++ b/test/replication/gh-6036-qsync-order.test.lua
@@ -37,7 +37,6 @@ box.cfg({ \
 --
 -- Drop connection between election_replica2 and election_replica1.
 test_run:switch("election_replica2")
-test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
 box.cfg({ \
     replication = { \
         "unix/:./election_replica2.sock", \
@@ -56,6 +56,7 @@ box.cfg({ \
 -- fall into forever sleep.
 test_run:switch("election_replica3")
 write_cnt = box.error.injection.get("ERRINJ_WAL_WRITE_COUNT")
+box.error.injection.set("ERRINJ_WAL_DELAY", true)
 --
 -- Make election_replica2 been a leader and start writting data,
 -- the PROMOTE request get queued on election_replica3 and not
@@ -68,7 +68,6 @@ test_run:switch("election_replica2")
 box.ctl.promote()
 test_run:switch("election_replica3")
 test_run:wait_cond(function() return box.error.injection.get("ERRINJ_WAL_WRITE_COUNT") > write_cnt end)
-box.error.injection.set("ERRINJ_WAL_DELAY", true)
 
 test_run:switch("election_replica2")
 _ = require('fiber').create(function() box.space.test:insert{2} end)
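The v22 reordering in txn_limbo_ack() and txn_limbo_on_parameters_change() rests on a general rule: a check made before a call that may suspend the fiber can be stale once that call returns. The toy C model below sketches that rule under one stated assumption, namely that the ownership check can yield (for example while waiting on promote_latch); this summary patch does not show that code. Every name in the model (struct toy_limbo, toy_is_owner(), drain_queue(), toy_ack_v21(), toy_ack_v22()) is hypothetical and is not Tarantool code. The "other fiber" is simulated by a callback that runs during the assumed yield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only, NOT Tarantool code; all names are hypothetical.
 * It shows why the queue emptiness test must follow a call that
 * may yield, as in the v22 ordering.
 */
struct toy_limbo {
	unsigned owner_id;	/* models limbo->owner_id */
	int queue_len;		/* 0 models rlist_empty(&limbo->queue) */
	int acked;		/* ACKs applied to a non-empty queue */
	int bad_reads;		/* ACK work attempted on an empty queue */
};

/* Work another fiber performs while the caller is suspended. */
typedef void (*toy_yield_hook)(struct toy_limbo *);

/* Example hook: the queue gets confirmed or rolled back meanwhile. */
void
drain_queue(struct toy_limbo *l)
{
	l->queue_len = 0;
}

/*
 * Assumed stand-in for an ownership check that takes a latch and
 * therefore may yield: the hook runs before the owner is compared.
 */
bool
toy_is_owner(struct toy_limbo *l, unsigned owner_id, toy_yield_hook on_yield)
{
	if (on_yield != NULL)
		on_yield(l);
	return l->owner_id == owner_id;
}

/* Stand-in for the real ACK work, which walks the queue. */
void
toy_process_ack(struct toy_limbo *l)
{
	if (l->queue_len == 0)
		l->bad_reads++;	/* would walk an empty queue */
	else
		l->acked++;
}

/* v21 order: the queue is tested before the call that may yield. */
void
toy_ack_v21(struct toy_limbo *l, unsigned owner_id, toy_yield_hook h)
{
	if (l->queue_len == 0)
		return;
	if (!toy_is_owner(l, owner_id, h))
		return;
	toy_process_ack(l);	/* the queue may have been emptied meanwhile */
}

/* v22 order: ownership first, then the queue is read. */
void
toy_ack_v22(struct toy_limbo *l, unsigned owner_id, toy_yield_hook h)
{
	if (!toy_is_owner(l, owner_id, h))
		return;
	if (l->queue_len == 0)
		return;
	toy_process_ack(l);
}
```

For a toy limbo owned by instance 1 with one queued entry, toy_ack_v21(&l, 1, drain_queue) reaches toy_process_ack() with an empty queue and bumps bad_reads, while toy_ack_v22(&l, 1, drain_queue) reads the queue only after the ownership check and returns early.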