To: tml
Date: Thu, 30 Sep 2021 12:44:45 +0300
Message-Id: <20210930094445.316694-4-gorcunov@gmail.com>
In-Reply-To: <20210930094445.316694-1-gorcunov@gmail.com>
References: <20210930094445.316694-1-gorcunov@gmail.com>
Subject: [Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test
List-Id: Tarantool development patches
From: Cyrill Gorcunov via Tarantool-patches
Reply-To: Cyrill Gorcunov
Cc: Vladislav Shpilevoy

Test that promotion requests are handled only after the corresponding
WAL write completes, because we update in-memory data before the write
finishes.

Note that without the patch this test fires an assertion:

> tarantool: src/box/txn_limbo.c:481: txn_limbo_read_rollback: Assertion `e->txn->signature >= 0' failed.

Part-of #6036
Signed-off-by: Cyrill Gorcunov
---
 test/replication/gh-6036-qsync-master.lua     |   1 +
 test/replication/gh-6036-qsync-node.lua       |  35 +++
 test/replication/gh-6036-qsync-order.result   | 207 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua |  95 ++++++++
 test/replication/gh-6036-qsync-replica1.lua   |   1 +
 test/replication/gh-6036-qsync-replica2.lua   |   1 +
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 8 files changed, 342 insertions(+), 1 deletion(-)
 create mode 120000 test/replication/gh-6036-qsync-master.lua
 create mode 100644 test/replication/gh-6036-qsync-node.lua
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua
 create mode 120000 test/replication/gh-6036-qsync-replica1.lua
 create mode 120000 test/replication/gh-6036-qsync-replica2.lua

diff --git a/test/replication/gh-6036-qsync-master.lua b/test/replication/gh-6036-qsync-master.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-master.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-qsync-node.lua b/test/replication/gh-6036-qsync-node.lua
new file mode 100644
index 000000000..ba6213255
--- /dev/null
+++ b/test/replication/gh-6036-qsync-node.lua
@@ -0,0 +1,35 @@
+local INSTANCE_ID = string.match(arg[0], "gh%-6036%-qsync%-(.+)%.lua")
+
+local function unix_socket(name)
+    return "unix/:./" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+local box_cfg_common = {
+    listen = unix_socket(INSTANCE_ID),
+    replication = {
+        unix_socket("master"),
+        unix_socket("replica1"),
+        unix_socket("replica2"),
+    },
+    replication_connect_quorum = 1,
+    replication_synchro_quorum = 1,
+    replication_synchro_timeout = 10000,
+}
+
+if INSTANCE_ID == "master" then
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+elseif INSTANCE_ID == "replica1" then
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+else
+    assert(INSTANCE_ID == "replica2")
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6036-qsync-order.result b/test/replication/gh-6036-qsync-order.result
new file mode 100644
index 000000000..378ce917a
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.result
@@ -0,0 +1,207 @@
+-- test-run result file version 2
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-6036-qsync-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-6036-qsync-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica2 with script="replication/gh-6036-qsync-replica2.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1 with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica2 with wait=False')
+ | ---
+ | - true
+ | ...
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+ | ---
+ | ...
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+s = box.schema.create_space('test', {is_sync = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+s:insert{1}
+ | ---
+ | - [1]
+ | ...
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.cfg({ \
+    replication = { \
+        "unix/:./master.sock", \
+        "unix/:./replica2.sock", \
+    }, \
+})
+ | ---
+ | ...
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.cfg({ \
+    replication = { \
+        "unix/:./replica1.sock", \
+        "unix/:./replica2.sock", \
+    }, \
+})
+ | ---
+ | ...
+
+--
+-- Here we have the following scheme
+--
+--              replica2 (will be delayed)
+--              /     \
+--         master    replica1
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+ | ---
+ | - ok
+ | ...
+
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+ | ---
+ | ...
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writting data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+ | ---
+ | ...
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writting while old leader's data ignored.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.space.test:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica2')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-qsync-order.test.lua b/test/replication/gh-6036-qsync-order.test.lua
new file mode 100644
index 000000000..ac1dc5d91
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.test.lua
@@ -0,0 +1,95 @@
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-6036-qsync-master.lua"')
+test_run:cmd('create server replica1 with script="replication/gh-6036-qsync-replica1.lua"')
+test_run:cmd('create server replica2 with script="replication/gh-6036-qsync-replica2.lua"')
+
+test_run:cmd('start server master with wait=False')
+test_run:cmd('start server replica1 with wait=False')
+test_run:cmd('start server replica2 with wait=False')
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+box.ctl.promote()
+s = box.schema.create_space('test', {is_sync = true})
+_ = s:create_index('pk')
+s:insert{1}
+test_run:switch("replica1")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+test_run:switch("replica2")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+box.cfg({ \
+    replication = { \
+        "unix/:./master.sock", \
+        "unix/:./replica2.sock", \
+    }, \
+})
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+box.cfg({ \
+    replication = { \
+        "unix/:./replica1.sock", \
+        "unix/:./replica2.sock", \
+    }, \
+})
+
+--
+-- Here we have the following scheme
+--
+--              replica2 (will be delayed)
+--              /     \
+--         master    replica1
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first.
+test_run:switch("replica1")
+box.ctl.promote()
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writting data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writting while old leader's data ignored.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+box.space.test:select{}
+
+test_run:switch("default")
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica1')
+test_run:cmd('stop server replica2')
+
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica1')
+test_run:cmd('delete server replica2')
diff --git a/test/replication/gh-6036-qsync-replica1.lua b/test/replication/gh-6036-qsync-replica1.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-replica1.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-qsync-replica2.lua b/test/replication/gh-6036-qsync-replica2.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-replica2.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3eee0803c..ed09b2087 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -59,6 +59,7 @@
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "gh-6036-qsync-order.test.lua": {},
     "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 77eb95f49..080e4fbf4 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script = master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-qsync-order.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
--
2.31.1