Date: Wed, 6 Oct 2021 10:06:35 +0300
From: Cyrill Gorcunov via Tarantool-patches
Reply-To: Cyrill Gorcunov
To: Serge Petrenko
Cc: tml, Vladislav Shpilevoy
References: <20210930094445.316694-1-gorcunov@gmail.com>
 <20210930094445.316694-4-gorcunov@gmail.com>
 <42713720-689c-221c-29a4-7087ccbc472f@tarantool.org>
 <1c992b36-7ad4-c68b-5252-69a6f1a7b67a@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test

On Wed, Oct 06, 2021 at 01:32:20AM +0300, Cyrill Gorcunov wrote:
>
> Drop it please, need to rework :/

Here is the last one, take a look please.
---
From ca074681354099a79105bee14ab804492964184a Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov
Date: Mon, 20 Sep 2021 17:22:38 +0300
Subject: [PATCH] test: add gh-6036-qsync-order test

Test that promotion requests are handled only after the corresponding
WAL write completes, because in-memory data is updated before the
write finishes.
Note that without the patch this test triggers an assertion:

> tarantool: src/box/txn_limbo.c:481: txn_limbo_read_rollback: Assertion `e->txn->signature >= 0' failed.

Part-of #6036

Signed-off-by: Cyrill Gorcunov
---
 test/replication/election_replica.lua         |   3 +-
 test/replication/gh-6036-qsync-order.result   | 224 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua | 103 ++++++++
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 5 files changed, 331 insertions(+), 2 deletions(-)
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua

diff --git a/test/replication/election_replica.lua b/test/replication/election_replica.lua
index 3b4d9a123..1dbfa96dc 100644
--- a/test/replication/election_replica.lua
+++ b/test/replication/election_replica.lua
@@ -6,6 +6,7 @@ local SYNCHRO_QUORUM = arg[1] and tonumber(arg[1]) or 3
 local ELECTION_TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
 local ELECTION_MODE = arg[3] or 'candidate'
 local CONNECT_QUORUM = arg[4] and tonumber(arg[4]) or 3
+local SYNCHRO_TIMEOUT = arg[5] and tonumber(arg[5]) or 0.1
 
 local function instance_uri(instance_id)
     return SOCKET_DIR..'/election_replica'..instance_id..'.sock';
@@ -25,7 +26,7 @@ box.cfg({
     election_mode = ELECTION_MODE,
     election_timeout = ELECTION_TIMEOUT,
     replication_synchro_quorum = SYNCHRO_QUORUM,
-    replication_synchro_timeout = 0.1,
+    replication_synchro_timeout = SYNCHRO_TIMEOUT,
     -- To reveal more election logs.
     log_level = 6,
 })
diff --git a/test/replication/gh-6036-qsync-order.result b/test/replication/gh-6036-qsync-order.result
new file mode 100644
index 000000000..34a7e7803
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.result
@@ -0,0 +1,224 @@
+-- test-run result file version 2
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/election_replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/election_replica2.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica2 with script="replication/election_replica3.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd("start server master with wait=False, args='1 nil manual 1 10000'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica1 with wait=False, args='1 nil manual 1 10000'")
+ | ---
+ | - true
+ | ...
+test_run:cmd("start server replica2 with wait=False, args='1 nil manual 1 10000'")
+ | ---
+ | - true
+ | ...
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+ | ---
+ | ...
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+s = box.schema.create_space('test', {is_sync = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+s:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_lsn('replica1', 'master')
+ | ---
+ | ...
+
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+test_run:wait_lsn('replica2', 'master')
+ | ---
+ | ...
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.cfg({ \
+    replication = { \
+        "unix/:./election_replica1.sock", \
+        "unix/:./election_replica3.sock", \
+    }, \
+})
+ | ---
+ | ...
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.cfg({ \
+    replication = { \
+        "unix/:./election_replica2.sock", \
+        "unix/:./election_replica3.sock", \
+    }, \
+})
+ | ---
+ | ...
+
+--
+-- Here we have the following scheme
+--
+--      replica2 (will be delayed)
+--      /     \
+-- master    replica1
+
+--
+-- Initiate disk delay in a bit tricky way: the next write will
+-- fall into forever sleep.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 1)
+ | ---
+ | - ok
+ | ...
+cnt_before = box.error.injection.get('ERRINJ_WAL_DELAY_COUNTDOWN')
+ | ---
+ | ...
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first. Note that we
+-- enter replica2 as well just to be sure the PROMOTE
+-- reached it.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.error.injection.get('ERRINJ_WAL_DELAY_COUNTDOWN') < cnt_before end)
+ | ---
+ | - true
+ | ...
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+ | ---
+ | ...
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writing data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+ | ---
+ | ...
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writing while old leader's data ignored.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.space.test:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica2')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-qsync-order.test.lua b/test/replication/gh-6036-qsync-order.test.lua
new file mode 100644
index 000000000..47996998d
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.test.lua
@@ -0,0 +1,103 @@
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/election_replica1.lua"')
+test_run:cmd('create server replica1 with script="replication/election_replica2.lua"')
+test_run:cmd('create server replica2 with script="replication/election_replica3.lua"')
+
+test_run:cmd("start server master with wait=False, args='1 nil manual 1 10000'")
+test_run:cmd("start server replica1 with wait=False, args='1 nil manual 1 10000'")
+test_run:cmd("start server replica2 with wait=False, args='1 nil manual 1 10000'")
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+box.ctl.promote()
+s = box.schema.create_space('test', {is_sync = true})
+_ = s:create_index('pk')
+s:insert{1}
+
+test_run:switch("replica1")
+test_run:wait_lsn('replica1', 'master')
+
+test_run:switch("replica2")
+test_run:wait_lsn('replica2', 'master')
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+box.cfg({ \
+    replication = { \
+        "unix/:./election_replica1.sock", \
+        "unix/:./election_replica3.sock", \
+    }, \
+})
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+box.cfg({ \
+    replication = { \
+        "unix/:./election_replica2.sock", \
+        "unix/:./election_replica3.sock", \
+    }, \
+})
+
+--
+-- Here we have the following scheme
+--
+--      replica2 (will be delayed)
+--      /     \
+-- master    replica1
+
+--
+-- Initiate disk delay in a bit tricky way: the next write will
+-- fall into forever sleep.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 1)
+cnt_before = box.error.injection.get('ERRINJ_WAL_DELAY_COUNTDOWN')
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first. Note that we
+-- enter replica2 as well just to be sure the PROMOTE
+-- reached it.
+test_run:switch("replica1")
+box.ctl.promote()
+test_run:switch("replica2")
+test_run:wait_cond(function() return box.error.injection.get('ERRINJ_WAL_DELAY_COUNTDOWN') < cnt_before end)
+test_run:switch("replica1")
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writing data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writing while old leader's data ignored.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+box.space.test:select{}
+
+test_run:switch("default")
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica1')
+test_run:cmd('stop server replica2')
+
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica1')
+test_run:cmd('delete server replica2')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3eee0803c..ed09b2087 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -59,6 +59,7 @@
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "gh-6036-qsync-order.test.lua": {},
     "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 77eb95f49..080e4fbf4 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script = master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua
gh-4040-invalid-msgpack.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-qsync-order.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.31.1
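
A side note on the `args='1 nil manual 1 10000'` string the test passes to each instance. The plain-Lua sketch below mirrors the `arg[N] and tonumber(arg[N]) or default` idiom from election_replica.lua to show how each positional argument ends up being interpreted. The `parse_args` helper and the table field names are hypothetical and exist only for this illustration, and the whitespace split is an assumption about how the test harness hands the arguments to the script.

```lua
-- Hypothetical helper mirroring the defaults in election_replica.lua.
-- Every argument arrives as a string, so the literal "nil" is truthy,
-- but tonumber("nil") returns nil and the `or` default takes over.
local function parse_args(arg)
    return {
        synchro_quorum   = arg[1] and tonumber(arg[1]) or 3,
        election_timeout = arg[2] and tonumber(arg[2]) or 0.1,
        election_mode    = arg[3] or 'candidate',
        connect_quorum   = arg[4] and tonumber(arg[4]) or 3,
        synchro_timeout  = arg[5] and tonumber(arg[5]) or 0.1,
    }
end

-- Split the args string on whitespace (an assumption about how the
-- harness passes arguments to the instance script).
local argv = {}
for word in string.gmatch('1 nil manual 1 10000', '%S+') do
    argv[#argv + 1] = word
end

local cfg = parse_args(argv)
local keys = {
    'synchro_quorum', 'election_timeout', 'election_mode',
    'connect_quorum', 'synchro_timeout',
}
for _, key in ipairs(keys) do
    print(key, cfg[key])
end
```

Under these assumptions the second argument falls back to the 0.1 default, the election mode is 'manual', both quorums are 1, and the new fifth argument sets replication_synchro_timeout to 10000 seconds, presumably so that pending synchronous transactions do not time out while the WAL on replica2 is stalled.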