From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 4D81B6EC40; Mon, 20 Sep 2021 22:59:24 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 4D81B6EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1632167964; bh=ykq9+Oia0am4C1El6/ZNA/oyxBu7G4IaWcXsLEJXT0M=; h=To:Cc:Date:Subject:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=X7p8cNdfMwP4o0GzlmW884qwyciJVtryAnQeKUaAPJmNrM2HKAXoed3PkGpFC4oeq CgaPydXEpEE0IgabDTllE0vh3RG5Lh4HyX2nTW6yjnv/KXrA7J16tICuLWrfpSMlG5 F4sH5a9dTjwLa8I/fenan8h3a3wBl5tjP30emkvA= Received: from mail-lf1-f50.google.com (mail-lf1-f50.google.com [209.85.167.50]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E00066EC40 for ; Mon, 20 Sep 2021 22:59:22 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org E00066EC40 Received: by mail-lf1-f50.google.com with SMTP id b20so13727284lfv.3 for ; Mon, 20 Sep 2021 12:59:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=XazQ/SwbpGAyuwQnt9odv7LZKqcmaYqWwl0h5+ZgAZk=; b=jL0IfDwVkINJNegbiMG7lbHOERzPIpF0Ae6d9JNY5VC4sLSAABO2JVcbr/e+/BMk1Q Boy2czkiwJw9TJuCgry9/MiGgt4ZnTUXb9li/dUY3dm/ir6vg/BlcUvaWkpiwevSzSZd UUu3OCD9GKQHypgD7TM4IP5o1l7kMK3ltb0z3qskHyY4bW4slKqcvnMbSEKQV8F3i/B9 +xiPC18pVxTZ4wQEUj9BUV0SnXUkd7tTIevUm6fLfbYXcMWBwauaEeRDOgcOJ70u1Nmd CNVc6cJMyoeMmoKlYh2MxgFnxYPn4XveO1J0zPYHhdWnFkD/bgwOnpoyhAHducQRckd3 hNEw== X-Gm-Message-State: AOAM532LJ9LQZZMMaE5xhwEOCGSOx6ObonpcOleU59FV35R32SMIjN6s CW+bswKIbPmAYJonP9tzQkY= X-Google-Smtp-Source: ABdhPJxWIclKkllh29EIwzZrkb9Gm8TzccjkzrPZQ3gIsNcrDjCTaHoJuO6r+F62jaqlrXtZdAg9Zw== X-Received: by 2002:a2e:89d2:: with SMTP id c18mr24921067ljk.242.1632167960755; Mon, 20 Sep 2021 12:59:20 -0700 (PDT) Received: from localhost.localdomain (broadband-46-242-13-79.ip.moscow.rt.ru. [46.242.13.79]) by smtp.gmail.com with ESMTPSA id b28sm1809915ljf.101.2021.09.20.12.59.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Sep 2021 12:59:20 -0700 (PDT) To: sergepetrenko@tarantool.org, shemeneval@gmail.com Cc: tarantool-patches@dev.tarantool.org, Yan Shtunder Date: Mon, 20 Sep 2021 22:59:15 +0300 Message-Id: <20210920195915.35194-1-ya.shtunder@gmail.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH] replication: removing anonymous replicas from synchro quorum X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Yan Shtunder via Tarantool-patches Reply-To: Yan Shtunder Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Transactions have to committed after they reaches quorum of "real" cluster members. Therefore, anonymous replicas don't have to participate in the quorum. Closes #5418 --- Issue: https://github.com/tarantool/tarantool/issues/5418 Patch: https://github.com/tarantool/tarantool/tree/yshtunder/gh-5418-qsync-with-anon-replicas src/box/relay.cc | 3 +- test/replication-luatest/gh_5418_test.lua | 89 +++++++++ .../instance_files/master.lua | 18 ++ .../instance_files/replica.lua | 18 ++ test/replication-luatest/suite.ini | 3 + test/replication/qsync_with_anon.result | 187 ++---------------- test/replication/qsync_with_anon.test.lua | 83 ++------ 7 files changed, 164 insertions(+), 237 deletions(-) create mode 100644 test/replication-luatest/gh_5418_test.lua create mode 100755 test/replication-luatest/instance_files/master.lua create mode 100755 test/replication-luatest/instance_files/replica.lua create mode 100644 test/replication-luatest/suite.ini diff --git a/src/box/relay.cc b/src/box/relay.cc index 115037fc3..4d5f9a625 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -515,6 +515,7 @@ tx_status_update(struct cmsg *msg) struct replication_ack ack; ack.source = status->relay->replica->id; ack.vclock = &status->vclock; + bool anon = status->relay->replica->anon; /* * Let pending synchronous transactions know, which of * them were successfully sent to the replica. Acks are @@ -522,7 +523,7 @@ tx_status_update(struct cmsg *msg) * the single master in 100% so far). Other instances wait * for master's CONFIRM message instead. */ - if (txn_limbo.owner_id == instance_id) { + if (txn_limbo.owner_id == instance_id && !anon) { txn_limbo_ack(&txn_limbo, ack.source, vclock_get(ack.vclock, instance_id)); } diff --git a/test/replication-luatest/gh_5418_test.lua b/test/replication-luatest/gh_5418_test.lua new file mode 100644 index 000000000..77460165c --- /dev/null +++ b/test/replication-luatest/gh_5418_test.lua @@ -0,0 +1,89 @@ +local t = require('luatest') +local log = require('log') +local clock = require('clock') + + +local g = t.group() +local fio = require('fio') + +local Server = t.Server + +g.before_all(function() + g.master = Server:new({ + alias = 'master', + command = './test/replication-luatest/instance_files/master.lua', + workdir = fio.tempdir(), + env = {MASTER = 'localhost:13301'}, + http_port = 8081, + net_box_port = 13301, + }) + + g.replica = Server:new({ + alias = 'replica', + command = './test/replication-luatest/instance_files/replica.lua', + workdir = fio.tempdir(), + env = {MASTER = 'localhost:13301', LISTEN = 'localhost:13302'}, + http_port = 8082, + net_box_port = 13302, + }) + + + g.master:start() + g.replica:start() + + t.helpers.retrying({}, function() g.master:connect_net_box() end) + t.helpers.retrying({}, function() g.replica:connect_net_box() end) + + log.info('Everything is started') +end) + + +g.after_all(function() + g.replica:stop() + g.master:stop() + fio.rmtree(g.master.workdir) + fio.rmtree(g.replica.workdir) +end) + + +local TIMEOUT = 1 + + +local function wait_vclock() + local err = false + local lsn = g.master:eval("return box.info.vclock[1]") + local _, tbl = g.master:eval("return next(box.info.replication_anon())") + local to_lsn = tbl.downstream.vclock[1] + local started_at = clock.time() + + while to_lsn == nil or to_lsn < lsn do + require('fiber').sleep(0.001) + _, tbl = g.master:eval("return next(box.info.replication_anon())") + to_lsn = tbl.downstream.vclock[1] + + if (clock.time() - started_at) > TIMEOUT then + err = true + return err + end + + log.info(string.format("master lsn: %d; replica_anon lsn: %d", + lsn, to_lsn)) + end + + return err +end + + +g.test_qsync_with_anon = function() + g.master:eval("box.schema.space.create('sync', {is_sync = true})") + g.master:eval("box.space.sync:create_index('pk')") + + t.assert_error_msg_content_equals("Quorum collection for a synchronous transaction is timed out", + function() g.master:eval("return box.space.sync:insert{1}") end) + + -- Wait until everything is replicated from the master to the replica + t.assert_equals(wait_vclock(), false) + + t.assert_equals(g.master:eval("return box.space.sync:select()"), {}) + t.assert_equals(g.replica:eval("return box.space.sync:select()"), {}) +end diff --git a/test/replication-luatest/instance_files/master.lua b/test/replication-luatest/instance_files/master.lua new file mode 100755 index 000000000..41e8f1749 --- /dev/null +++ b/test/replication-luatest/instance_files/master.lua @@ -0,0 +1,18 @@ +#!/usr/bin/env tarantool + +local function instance_uri(instance_id) + return 'localhost:'..(13300 + instance_id) +end + +box.cfg({ + --log_level = 7, + work_dir = os.getenv('TARANTOOL_WORKDIR'), + listen = os.getenv('TARANTOOL_LISTEN'), + replication = {os.getenv("MASTER")}, + memtx_memory = 107374182, + replication_synchro_quorum = 2, + replication_timeout = 0.1 +}) + +box.schema.user.grant('guest', 'read, write, execute, create', 'universe', nil, {if_not_exists=true}) +require('log').warn("master is ready") diff --git a/test/replication-luatest/instance_files/replica.lua b/test/replication-luatest/instance_files/replica.lua new file mode 100755 index 000000000..1d791dfa4 --- /dev/null +++ b/test/replication-luatest/instance_files/replica.lua @@ -0,0 +1,18 @@ +#!/usr/bin/env tarantool + +local function instance_uri(instance_id) + return 'localhost:'..(13300 + instance_id) +end + +box.cfg({ + work_dir = os.getenv('TARANTOOL_WORKDIR'), + listen = os.getenv('TARANTOOL_LISTEN'), + replication = {os.getenv("MASTER"), os.getenv("LISTEN")}, + memtx_memory = 107374182, + replication_timeout = 0.1, + replication_connect_timeout = 0.5, + read_only = true, + replication_anon = true +}) + +require('log').warn("replica is ready") diff --git a/test/replication-luatest/suite.ini b/test/replication-luatest/suite.ini new file mode 100644 index 000000000..ccd624099 --- /dev/null +++ b/test/replication-luatest/suite.ini @@ -0,0 +1,3 @@ +[default] +core = luatest +description = first try of using luatest diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result index 6a2952a32..d847a77aa 100644 --- a/test/replication/qsync_with_anon.result +++ b/test/replication/qsync_with_anon.result @@ -1,195 +1,57 @@ -- test-run result file version 2 -env = require('test_run') - | --- - | ... -test_run = env.new() - | --- - | ... -engine = test_run:get_cfg('engine') - | --- - | ... - -orig_synchro_quorum = box.cfg.replication_synchro_quorum - | --- - | ... -orig_synchro_timeout = box.cfg.replication_synchro_timeout +test_run = require('test_run').new() | --- | ... NUM_INSTANCES = 2 | --- | ... -BROKEN_QUORUM = NUM_INSTANCES + 1 - | --- - | ... box.schema.user.grant('guest', 'replication') | --- | ... - --- Setup a cluster with anonymous replica. -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"') - | --- - | - true - | ... -test_run:cmd('start server replica_anon') - | --- - | - true - | ... -test_run:cmd('switch replica_anon') - | --- - | - true - | ... - --- [RFC, Asynchronous replication] successful transaction applied on async --- replica. --- Testcase setup. -test_run:switch('default') - | --- - | - true - | ... -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000} - | --- - | ... -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) - | --- - | ... -_ = box.space.sync:create_index('pk') - | --- - | ... --- Testcase body. -test_run:switch('default') - | --- - | - true - | ... -box.space.sync:insert{1} -- success - | --- - | - [1] - | ... -box.space.sync:insert{2} -- success - | --- - | - [2] - | ... -box.space.sync:insert{3} -- success - | --- - | - [3] - | ... -test_run:cmd('switch replica_anon') - | --- - | - true - | ... -box.space.sync:select{} -- 1, 2, 3 - | --- - | - - [1] - | - [2] - | - [3] - | ... --- Testcase cleanup. -test_run:switch('default') - | --- - | - true - | ... -box.space.sync:drop() - | --- - | ... - --- [RFC, Asynchronous replication] failed transaction rolled back on async --- replica. --- Testcase setup. -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000} +box.cfg{replication_synchro_quorum=NUM_INSTANCES} | --- | ... -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) +_ = box.schema.space.create('sync', {is_sync=true}) | --- | ... _ = box.space.sync:create_index('pk') | --- | ... --- Write something to flush the current master's state to replica. -_ = box.space.sync:insert{1} - | --- - | ... -_ = box.space.sync:delete{1} - | --- - | ... -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000} - | --- - | ... -fiber = require('fiber') - | --- - | ... -ok, err = nil - | --- - | ... -f = fiber.create(function() \ - ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \ -end) - | --- - | ... -test_run:cmd('switch replica_anon') +-- Setup a cluster with anonymous replica +test_run:cmd('create server replica_anon with rpl_master=default,\ + script="replication/anon1.lua"') | --- | - true | ... -test_run:wait_cond(function() return box.space.sync:count() == 1 end) +test_run:cmd('start server replica_anon') | --- | - true | ... -box.space.sync:select{} - | --- - | - - [1] - | ... -test_run:switch('default') - | --- - | - true - | ... -box.cfg{replication_synchro_timeout = 0.001} - | --- - | ... -test_run:wait_cond(function() return f:status() == 'dead' end) + +-- Testcase +box.space.sync:insert{1} -- error | --- - | - true + | - error: Quorum collection for a synchronous transaction is timed out | ... -box.space.sync:select{} +box.space.sync:insert{3} -- error | --- - | - [] + | - error: Quorum collection for a synchronous transaction is timed out | ... - test_run:cmd('switch replica_anon') | --- | - true | ... -test_run:wait_cond(function() return box.space.sync:count() == 0 end) - | --- - | - true - | ... -box.space.sync:select{} +box.space.sync:select{} -- [] | --- | - [] | ... -test_run:switch('default') - | --- - | - true - | ... -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000} - | --- - | ... -box.space.sync:insert{1} -- success - | --- - | - [1] - | ... -test_run:cmd('switch replica_anon') - | --- - | - true - | ... -box.space.sync:select{} -- 1 - | --- - | - - [1] - | ... --- Testcase cleanup. +-- Testcase cleanup test_run:switch('default') | --- | - true @@ -198,12 +60,13 @@ box.space.sync:drop() | --- | ... --- Teardown. -test_run:switch('default') + +-- Teardown +test_run:cmd('stop server replica_anon') | --- | - true | ... -test_run:cmd('stop server replica_anon') +test_run:cmd('cleanup server replica_anon') | --- | - true | ... @@ -211,15 +74,3 @@ test_run:cmd('delete server replica_anon') | --- | - true | ... -box.schema.user.revoke('guest', 'replication') - | --- - | ... -box.cfg{ \ - replication_synchro_quorum = orig_synchro_quorum, \ - replication_synchro_timeout = orig_synchro_timeout, \ -} - | --- - | ... -test_run:cleanup_cluster() - | --- - | ... diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua index d7ecaa107..2d92f08aa 100644 --- a/test/replication/qsync_with_anon.test.lua +++ b/test/replication/qsync_with_anon.test.lua @@ -1,84 +1,31 @@ -env = require('test_run') -test_run = env.new() -engine = test_run:get_cfg('engine') - -orig_synchro_quorum = box.cfg.replication_synchro_quorum -orig_synchro_timeout = box.cfg.replication_synchro_timeout +test_run = require('test_run').new() NUM_INSTANCES = 2 -BROKEN_QUORUM = NUM_INSTANCES + 1 box.schema.user.grant('guest', 'replication') - --- Setup a cluster with anonymous replica. -test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"') -test_run:cmd('start server replica_anon') -test_run:cmd('switch replica_anon') - --- [RFC, Asynchronous replication] successful transaction applied on async --- replica. --- Testcase setup. -test_run:switch('default') -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000} -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) +box.cfg{replication_synchro_quorum=NUM_INSTANCES} +_ = box.schema.space.create('sync', {is_sync=true}) _ = box.space.sync:create_index('pk') --- Testcase body. -test_run:switch('default') -box.space.sync:insert{1} -- success -box.space.sync:insert{2} -- success -box.space.sync:insert{3} -- success -test_run:cmd('switch replica_anon') -box.space.sync:select{} -- 1, 2, 3 --- Testcase cleanup. -test_run:switch('default') -box.space.sync:drop() --- [RFC, Asynchronous replication] failed transaction rolled back on async --- replica. --- Testcase setup. -box.cfg{replication_synchro_quorum = NUM_INSTANCES, replication_synchro_timeout = 1000} -_ = box.schema.space.create('sync', {is_sync=true, engine=engine}) -_ = box.space.sync:create_index('pk') --- Write something to flush the current master's state to replica. -_ = box.space.sync:insert{1} -_ = box.space.sync:delete{1} - -box.cfg{replication_synchro_quorum = BROKEN_QUORUM, replication_synchro_timeout = 1000} -fiber = require('fiber') -ok, err = nil -f = fiber.create(function() \ - ok, err = pcall(box.space.sync.insert, box.space.sync, {1}) \ -end) -test_run:cmd('switch replica_anon') -test_run:wait_cond(function() return box.space.sync:count() == 1 end) -box.space.sync:select{} +-- Setup a cluster with anonymous replica +test_run:cmd('create server replica_anon with rpl_master=default,\ + script="replication/anon1.lua"') +test_run:cmd('start server replica_anon') -test_run:switch('default') -box.cfg{replication_synchro_timeout = 0.001} -test_run:wait_cond(function() return f:status() == 'dead' end) -box.space.sync:select{} +-- Testcase +box.space.sync:insert{1} -- error +box.space.sync:insert{3} -- error test_run:cmd('switch replica_anon') -test_run:wait_cond(function() return box.space.sync:count() == 0 end) -box.space.sync:select{} +box.space.sync:select{} -- [] -test_run:switch('default') -box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000} -box.space.sync:insert{1} -- success -test_run:cmd('switch replica_anon') -box.space.sync:select{} -- 1 --- Testcase cleanup. +-- Testcase cleanup test_run:switch('default') box.space.sync:drop() --- Teardown. -test_run:switch('default') + +-- Teardown test_run:cmd('stop server replica_anon') +test_run:cmd('cleanup server replica_anon') test_run:cmd('delete server replica_anon') -box.schema.user.revoke('guest', 'replication') -box.cfg{ \ - replication_synchro_quorum = orig_synchro_quorum, \ - replication_synchro_timeout = orig_synchro_timeout, \ -} -test_run:cleanup_cluster() -- 2.25.1