From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
References:
Message-ID: <56012d4a-bb15-d0e4-d948-b3a208d031f3@tarantool.org>
Date: Mon, 12 Apr 2021 22:23:00 +0300
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.9.1
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
Subject: Re: [Tarantool-patches] [PATCH 6/9] raft: keep track of greatest known term and filter replication sources based on that
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

11.04.2021 20:56, Serge Petrenko wrote:
> Start writing the actual leader term together with the PROMOTE request
> and process terms in PROMOTE requests on receiver side.
>
> Make applier only apply synchronous transactions from the instance which
> has the greatest term as received in PROMOTE requests.
>
> Closes #5445
> ---

Force-pushed a test and a couple of fixes. Please find an incremental diff
below and the full patch in v2. A short illustrative sketch of how the new
helpers are meant to be used together is appended after the diff.

==========================

diff --git a/src/box/box.cc b/src/box/box.cc
index 9b6323b3f..aae57ec29 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1507,7 +1507,12 @@ box_clear_synchro_queue(bool try_wait)
 		return -1;
 	}
 
-	if (!is_box_configured)
+	/*
+	 * Do nothing when box isn't configured and when PROMOTE was already
+	 * written for this term.
+	 */
+	if (!is_box_configured ||
+	    raft_source_term(box_raft(), instance_id) == box_raft()->term)
 		return 0;
 	uint32_t former_leader_id = txn_limbo.owner_id;
 	int64_t wait_lsn = txn_limbo.confirmed_lsn;
@@ -1569,12 +1574,12 @@ promote:
 			.replica_id = 0, /* unused */
 			.origin_id = instance_id,
 			.lsn = wait_lsn,
-			.term = 0, /* unused */
+			.term = box_raft()->term,
 		};
 		txn_limbo_read_promote(&txn_limbo, &req);
 		assert(txn_limbo_is_empty(&txn_limbo));
 		raft_source_update_term(box_raft(), req.origin_id,
-					req.lsn);
+					req.term);
 		}
 	}
 	in_clear_synchro_queue = false;
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index cba45a67d..40c8630e9 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -256,20 +256,29 @@ raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
 	return !raft->is_enabled || raft->leader == source_id;
 }
 
-static inline bool
-raft_source_has_outdated_term(const struct raft *raft, uint32_t source_id)
+/**
+ * Return the latest term as seen in PROMOTE requests from instance with id
+ * @a source_id.
+ */
+static inline uint64_t
+raft_source_term(const struct raft *raft, uint32_t source_id)
 {
-	uint64_t source_term = vclock_get(&raft->term_map, source_id);
-	return raft->is_enabled && source_term < raft->greatest_known_term;
+	assert(source_id != 0 && source_id < VCLOCK_MAX);
+	return vclock_get(&raft->term_map, source_id);
 }
 
-/** Check if Raft is enabled. */
+/**
+ * Check whether replica with id @a source_id is too old to apply synchronous
+ * data from it. The check remains valid even when elections are disabled.
+ */
 static inline bool
-raft_is_enabled(const struct raft *raft)
+raft_source_has_outdated_term(const struct raft *raft, uint32_t source_id)
 {
-	return raft->is_enabled;
+	uint64_t source_term = vclock_get(&raft->term_map, source_id);
+	return source_term < raft->greatest_known_term;
 }
 
+/** Remember the last term seen for replica with id @a source_id. */
 static inline void
 raft_source_update_term(struct raft *raft, uint32_t source_id, uint64_t term)
 {
@@ -280,6 +289,13 @@ raft_source_update_term(struct raft *raft, uint32_t source_id, uint64_t term)
 		raft->greatest_known_term = term;
 }
 
+/** Check if Raft is enabled. */
+static inline bool
+raft_is_enabled(const struct raft *raft)
+{
+	return raft->is_enabled;
+}
+
 /** Process a raft entry stored in WAL/snapshot. */
 void
 raft_process_recovery(struct raft *raft, const struct raft_msg *req);
diff --git a/test/replication/gh-5445-leader-incosistency.result b/test/replication/gh-5445-leader-incosistency.result
new file mode 100644
index 000000000..b1f8a4ed1
--- /dev/null
+++ b/test/replication/gh-5445-leader-incosistency.result
@@ -0,0 +1,238 @@
+-- test-run result file version 2
+test_run = require("test_run").new()
+ | ---
+ | ...
+
+is_leader_cmd = "return box.info.election.state == 'leader'"
+ | ---
+ | ...
+
+-- Auxiliary.
+test_run:cmd('setopt delimiter ";"')
+ | ---
+ | - true
+ | ...
+function get_leader(nrs)
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr, do_check in pairs(nrs) do
+            if do_check then
+                local is_leader = test_run:eval('election_replica'..nr,
+                                                is_leader_cmd)[1]
+                if is_leader then
+                    leader_nr = nr
+                    return true
+                end
+            end
+        end
+        return false
+    end)
+    assert(leader_nr ~= 0)
+    return leader_nr
+end;
+ | ---
+ | ...
+
+function name(id)
+    return 'election_replica'..id
+end;
+ | ---
+ | ...
+test_run:cmd('setopt delimiter ""');
+ | ---
+ | - true
+ | ...
+
+--
+-- gh-5445: make sure rolled back rows do not reappear once old leader returns
+-- to cluster.
+--
+SERVERS = {'election_replica1', 'election_replica2' ,'election_replica3'}
+ | ---
+ | ...
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+
+-- Any of the three instances may bootstrap the cluster and become leader.
+is_possible_leader = {true, true, true}
+ | ---
+ | ...
+leader_nr = get_leader(is_possible_leader)
+ | ---
+ | ...
+leader = name(leader_nr)
+ | ---
+ | ...
+next_leader_nr = ((leader_nr - 1) % 3 + 1) % 3 + 1 -- {1, 2, 3} -> {2, 3, 1}
+ | ---
+ | ...
+next_leader = name(next_leader_nr)
+ | ---
+ | ...
+other_nr = ((leader_nr - 1) % 3 + 2) % 3 + 1 -- {1, 2, 3} -> {3, 1, 2}
+ | ---
+ | ...
+other = name(other_nr)
+ | ---
+ | ...
+
+test_run:switch(leader)
+ | ---
+ | - true
+ | ...
+box.ctl.wait_rw()
+ | ---
+ | ...
+_ = box.schema.space.create('test', {is_sync=true})
+ | ---
+ | ...
+_ = box.space.test:create_index('pk')
+ | ---
+ | ...
+box.space.test:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+-- Simulate a situation when the instance which will become the next leader
+-- doesn't know of unconfirmed rows. It should roll them back anyway and not
+-- accept them once they actually appear from the old leader.
+-- So, stop the instance which'll be the next leader.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server '..next_leader)
+ | ---
+ | - true
+ | ...
+test_run:switch(leader)
+ | ---
+ | - true
+ | ...
+-- Insert some unconfirmed data.
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
+ | ---
+ | ...
+fib = require('fiber').create(box.space.test.insert, box.space.test, {2})
+ | ---
+ | ...
+fib:status()
+ | ---
+ | - suspended
+ | ...
+
+-- 'other', 'leader', 'next_leader' are defined on 'default' node, hence the
+-- double switches.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:switch(other)
+ | ---
+ | - true
+ | ...
+-- Wait until the rows are replicated to the other instance.
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+-- Old leader is gone.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server '..leader)
+ | ---
+ | - true
+ | ...
+is_possible_leader[leader_nr] = false
+ | ---
+ | ...
+
+-- Emulate a situation when next_leader wins the elections. It can't do that in
+-- this configuration, obviously, because it's behind the 'other' node, so set
+-- quorum to 1 and imagine there are 2 more servers which would vote for
+-- next_leader.
+-- Also, make the instance ignore synchronization with other replicas.
+-- Otherwise it would stall for replication_sync_timeout. This is due to the
+-- nature of the test and may be ignored (we restart the instance to simulate
+-- a situation when some rows from the old leader were not received).
+test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
+ | ---
+ | - true
+ | ...
+assert(get_leader(is_possible_leader) == next_leader_nr)
+ | ---
+ | - true
+ | ...
+test_run:switch(other)
+ | ---
+ | - true
+ | ...
+-- New leader didn't know about the unconfirmed rows but still rolled them back.
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+box.space.test:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+-- Old leader returns and old unconfirmed rows from it must be ignored.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Make old leader win the elections.
+test_run:cmd('start server '..leader..' with args="1 0.4 candidate 1"')
+ | ---
+ | - true
+ | ...
+is_possible_leader[leader_nr] = true
+ | ---
+ | ...
+assert(get_leader(is_possible_leader) == leader_nr)
+ | ---
+ | - true
+ | ...
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+box.space.test:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:wait_upstream(1, {status='follow'})
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
diff --git a/test/replication/gh-5445-leader-incosistency.test.lua b/test/replication/gh-5445-leader-incosistency.test.lua
new file mode 100644
index 000000000..94beea966
--- /dev/null
+++ b/test/replication/gh-5445-leader-incosistency.test.lua
@@ -0,0 +1,108 @@
+test_run = require("test_run").new()
+
+is_leader_cmd = "return box.info.election.state == 'leader'"
+
+-- Auxiliary.
+test_run:cmd('setopt delimiter ";"')
+function get_leader(nrs)
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr, do_check in pairs(nrs) do
+            if do_check then
+                local is_leader = test_run:eval('election_replica'..nr,
+                                                is_leader_cmd)[1]
+                if is_leader then
+                    leader_nr = nr
+                    return true
+                end
+            end
+        end
+        return false
+    end)
+    assert(leader_nr ~= 0)
+    return leader_nr
+end;
+
+function name(id)
+    return 'election_replica'..id
+end;
+test_run:cmd('setopt delimiter ""');
+
+--
+-- gh-5445: make sure rolled back rows do not reappear once old leader returns
+-- to cluster.
+--
+SERVERS = {'election_replica1', 'election_replica2' ,'election_replica3'}
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
+test_run:wait_fullmesh(SERVERS)
+
+-- Any of the three instances may bootstrap the cluster and become leader.
+is_possible_leader = {true, true, true}
+leader_nr = get_leader(is_possible_leader)
+leader = name(leader_nr)
+next_leader_nr = ((leader_nr - 1) % 3 + 1) % 3 + 1 -- {1, 2, 3} -> {2, 3, 1}
+next_leader = name(next_leader_nr)
+other_nr = ((leader_nr - 1) % 3 + 2) % 3 + 1 -- {1, 2, 3} -> {3, 1, 2}
+other = name(other_nr)
+
+test_run:switch(leader)
+box.ctl.wait_rw()
+_ = box.schema.space.create('test', {is_sync=true})
+_ = box.space.test:create_index('pk')
+box.space.test:insert{1}
+
+-- Simulate a situation when the instance which will become the next leader
+-- doesn't know of unconfirmed rows. It should roll them back anyway and not
+-- accept them once they actually appear from the old leader.
+-- So, stop the instance which'll be the next leader.
+test_run:switch('default')
+test_run:cmd('stop server '..next_leader)
+test_run:switch(leader)
+-- Insert some unconfirmed data.
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
+fib = require('fiber').create(box.space.test.insert, box.space.test, {2})
+fib:status()
+
+-- 'other', 'leader', 'next_leader' are defined on 'default' node, hence the
+-- double switches.
+test_run:switch('default')
+test_run:switch(other)
+-- Wait until the rows are replicated to the other instance.
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+box.cfg{election_mode='voter'}
+-- Old leader is gone.
+test_run:switch('default')
+test_run:cmd('stop server '..leader)
+is_possible_leader[leader_nr] = false
+
+-- Emulate a situation when next_leader wins the elections. It can't do that in
+-- this configuration, obviously, because it's behind the 'other' node, so set
+-- quorum to 1 and imagine there are 2 more servers which would vote for
+-- next_leader.
+-- Also, make the instance ignore synchronization with other replicas.
+-- Otherwise it would stall for replication_sync_timeout. This is due to the
+-- nature of the test and may be ignored (we restart the instance to simulate
+-- a situation when some rows from the old leader were not received).
+test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
+assert(get_leader(is_possible_leader) == next_leader_nr)
+test_run:switch(other)
+-- New leader didn't know about the unconfirmed rows but still rolled them back.
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+
+test_run:switch('default')
+test_run:switch(next_leader)
+box.space.test:select{} -- 1
+box.cfg{election_mode='voter'}
+-- Old leader returns and old unconfirmed rows from it must be ignored.
+test_run:switch('default')
+-- Make old leader win the elections.
+test_run:cmd('start server '..leader..' with args="1 0.4 candidate 1"')
+is_possible_leader[leader_nr] = true
+assert(get_leader(is_possible_leader) == leader_nr)
+test_run:switch(next_leader)
+box.space.test:select{} -- 1
+test_run:wait_upstream(1, {status='follow'})
+
+-- Cleanup.
+test_run:switch('default')
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index aff5fda26..0a270d3d6 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -17,6 +17,7 @@
     "gh-4424-misc-orphan-on-reconfiguration-error.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
     "gh-5433-election-restart-recovery.test.lua": {},
+    "gh-5445-leader-incosistency.test.lua": {},
     "gh-5506-election-on-off.test.lua": {},
     "once.test.lua": {},
     "on_replace.test.lua": {},

-- 
Serge Petrenko
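
P.S. For reviewers' convenience, here is a minimal sketch of how the helpers
above are meant to be used together. This is not part of the patch: every
received PROMOTE advances the per-source term map, and synchronous rows are
then accepted only from sources whose last known PROMOTE term is not behind
the greatest known term. process_promote() and applier_accepts_synchro_row()
are hypothetical names; the real receiver/applier changes are in the full v2
patch.

#include <stdbool.h>
#include <stdint.h>

struct raft;

/*
 * Prototypes mirroring the helpers added to src/lib/raft/raft.h
 * (declared static inline there).
 */
void
raft_source_update_term(struct raft *raft, uint32_t source_id, uint64_t term);
bool
raft_source_has_outdated_term(const struct raft *raft, uint32_t source_id);

/* Hypothetical receiver-side handling of an incoming PROMOTE request. */
static void
process_promote(struct raft *raft, uint32_t origin_id, uint64_t term)
{
	/* Remember the term the PROMOTE was issued for. */
	raft_source_update_term(raft, origin_id, term);
}

/* Hypothetical applier-side filter for synchronous transactions. */
static bool
applier_accepts_synchro_row(const struct raft *raft, uint32_t source_id)
{
	/*
	 * A source which hasn't sent a PROMOTE for the greatest known
	 * term may still carry rows a newer leader has already rolled
	 * back, so its synchronous data is ignored.
	 */
	return !raft_source_has_outdated_term(raft, source_id);
}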