From mboxrd@z Thu Jan 1 00:00:00 1970
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Date: Fri, 16 Apr 2021 19:25:38 +0300
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH v4 07/12] raft: filter rows based on known peer terms
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

Start writing the actual leader term together with the PROMOTE request and
process terms in PROMOTE requests on the receiver side.

Make the applier apply synchronous transactions only from the instance which
has the greatest term received in PROMOTE requests.

Closes #5445
---
 ...very => qsync-multi-statement-recovery.md} |   0
 changelogs/unreleased/raft-promote.md         |   4 +
 src/box/applier.cc                            |  22 ++
 src/box/box.cc                                |  18 +-
 src/box/txn_limbo.c                           |   3 +
 src/lib/raft/raft.c                           |   1 +
 src/lib/raft/raft.h                           |  46 +++
 .../gh-5445-leader-inconsistency.result       | 292 ++++++++++++++++++
 .../gh-5445-leader-inconsistency.test.lua     | 129 ++++++++
 test/replication/suite.cfg                    |   1 +
 test/unit/raft.c                              |  37 ++-
 test/unit/raft.result                         |  15 +-
 12 files changed, 559 insertions(+), 9 deletions(-)
 rename changelogs/unreleased/{qsync-multi-statement-recovery => qsync-multi-statement-recovery.md} (100%)
 create mode 100644 changelogs/unreleased/raft-promote.md
 create mode 100644 test/replication/gh-5445-leader-inconsistency.result
 create mode 100644 test/replication/gh-5445-leader-inconsistency.test.lua

diff --git a/changelogs/unreleased/qsync-multi-statement-recovery b/changelogs/unreleased/qsync-multi-statement-recovery.md
similarity index 100%
rename from changelogs/unreleased/qsync-multi-statement-recovery
rename to changelogs/unreleased/qsync-multi-statement-recovery.md
diff --git a/changelogs/unreleased/raft-promote.md b/changelogs/unreleased/raft-promote.md
new file mode 100644
index 000000000..e5dac599c
--- /dev/null
+++ b/changelogs/unreleased/raft-promote.md
@@ -0,0 +1,4 @@
+## bugfix/replication
+
+* Fix a bug in synchronous replication when rolled back transactions could
+  reappear once a sufficiently old instance reconnected (gh-5445).
diff --git a/src/box/applier.cc b/src/box/applier.cc
index 40fc5ce86..61d53fdec 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1027,6 +1027,28 @@ applier_apply_tx(struct applier *applier, struct stailq *rows)
 		}
 	}
 
+	/*
+	 * When elections are enabled we must filter out synchronous rows coming
+	 * from an instance that fell behind the current leader. This includes
+	 * both synchronous tx rows and rows for txs following unconfirmed
+	 * synchronous transactions.
+	 * The rows are replaced with NOPs to preserve the vclock consistency.
+	 */
+	struct applier_tx_row *item;
+	if (raft_is_node_outdated(box_raft(), applier->instance_id) &&
+	    (last_row->wait_sync ||
+	     (iproto_type_is_synchro_request(first_row->type) &&
+	      !iproto_type_is_promote_request(first_row->type)))) {
+		stailq_foreach_entry(item, rows, next) {
+			struct xrow_header *row = &item->row;
+			row->type = IPROTO_NOP;
+			/*
+			 * Row body is saved to fiber's region and will be freed
+			 * on next fiber_gc() call.
+			 */
+			row->bodycnt = 0;
+		}
+	}
 	if (unlikely(iproto_type_is_synchro_request(first_row->type))) {
 		/*
 		 * Synchro messages are not transactions, in terms
diff --git a/src/box/box.cc b/src/box/box.cc
index 9d45e211e..19f1528ca 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1503,7 +1503,12 @@ box_clear_synchro_queue(bool try_wait)
 		return -1;
 	}
 
-	if (!is_box_configured)
+	/*
+	 * Do nothing when box isn't configured and when PROMOTE was already
+	 * written for this term.
+	 */
+	if (!is_box_configured ||
+	    raft_node_term(box_raft(), instance_id) == box_raft()->term)
 		return 0;
 	uint32_t former_leader_id = txn_limbo.owner_id;
 	int64_t wait_lsn = txn_limbo.confirmed_lsn;
@@ -1558,17 +1563,16 @@ box_clear_synchro_queue(bool try_wait)
 			rc = -1;
 		} else {
 promote:
-			/*
-			 * Term parameter is unused now, We'll pass
-			 * box_raft()->term there later.
-			 */
-			txn_limbo_write_promote(&txn_limbo, wait_lsn, 0);
+			/* We cannot possibly get here in a volatile state. */
+			assert(box_raft()->volatile_term == box_raft()->term);
+			txn_limbo_write_promote(&txn_limbo, wait_lsn,
+						box_raft()->term);
 			struct synchro_request req = {
 				.type = IPROTO_PROMOTE,
 				.replica_id = former_leader_id,
 				.origin_id = instance_id,
 				.lsn = wait_lsn,
-				.term = 0, /* unused */
+				.term = box_raft()->term,
 			};
 			txn_limbo_process(&txn_limbo, &req);
 			assert(txn_limbo_is_empty(&txn_limbo));
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 2346331c7..0726b5a04 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -34,6 +34,7 @@
 #include "iproto_constants.h"
 #include "journal.h"
 #include "box.h"
+#include "raft.h"
 
 struct txn_limbo txn_limbo;
 
@@ -643,6 +644,8 @@ complete:
 void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 {
+	/* It's ok to process an empty term. It'll just get ignored. */
+	raft_process_term(box_raft(), req->origin_id, req->term);
 	if (req->replica_id != limbo->owner_id) {
 		/*
 		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 4ea4fc3f8..e9ce8cade 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -985,6 +985,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab)
 		.death_timeout = 5,
 		.vtab = vtab,
 	};
+	vclock_create(&raft->term_map);
 	raft_ev_timer_init(&raft->timer, raft_sm_schedule_new_election_cb,
 			   0, 0);
 	raft->timer.data = raft;
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index e447f6634..a5f7e08d9 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -207,6 +207,19 @@ struct raft {
 	 * subsystems, such as Raft.
 	 */
 	const struct vclock *vclock;
+	/**
+	 * The biggest term seen by this instance and persisted in WAL as part
+	 * of a PROMOTE request. May be smaller than @a term, while there are
+	 * ongoing elections, or the leader is already known, but this instance
+	 * hasn't read its PROMOTE request yet.
+	 * During other times must be equal to @a term.
+	 */
+	uint64_t greatest_term;
+	/**
+	 * Latest terms received with PROMOTE entries from remote instances.
+	 * Raft uses them to determine data from which sources may be applied.
+	 */
+	struct vclock term_map;
 	/** State machine timed event trigger. */
 	struct ev_timer timer;
 	/** Configured election timeout in seconds. */
@@ -243,6 +256,39 @@ raft_is_source_allowed(const struct raft *raft, uint32_t source_id)
 	return !raft->is_enabled || raft->leader == source_id;
 }
 
+/**
+ * Return the latest term as seen in PROMOTE requests from instance with id
+ * @a source_id.
+ */
+static inline uint64_t
+raft_node_term(const struct raft *raft, uint32_t source_id)
+{
+	assert(source_id < VCLOCK_MAX);
+	return vclock_get(&raft->term_map, source_id);
+}
+
+/**
+ * Check whether replica with id @a source_id is too old to apply synchronous
+ * data from it. The check is only valid when elections are enabled.
+ */
+static inline bool
+raft_is_node_outdated(const struct raft *raft, uint32_t source_id)
+{
+	uint64_t source_term = raft_node_term(raft, source_id);
+	return raft->is_enabled && source_term < raft->greatest_term;
+}
+
+/** Remember the last term seen for replica with id @a source_id. */
+static inline void
+raft_process_term(struct raft *raft, uint32_t source_id, uint64_t term)
+{
+	if (raft_node_term(raft, source_id) >= term)
+		return;
+	vclock_follow(&raft->term_map, source_id, term);
+	if (term > raft->greatest_term)
+		raft->greatest_term = term;
+}
+
 /** Check if Raft is enabled. */
 static inline bool
 raft_is_enabled(const struct raft *raft)
diff --git a/test/replication/gh-5445-leader-inconsistency.result b/test/replication/gh-5445-leader-inconsistency.result
new file mode 100644
index 000000000..5c6169f50
--- /dev/null
+++ b/test/replication/gh-5445-leader-inconsistency.result
@@ -0,0 +1,292 @@
+-- test-run result file version 2
+test_run = require("test_run").new()
+ | ---
+ | ...
+
+is_leader_cmd = "return box.info.election.state == 'leader'"
+ | ---
+ | ...
+
+-- Auxiliary.
+test_run:cmd('setopt delimiter ";"')
+ | ---
+ | - true
+ | ...
+function name(id)
+    return 'election_replica'..id
+end;
+ | ---
+ | ...
+
+function get_leader(nrs)
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr, do_check in pairs(nrs) do
+            if do_check then
+                local is_leader = test_run:eval(name(nr),
+                                                is_leader_cmd)[1]
+                if is_leader then
+                    leader_nr = nr
+                    return true
+                end
+            end
+        end
+        return false
+    end)
+    assert(leader_nr ~= 0)
+    return leader_nr
+end;
+ | ---
+ | ...
+
+test_run:cmd('setopt delimiter ""');
+ | ---
+ | - true
+ | ...
+
+--
+-- gh-5445: make sure rolled back rows do not reappear once old leader returns
+-- to cluster.
+--
+SERVERS = {'election_replica1', 'election_replica2' ,'election_replica3'}
+ | ---
+ | ...
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+
+-- Any of the three instances may bootstrap the cluster and become leader.
+is_possible_leader = {true, true, true}
+ | ---
+ | ...
+leader_nr = get_leader(is_possible_leader)
+ | ---
+ | ...
+leader = name(leader_nr)
+ | ---
+ | ...
+next_leader_nr = ((leader_nr - 1) % 3 + 1) % 3 + 1 -- {1, 2, 3} -> {2, 3, 1}
+ | ---
+ | ...
+next_leader = name(next_leader_nr)
+ | ---
+ | ...
+other_nr = ((leader_nr - 1) % 3 + 2) % 3 + 1 -- {1, 2, 3} -> {3, 1, 2}
+ | ---
+ | ...
+other = name(other_nr)
+ | ---
+ | ...
+
+test_run:switch(other)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+
+test_run:switch(leader)
+ | ---
+ | - true
+ | ...
+box.ctl.wait_rw()
+ | ---
+ | ...
+_ = box.schema.space.create('test', {is_sync=true})
+ | ---
+ | ...
+_ = box.space.test:create_index('pk')
+ | ---
+ | ...
+box.space.test:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+-- Simulate a situation when the instance which will become the next leader
+-- doesn't know of unconfirmed rows. It should roll them back anyways and do not
+-- accept them once they actually appear from the old leader.
+-- So, stop the instance which'll be the next leader.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server '..next_leader)
+ | ---
+ | - true
+ | ...
+test_run:switch(leader)
+ | ---
+ | - true
+ | ...
+-- Insert some unconfirmed data.
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
+ | ---
+ | ...
+fib = require('fiber').create(box.space.test.insert, box.space.test, {2})
+ | ---
+ | ...
+fib:status()
+ | ---
+ | - suspended
+ | ...
+
+-- 'other', 'leader', 'next_leader' are defined on 'default' node, hence the
+-- double switches.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:switch(other)
+ | ---
+ | - true
+ | ...
+-- Wait until the rows are replicated to the other instance.
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+-- Old leader is gone.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server '..leader)
+ | ---
+ | - true
+ | ...
+is_possible_leader[leader_nr] = false
+ | ---
+ | ...
+
+-- Emulate a situation when next_leader wins the elections. It can't do that in
+-- this configuration, obviously, because it's behind the 'other' node, so set
+-- quorum to 1 and imagine there are 2 more servers which would vote for
+-- next_leader.
+-- Also, make the instance ignore synchronization with other replicas.
+-- Otherwise it would stall for replication_sync_timeout. This is due to the
+-- nature of the test and may be ignored (we restart the instance to simulate
+-- a situation when some rows from the old leader were not received).
+test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
+ | ---
+ | - true
+ | ...
+assert(get_leader(is_possible_leader) == next_leader_nr)
+ | ---
+ | - true
+ | ...
+test_run:switch(other)
+ | ---
+ | - true
+ | ...
+-- New leader didn't know about the unconfirmed rows but still rolled them back.
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+-- No signs of the unconfirmed transaction.
+box.space.test:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Old leader returns and old unconfirmed rows from it must be ignored.
+-- Note, it wins the elections fairly.
+test_run:cmd('start server '..leader..' with args="3 0.4 voter"')
+ | ---
+ | - true
+ | ...
+test_run:wait_lsn(leader, next_leader)
+ | ---
+ | ...
+test_run:switch(leader)
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+-- Resign to make old leader win the elections.
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+is_possible_leader[leader_nr] = true
+ | ---
+ | ...
+assert(get_leader(is_possible_leader) == leader_nr)
+ | ---
+ | - true
+ | ...
+
+test_run:switch(next_leader)
+ | ---
+ | - true
+ | ...
+test_run:wait_upstream(1, {status='follow'})
+ | ---
+ | - true
+ | ...
+box.space.test:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+
+-- Cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
diff --git a/test/replication/gh-5445-leader-inconsistency.test.lua b/test/replication/gh-5445-leader-inconsistency.test.lua
new file mode 100644
index 000000000..e7952f5fa
--- /dev/null
+++ b/test/replication/gh-5445-leader-inconsistency.test.lua
@@ -0,0 +1,129 @@
+test_run = require("test_run").new()
+
+is_leader_cmd = "return box.info.election.state == 'leader'"
+
+-- Auxiliary.
+test_run:cmd('setopt delimiter ";"')
+function name(id)
+    return 'election_replica'..id
+end;
+
+function get_leader(nrs)
+    local leader_nr = 0
+    test_run:wait_cond(function()
+        for nr, do_check in pairs(nrs) do
+            if do_check then
+                local is_leader = test_run:eval(name(nr),
+                                                is_leader_cmd)[1]
+                if is_leader then
+                    leader_nr = nr
+                    return true
+                end
+            end
+        end
+        return false
+    end)
+    assert(leader_nr ~= 0)
+    return leader_nr
+end;
+
+test_run:cmd('setopt delimiter ""');
+
+--
+-- gh-5445: make sure rolled back rows do not reappear once old leader returns
+-- to cluster.
+--
+SERVERS = {'election_replica1', 'election_replica2' ,'election_replica3'}
+test_run:create_cluster(SERVERS, "replication", {args='2 0.4'})
+test_run:wait_fullmesh(SERVERS)
+
+-- Any of the three instances may bootstrap the cluster and become leader.
+is_possible_leader = {true, true, true}
+leader_nr = get_leader(is_possible_leader)
+leader = name(leader_nr)
+next_leader_nr = ((leader_nr - 1) % 3 + 1) % 3 + 1 -- {1, 2, 3} -> {2, 3, 1}
+next_leader = name(next_leader_nr)
+other_nr = ((leader_nr - 1) % 3 + 2) % 3 + 1 -- {1, 2, 3} -> {3, 1, 2}
+other = name(other_nr)
+
+test_run:switch(other)
+box.cfg{election_mode='voter'}
+test_run:switch('default')
+
+test_run:switch(next_leader)
+box.cfg{election_mode='voter'}
+test_run:switch('default')
+
+test_run:switch(leader)
+box.ctl.wait_rw()
+_ = box.schema.space.create('test', {is_sync=true})
+_ = box.space.test:create_index('pk')
+box.space.test:insert{1}
+
+-- Simulate a situation when the instance which will become the next leader
+-- doesn't know of unconfirmed rows. It should roll them back anyways and do not
+-- accept them once they actually appear from the old leader.
+-- So, stop the instance which'll be the next leader.
+test_run:switch('default')
+test_run:cmd('stop server '..next_leader)
+test_run:switch(leader)
+-- Insert some unconfirmed data.
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=1000}
+fib = require('fiber').create(box.space.test.insert, box.space.test, {2})
+fib:status()
+
+-- 'other', 'leader', 'next_leader' are defined on 'default' node, hence the
+-- double switches.
+test_run:switch('default')
+test_run:switch(other)
+-- Wait until the rows are replicated to the other instance.
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+-- Old leader is gone.
+test_run:switch('default')
+test_run:cmd('stop server '..leader)
+is_possible_leader[leader_nr] = false
+
+-- Emulate a situation when next_leader wins the elections. It can't do that in
+-- this configuration, obviously, because it's behind the 'other' node, so set
+-- quorum to 1 and imagine there are 2 more servers which would vote for
+-- next_leader.
+-- Also, make the instance ignore synchronization with other replicas.
+-- Otherwise it would stall for replication_sync_timeout. This is due to the
+-- nature of the test and may be ignored (we restart the instance to simulate
+-- a situation when some rows from the old leader were not received).
+test_run:cmd('start server '..next_leader..' with args="1 0.4 candidate 1"')
+assert(get_leader(is_possible_leader) == next_leader_nr)
+test_run:switch(other)
+-- New leader didn't know about the unconfirmed rows but still rolled them back.
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+
+test_run:switch('default')
+test_run:switch(next_leader)
+-- No signs of the unconfirmed transaction.
+box.space.test:select{} -- 1
+
+test_run:switch('default')
+-- Old leader returns and old unconfirmed rows from it must be ignored.
+-- Note, it wins the elections fairly.
+test_run:cmd('start server '..leader..' with args="3 0.4 voter"')
+test_run:wait_lsn(leader, next_leader)
+test_run:switch(leader)
+test_run:wait_cond(function() return box.space.test:get{2} == nil end)
+box.cfg{election_mode='candidate'}
+
+test_run:switch('default')
+test_run:switch(next_leader)
+-- Resign to make old leader win the elections.
+box.cfg{election_mode='voter'}
+
+test_run:switch('default')
+is_possible_leader[leader_nr] = true
+assert(get_leader(is_possible_leader) == leader_nr)
+
+test_run:switch(next_leader)
+test_run:wait_upstream(1, {status='follow'})
+box.space.test:select{} -- 1
+
+-- Cleanup.
+test_run:switch('default')
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 4a9ca0a46..8b185ce7e 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -19,6 +19,7 @@
     "gh-5213-qsync-applier-order-3.test.lua": {},
     "gh-5426-election-on-off.test.lua": {},
     "gh-5433-election-restart-recovery.test.lua": {},
+    "gh-5445-leader-inconsistency.test.lua": {},
     "gh-5506-election-on-off.test.lua": {},
     "once.test.lua": {},
     "on_replace.test.lua": {},
diff --git a/test/unit/raft.c b/test/unit/raft.c
index d0d13d8c7..0306cefcd 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1267,10 +1267,44 @@ raft_test_too_long_wal_write(void)
 	raft_finish_test();
 }
 
+static void
+raft_test_term_filter(void)
+{
+	raft_start_test(9);
+	struct raft_node node;
+	raft_node_create(&node);
+
+	is(raft_node_term(&node.raft, 1), 0, "empty node term");
+	ok(!raft_is_node_outdated(&node.raft, 1), "not outdated initially");
+
+	raft_process_term(&node.raft, 1, 1);
+	is(raft_node_term(&node.raft, 1), 1, "node term updated");
+	ok(raft_is_node_outdated(&node.raft, 2), "other nodes are outdated");
+
+	raft_process_term(&node.raft, 2, 100);
+	ok(raft_is_node_outdated(&node.raft, 1), "node outdated when others "
+						 "have greater term");
+	ok(!raft_is_node_outdated(&node.raft, 2), "node with greatest term "
+						  "isn't outdated");
+
+	raft_process_term(&node.raft, 3, 100);
+	ok(!raft_is_node_outdated(&node.raft, 2), "node not outdated when "
+						  "others have the same term");
+
+	raft_process_term(&node.raft, 3, 99);
+	is(raft_node_term(&node.raft, 3), 100, "node term isn't decreased");
+	ok(!raft_is_node_outdated(&node.raft, 3), "node doesn't become "
+						  "outdated");
+
+
+	raft_node_destroy(&node);
+	raft_finish_test();
+}
+
 static int
 main_f(va_list ap)
 {
-	raft_start_test(13);
+	raft_start_test(14);
 
 	(void) ap;
 	fakeev_init();
@@ -1288,6 +1322,7 @@ main_f(va_list ap)
 	raft_test_death_timeout();
 	raft_test_enable_disable();
 	raft_test_too_long_wal_write();
+	raft_test_term_filter();
 
 	fakeev_free();
 
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 96bfc3b86..ecb962e42 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -1,5 +1,5 @@
 	*** main_f ***
-1..13
+1..14
 	*** raft_test_leader_election ***
     1..24
     ok 1 - 1 pending message at start
@@ -220,4 +220,17 @@ ok 12 - subtests
     ok 8 - became candidate
 ok 13 - subtests
 	*** raft_test_too_long_wal_write: done ***
+	*** raft_test_term_filter ***
+    1..9
+    ok 1 - empty node term
+    ok 2 - not outdated initially
+    ok 3 - node term updated
+    ok 4 - other nodes are outdated
+    ok 5 - node outdated when others have greater term
+    ok 6 - node with greatest term isn't outdated
+    ok 7 - node not outdated when others have the same term
+    ok 8 - node term isn't decreased
+    ok 9 - node doesn't become outdated
+ok 14 - subtests
+	*** raft_test_term_filter: done ***
 	*** main_f: done ***
-- 
2.24.3 (Apple Git-128)
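
For readers who want to poke at the filtering rule outside the Tarantool tree, here is a minimal standalone sketch of the term bookkeeping this patch introduces. It is only a model under stated assumptions: a plain array stands in for the vclock-based term_map, and every name prefixed with model_ (as well as MODEL_MAX_NODES and term_filter_demo) is hypothetical and not part of the patch. The scenario follows raft_test_term_filter and adds a helper that restates the applier condition for turning rows into NOPs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound standing in for VCLOCK_MAX; chosen for illustration. */
enum { MODEL_MAX_NODES = 32 };

/* Simplified model: a plain array replaces the vclock-based term_map. */
struct term_model {
	bool is_enabled;
	uint64_t greatest_term;
	uint64_t term_map[MODEL_MAX_NODES];
};

/* Latest term seen in a PROMOTE from the given node, 0 if none was seen. */
static uint64_t
model_node_term(const struct term_model *m, uint32_t id)
{
	assert(id < MODEL_MAX_NODES);
	return m->term_map[id];
}

/* Record a term from a PROMOTE; stale or repeated terms are ignored. */
static void
model_process_term(struct term_model *m, uint32_t id, uint64_t term)
{
	if (model_node_term(m, id) >= term)
		return;
	m->term_map[id] = term;
	if (term > m->greatest_term)
		m->greatest_term = term;
}

/* A node is outdated when its known term is below the greatest one seen. */
static bool
model_is_node_outdated(const struct term_model *m, uint32_t id)
{
	return m->is_enabled && model_node_term(m, id) < m->greatest_term;
}

/* Restates the applier condition: should this transaction become NOPs? */
static bool
model_must_nopify(const struct term_model *m, uint32_t source_id,
		  bool last_row_wait_sync, bool first_is_synchro,
		  bool first_is_promote)
{
	return model_is_node_outdated(m, source_id) &&
	       (last_row_wait_sync || (first_is_synchro && !first_is_promote));
}

/* Walks the raft_test_term_filter scenario; returns the failed check count. */
static int
term_filter_demo(void)
{
	struct term_model m = { .is_enabled = true };
	int failed = 0;

	failed += model_is_node_outdated(&m, 1);	/* nothing seen yet */
	model_process_term(&m, 1, 1);
	failed += !model_is_node_outdated(&m, 2);	/* 0 < 1 */
	model_process_term(&m, 2, 100);
	failed += !model_is_node_outdated(&m, 1);	/* 1 < 100 */
	failed += model_is_node_outdated(&m, 2);
	model_process_term(&m, 3, 100);
	model_process_term(&m, 3, 99);			/* must not decrease */
	failed += model_node_term(&m, 3) != 100;

	/* A synchronous tx from the outdated node 1 is replaced with NOPs. */
	failed += !model_must_nopify(&m, 1, true, false, false);
	/* A PROMOTE from node 1 still passes, so its term can be learned. */
	failed += model_must_nopify(&m, 1, false, true, true);
	/* Rows from node 2, which holds the greatest term, are applied. */
	failed += model_must_nopify(&m, 2, true, false, false);
	return failed;
}
```

Calling term_filter_demo() from a small main() is expected to return 0. Note that PROMOTE requests are deliberately excluded from the filter in the model, as in the applier hunk: otherwise an outdated peer could never deliver the PROMOTE that raises its known term.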