From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 6DF816BD0C; Sun, 11 Apr 2021 20:59:11 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 6DF816BD0C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1618163951; bh=XHAsUgtQAxS3fBWYU/Ahd5TJpPyFypwqIBbIZ5Jmoks=; h=To:Date:In-Reply-To:References:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=t5IjX+5Z0Bno3WnXmIRB5KhKRjuhyp6pOjCYpH89prRZW6J96ieIEb87ngCDXrs4o bCBmuFiSBM45j6IOEcWBCM+w7nukjI/NeXqIaIJiWdJyYyh3NVL/FBaEZCGfRzOfks 0adoJR73VZTau8YNwirEulQGjAAbs+dXx2FQmOgE= Received: from smtp42.i.mail.ru (smtp42.i.mail.ru [94.100.177.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 44F616BD0C for ; Sun, 11 Apr 2021 20:56:15 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 44F616BD0C Received: by smtp42.i.mail.ru with esmtpa (envelope-from ) id 1lVeJi-0001Ua-Cx; Sun, 11 Apr 2021 20:56:14 +0300 To: v.shpilevoy@tarantool.org, gorcunov@gmail.com Date: Sun, 11 Apr 2021 20:56:01 +0300 Message-Id: X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD92FFCB8E6708E7480B1C8842CE613979723F2FB4628545A35182A05F53808504064A53C91571027BFDFFE7FC16B9FC0778D878F2A3AE4B2E90DCA77509E3EEF8A X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7FC9D84217C4FCDC2EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637328FC23D015AFE6E8638F802B75D45FF914D58D5BE9E6BC1A93B80C6DEB9DEE97C6FB206A91F05B2807515E9A87CD60A1CED43CAED05407D9414A38D98B987DBD2E47CDBA5A96583C09775C1D3CA48CFE97D2AE7161E217F117882F4460429724CE54428C33FAD30A8DF7F3B2552694AC26CFBAC0749D213D2E47CDBA5A9658378DA827A17800CE749E2213E709ACCBA9FA2833FD35BB23DF004C90652538430302FCEF25BFAB3454AD6D5ED66289B5278DA827A17800CE7BC4FC0CD0F5A7289D32BA5DBAC0009BE395957E7521B51C20BC6067A898B09E4090A508E0FED6299176DF2183F8FC7C0AB21AB2B3BFE6FE0CD04E86FAF290E2D7E9C4E3C761E06A71DD303D21008E298D5E8D9A59859A8B6B372FE9A2E580EFC725E5C173C3A84C30584FF81F342DA0735872C767BF85DA2F004C90652538430E4A6367B16DE6309 X-C1DE0DAB: C20DE7B7AB408E4181F030C43753B8183A4AFAF3EA6BDC44E1F4276B80994196E44850EFB5864EA218CE08911CE9E4B3D3FB00848FA12B999C2B6934AE262D3EE7EAB7254005DCED7532B743992DF240BDC6A1CF3F042BAD6DF99611D93F60EF0417BEADF48D1460699F904B3F4130E343918A1A30D5E7FCCB5012B2E24CD356 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3433E9BC74ABA5769FA38CA8673C002A66D0EE15AB9F1B79E5B77433C7E77570A5839BCB2FAFCD58D61D7E09C32AA3244CEBD243CC6D8C8DF378F95B9735FE6FF7F26BFA4C8A6946B8927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojbL9S8ysBdXjz3uqod8pbhZ1bVqON1I7F X-Mailru-Sender: 583F1D7ACE8F49BDD2846D59FC20E9F8A3D5B6EFD69438693BDA770A59BD12ACBF2DA60DD686805E424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: [Tarantool-patches] [PATCH 6/9] raft: keep track of greatest known term and filter replication sources based on that X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Cc: tarantool-patches@dev.tarantool.org Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Start writing the actual leader term together with the PROMOTE request and process terms in PROMOTE requests on receiver side. Make applier only apply synchronous transactions from the instance which has the greatest term as received in PROMOTE requests. Closes #5445 --- ...very => qsync-multi-statement-recovery.md} | 0 changelogs/unreleased/raft-promote.md | 4 +++ src/box/applier.cc | 18 +++++++++++ src/box/box.cc | 15 ++++++---- src/lib/raft/raft.c | 1 + src/lib/raft/raft.h | 30 +++++++++++++++++++ 6 files changed, 63 insertions(+), 5 deletions(-) rename changelogs/unreleased/{qsync-multi-statement-recovery => qsync-multi-statement-recovery.md} (100%) create mode 100644 changelogs/unreleased/raft-promote.md diff --git a/changelogs/unreleased/qsync-multi-statement-recovery b/changelogs/unreleased/qsync-multi-statement-recovery.md similarity index 100% rename from changelogs/unreleased/qsync-multi-statement-recovery rename to changelogs/unreleased/qsync-multi-statement-recovery.md diff --git a/changelogs/unreleased/raft-promote.md b/changelogs/unreleased/raft-promote.md new file mode 100644 index 000000000..e5dac599c --- /dev/null +++ b/changelogs/unreleased/raft-promote.md @@ -0,0 +1,4 @@ +## bugfix/replication + +* Fix a bug in synchronous replication when rolled back transactions could + reappear once a sufficiently old instance reconnected (gh-5445). diff --git a/src/box/applier.cc b/src/box/applier.cc index e8cbbe27a..926d2f7ea 100644 --- a/src/box/applier.cc +++ b/src/box/applier.cc @@ -849,6 +849,9 @@ apply_synchro_row(struct xrow_header *row) txn_limbo_process(&txn_limbo, &req); + if (req.type == IPROTO_PROMOTE) + raft_source_update_term(box_raft(), req.origin_id, req.term); + struct synchro_entry *entry; entry = synchro_entry_new(row, &req); if (entry == NULL) @@ -1044,6 +1047,21 @@ applier_apply_tx(struct applier *applier, struct stailq *rows) } } + /* + * All the synchronous rows coming from outdated instances are ignored + * and replaced with NOPs to save vclock consistency. + */ + struct applier_tx_row *item; + if (raft_source_has_outdated_term(box_raft(), applier->instance_id) && + (last_row->wait_sync || + (iproto_type_is_synchro_request(first_row->type) && + !iproto_type_is_promote_request(first_row->type)))) { + stailq_foreach_entry(item, rows, next) { + struct xrow_header *row = &item->row; + row->type = IPROTO_NOP; + row->bodycnt = 0; + } + } if (unlikely(iproto_type_is_synchro_request(first_row->type))) { /* * Synchro messages are not transactions, in terms diff --git a/src/box/box.cc b/src/box/box.cc index b093341d3..9b6323b3f 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -426,6 +426,10 @@ wal_stream_apply_synchro_row(struct wal_stream *stream, struct xrow_header *row) return -1; } txn_limbo_process(&txn_limbo, &syn_req); + if (syn_req.type == IPROTO_PROMOTE) { + raft_source_update_term(box_raft(), syn_req.origin_id, + syn_req.term); + } return 0; } @@ -1556,11 +1560,10 @@ box_clear_synchro_queue(bool try_wait) rc = -1; } else { promote: - /* - * Term parameter is unused now, We'll pass - * box_raft()->term there later. - */ - txn_limbo_write_promote(&txn_limbo, wait_lsn, 0); + /* We cannot possibly get here in a volatile state. */ + assert(box_raft()->volatile_term == box_raft()->term); + txn_limbo_write_promote(&txn_limbo, wait_lsn, + box_raft()->term); struct synchro_request req = { .type = 0, /* unused */ .replica_id = 0, /* unused */ @@ -1570,6 +1573,8 @@ promote: }; txn_limbo_read_promote(&txn_limbo, &req); assert(txn_limbo_is_empty(&txn_limbo)); + raft_source_update_term(box_raft(), req.origin_id, + req.lsn); } } in_clear_synchro_queue = false; diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c index 4ea4fc3f8..e9ce8cade 100644 --- a/src/lib/raft/raft.c +++ b/src/lib/raft/raft.c @@ -985,6 +985,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab) .death_timeout = 5, .vtab = vtab, }; + vclock_create(&raft->term_map); raft_ev_timer_init(&raft->timer, raft_sm_schedule_new_election_cb, 0, 0); raft->timer.data = raft; diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h index e447f6634..cba45a67d 100644 --- a/src/lib/raft/raft.h +++ b/src/lib/raft/raft.h @@ -207,6 +207,19 @@ struct raft { * subsystems, such as Raft. */ const struct vclock *vclock; + /** + * The biggest term seen by this instance and persisted in WAL as part + * of a PROMOTE request. May be smaller than @a term, while there are + * ongoing elections, or the leader is already known, but this instance + * hasn't read its PROMOTE request yet. + * During other times must be equal to @a term. + */ + uint64_t greatest_known_term; + /** + * Latest terms received with PROMOTE entries from remote instances. + * Raft uses them to determine data from which sources may be applied. + */ + struct vclock term_map; /** State machine timed event trigger. */ struct ev_timer timer; /** Configured election timeout in seconds. */ @@ -243,6 +256,13 @@ raft_is_source_allowed(const struct raft *raft, uint32_t source_id) return !raft->is_enabled || raft->leader == source_id; } +static inline bool +raft_source_has_outdated_term(const struct raft *raft, uint32_t source_id) +{ + uint64_t source_term = vclock_get(&raft->term_map, source_id); + return raft->is_enabled && source_term < raft->greatest_known_term; +} + /** Check if Raft is enabled. */ static inline bool raft_is_enabled(const struct raft *raft) @@ -250,6 +270,16 @@ raft_is_enabled(const struct raft *raft) return raft->is_enabled; } +static inline void +raft_source_update_term(struct raft *raft, uint32_t source_id, uint64_t term) +{ + if ((uint64_t) vclock_get(&raft->term_map, source_id) >= term) + return; + vclock_follow(&raft->term_map, source_id, term); + if (term > raft->greatest_known_term) + raft->greatest_known_term = term; +} + /** Process a raft entry stored in WAL/snapshot. */ void raft_process_recovery(struct raft *raft, const struct raft_msg *req); -- 2.24.3 (Apple Git-128)