From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp53.i.mail.ru (smtp53.i.mail.ru [94.100.177.113])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 87A924696C3
	for ; Thu, 2 Apr 2020 16:29:56 +0300 (MSK)
From: Olga Arkhangelskaia
Date: Thu, 2 Apr 2020 16:29:47 +0300
Message-Id: <20200402132948.12804-1-arkholga@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH rfc 0/1] replication: stop resetting existing connections
List-Id: Tarantool development patches
To: tarantool-patches@dev.tarantool.org

Every time we change the replication configuration, even slightly, we
have to reset all existing connections and bootstrap the replica from
the very beginning. This behavior has some shortcomings:
https://github.com/tarantool/tarantool/issues/4669
https://github.com/tarantool/tarantool/issues/4668

In this RFC I try to get rid of the extra work and to create or delete
only those appliers that have to be deleted (no longer in the cfg file)
or created (new ones). I used a straightforward approach to check
whether this is possible and to reveal all the problems we will face if
we decide to change the scheme (and maybe add something like
box_add_replica_source or box_remove_replica callbacks).

Quorum problem:
If quorum < total and a replica did not get its UUID (it is still in the
anon list), the replica gets stuck in the anon list forever: the next
time this source appears in the cfg, it is treated as an existing one.
At the moment I clean the anon list every time.

Anon problem:
During the anon -> non-anon transition we need to remove the anon
replica and reconnect it. From the config point of view the replica
already exists, so I need to preserve the anon state for some time and
add such a replica to both the remove array and the new appliers array.

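To illustrate the idea described above (touch only the appliers whose
sources changed, and put a source that needs a reconnect into both
arrays), here is a minimal standalone sketch. It is not the code from
the patch: SourceDiff, diff_sources and the reconnect set are
hypothetical names, and sources are modeled as plain URI strings.

```cpp
// Sketch only: all names here are hypothetical, not taken from the patch.
#include <set>
#include <string>
#include <vector>

struct SourceDiff {
    std::vector<std::string> to_remove; // appliers to stop and delete
    std::vector<std::string> to_create; // appliers to create and connect
};

// Compare the currently connected sources with the new configuration.
// `reconnect` lists sources that must get a fresh connection even if
// they are present in both configurations (e.g. the anon -> non-anon
// transition), so they land in both arrays.
static SourceDiff
diff_sources(const std::set<std::string> &current,
             const std::set<std::string> &wanted,
             const std::set<std::string> &reconnect = {})
{
    SourceDiff diff;
    for (const auto &uri : current) {
        // Gone from the new cfg, or has to be re-created.
        if (wanted.count(uri) == 0 || reconnect.count(uri) != 0)
            diff.to_remove.push_back(uri);
    }
    for (const auto &uri : wanted) {
        // New in the cfg, or has to be re-created.
        if (current.count(uri) == 0 || reconnect.count(uri) != 0)
            diff.to_create.push_back(uri);
    }
    return diff;
}
```

Sources that appear in both sets and are not marked for reconnect are
left untouched, which is the whole point: their connections survive the
reconfiguration.
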
Connection to self:
Because we only throw away replicas whose sources are in the remove
array, a replica with a connection to itself will hit an assertion. I
just added a special check for the self connection while removing
replicas from the replicaset.

I use applier->source to decide whether an applier should be created.

Another way to solve part of the problems caused by #4669 and #4668 is
to check the existing and the new configuration for an exact match.

I need your help to see if I have missed something, and I would be
happy to read your thoughts and suggestions.

Olga Arkhangelskaia (1):
  replication: stop resetting existing connections

 src/box/box.cc                    | 106 +++++++++++++++++++++++++-----
 src/box/box.h                     |   1 +
 src/box/replication.cc            |  56 +++++++++++-----
 src/box/replication.h             |   3 +-
 test/replication/misc.result      |   6 ++
 test/replication/misc.test.lua    |   2 +
 test/replication/quorum.result    |   3 +
 test/replication/quorum.test.lua  |   1 +
 test/replication/replica_self.lua |  11 ++++
 test/replication/self.result      |  68 +++++++++++++++++++
 test/replication/self.test.lua    |  25 +++++++
 11 files changed, 248 insertions(+), 34 deletions(-)
 create mode 100644 test/replication/replica_self.lua
 create mode 100644 test/replication/self.result
 create mode 100644 test/replication/self.test.lua

-- 
2.20.1 (Apple Git-117)