From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp18.mail.ru (smtp18.mail.ru [94.100.176.155])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 972D242EF5C
	for ; Sun, 5 Jul 2020 02:18:07 +0300 (MSK)
References: <9a78892071bb44779f3bc21788b86b8c53a8ace5.1593899478.git.sergepetrenko@tarantool.org>
From: Vladislav Shpilevoy
Message-ID: <42709e56-598d-95eb-4e79-99e075b64b03@tarantool.org>
Date: Sun, 5 Jul 2020 01:18:05 +0200
MIME-Version: 1.0
In-Reply-To: <9a78892071bb44779f3bc21788b86b8c53a8ace5.1593899478.git.sergepetrenko@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH 2/2] box: introduce a cfg handle to become syncro leader
List-Id: Tarantool development patches
To: Serge Petrenko, gorcunov@gmail.com, sergos@tarantool.org
Cc: tarantool-patches@dev.tarantool.org

There is also a general problem: having this as a box.cfg option means
that the selected leader should stay selected regardless of what happens
in the cluster. In particular, it should reject any attempt to add an
entry to the limbo that did not originate from this instance. Currently
this is not guaranteed, see the comment below.

> diff --git a/src/box/box.cc b/src/box/box.cc
> index ca24b98ca..087710383 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -78,6 +78,7 @@
>  #include "sequence.h"
>  #include "sql_stmt_cache.h"
>  #include "msgpack.h"
> +#include "trivia/util.h"
>  
>  static char status[64] = "unknown";
>  
> @@ -945,6 +946,84 @@ box_set_replication_anon(void)
>  
>  }
>  
> +void
> +box_set_replication_synchro_leader(void)
> +{
> +	bool is_leader = cfg_geti("replication_synchro_leader");
> +	/*
> +	 * For now no actions required when an instance stops
> +	 * being a leader. We should probably wait until txn_limbo
> +	 * becomes empty.
> +	 */
> +	if (!is_leader)
> +		return;
> +	uint32_t former_leader_id = txn_limbo.instance_id;
> +	if (former_leader_id == REPLICA_ID_NIL ||
> +	    former_leader_id == instance_id) {

When the limbo is empty, it changes its instance id to that of whatever
entry is added next. So it can happen that I give
replication_synchro_leader to 2 instances, and as long as they create
transactions one at a time, this will work. But it looks wrong.

Perhaps it would be better to add a box.ctl function to do this 'limbo
cleanup', without persisting any leader role in the config? At least
until we have a better understanding of how the leader, read_only, and
master roles coexist.
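
To make the concern about an empty limbo concrete, here is a minimal toy
model in C. It is only a sketch and not the real txn_limbo code: the
struct toy_limbo, its fields, TOY_ID_NIL, and the pinned_leader_id check
are all hypothetical names invented for illustration. It shows that a
limbo which adopts the id of the next entry whenever it is empty will
accept entries from two "leaders" in turn, and what a stricter check
could look like if a leader were pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for REPLICA_ID_NIL; the value is an assumption. */
enum { TOY_ID_NIL = 0 };

/* Toy limbo: an owner id plus a counter of pending entries. */
struct toy_limbo {
	/* Instance whose entries are currently queued. */
	uint32_t owner_id;
	/* Number of entries waiting for confirmation. */
	int pending;
	/* Hypothetical pinned leader; TOY_ID_NIL means "not pinned". */
	uint32_t pinned_leader_id;
};

static void
toy_limbo_create(struct toy_limbo *limbo)
{
	limbo->owner_id = TOY_ID_NIL;
	limbo->pending = 0;
	limbo->pinned_leader_id = TOY_ID_NIL;
}

/*
 * Try to queue an entry that originated from origin_id.
 * Returns false when the entry is rejected.
 */
static bool
toy_limbo_append(struct toy_limbo *limbo, uint32_t origin_id)
{
	/* Stricter rule: a pinned leader rejects foreign entries. */
	if (limbo->pinned_leader_id != TOY_ID_NIL &&
	    origin_id != limbo->pinned_leader_id)
		return false;
	if (limbo->pending == 0) {
		/* An empty limbo adopts whoever comes next. */
		limbo->owner_id = origin_id;
	} else if (limbo->owner_id != origin_id) {
		/* A non-empty limbo accepts only its current owner. */
		return false;
	}
	limbo->pending++;
	return true;
}

/* Confirm (or roll back) the oldest pending entry. */
static void
toy_limbo_complete(struct toy_limbo *limbo)
{
	assert(limbo->pending > 0);
	limbo->pending--;
}
```

With pinned_leader_id left as TOY_ID_NIL, instance 1 can append, complete
its entry, and then instance 2 can append as well, which is the "works
but looks wrong" case. Setting pinned_leader_id to 1 makes the append
from instance 2 fail even when the limbo is empty.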