Date: Tue, 7 Jul 2020 14:50:24 +0300
From: Cyrill Gorcunov
Message-ID: <20200707115024.GA1999@grain>
References: <20200702202141.4821-1-skaplun@tarantool.org> <20200706142654.GM2256@grain>
In-Reply-To: <20200706142654.GM2256@grain>
Subject: Re: [Tarantool-patches] [DRAFT v2] replication: track information about replica
List-Id: Tarantool development patches
To: Sergey Kaplun
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy

On Mon, Jul 06, 2020 at 05:26:54PM +0300, Cyrill Gorcunov wrote:
> On Thu, Jul 02, 2020 at 11:21:41PM +0300, Sergey Kaplun wrote:
> > This is a draft for the patch.
> > The patch allows to track information about changing relay state. At
> > every change of relay state timestamp, vclock, new state (and error
> > message if exists) will be saved at _cluster space.
> >
> > The patch adds trigger list at relay, that is invoked when relay changes
> > its state. The trigger that updates _cluster space is setted when a
> > replica is registered.
> > ---

Sergey, let me start with my understanding of the problem, and please
correct me if my assumptions are wrong.

Currently (master branch) information about replicas is fetched in
lbox_info_replication
---
	replicaset_foreach(replica) {
		...
		lbox_pushreplica(L, replica);
		lua_pushinteger(L, replica->id);
		lua_pushstring(L, tt_uuid_str(&replica->uuid));
		luaL_pushuint64(L, vclock_get(&replicaset.vclock, replica->id));
		lua_pushstring(L, "upstream");
		lbox_pushapplier(L, applier);
		lua_pushstring(L, "downstream");
		lbox_pushrelay(L, relay);
---

IOW every time we call the info method, the information is fetched from
run-time variables. Now we want to get more information about relays
(their ip:port, vclock, timestamp, state, etc.), so we extend the system
"_cluster" table for this (currently the _cluster table consists of
replica-id:uuid pairs). Note that these pairs are transparent to all
nodes in the cluster and, strictly speaking, are not bound to ip, vclocks,
etc. Moreover, I've heard that we're about to replicate the _cluster
table itself for the sake of qsync (but I might be wrong here and Vlad
should know more). Thus it looks to me that extending the _cluster table
with additional data is somewhat of a big hammer.

From the bug description, it seems the main idea is to be able to rip off
dead replicas, which sounds reasonable but is definitely not addressed in
this patch. Taking into account that it is an early RFC, I think that is
just fine.

Still, I don't understand why we're updating _cluster from the trigger.
Can't we do that from the relay_start/stop routines directly, without
using triggers at all? Or is the problem that relays are separate
threads?
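
To make the last question concrete, here is a minimal, single-threaded
sketch of the two options being compared: a trigger list run on every
relay state change, versus updating the stored record directly from the
stop routine. This is not Tarantool code. Every name in it (struct relay,
relay_set_state, relay_stop_direct, struct replica_info and so on) is
hypothetical and only stands in for the real relay and the _cluster row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relay states; not Tarantool's actual enum. */
enum relay_state { RELAY_OFF, RELAY_FOLLOW, RELAY_STOPPED };

struct relay;
typedef void (*relay_trigger_f)(struct relay *relay, void *data);

/* One node of a singly linked trigger list (hypothetical). */
struct relay_trigger {
	relay_trigger_f run;
	void *data;
	struct relay_trigger *next;
};

struct relay {
	enum relay_state state;
	struct relay_trigger *on_state_change;
};

/* Stand-in for the extra per-replica data the patch wants to store. */
struct replica_info {
	enum relay_state last_state;
	int updates;
};

/* Register a trigger; it is pushed to the head of the list. */
static void
relay_add_trigger(struct relay *relay, struct relay_trigger *t)
{
	assert(t->run != NULL);
	t->next = relay->on_state_change;
	relay->on_state_change = t;
}

/* Option A: change the state, then run every registered trigger. */
static void
relay_set_state(struct relay *relay, enum relay_state state)
{
	relay->state = state;
	for (struct relay_trigger *t = relay->on_state_change;
	     t != NULL; t = t->next)
		t->run(relay, t->data);
}

/* Trigger body: copy the new relay state into the stored record. */
static void
replica_info_on_relay_state(struct relay *relay, void *data)
{
	struct replica_info *info = data;
	info->last_state = relay->state;
	info->updates++;
}

/* Option B: the stop routine updates the record itself, no triggers. */
static void
relay_stop_direct(struct relay *relay, struct replica_info *info)
{
	relay->state = RELAY_STOPPED;
	info->last_state = relay->state;
	info->updates++;
}
```

In a single thread the two variants are equivalent, and option B is
shorter. The sketch deliberately leaves out threading: if relays do run
in separate threads, neither variant could safely write to a space from
the relay side as written, and the update would presumably have to be
handed over to the thread that owns the space.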