Subject: [tarantool-patches] Re: [commits] [tarantool] 01/04: Add mapping rfc
From: Vladislav Shpilevoy
Date: Tue, 10 Jul 2018 18:16:37 +0300
To: Ilya Markov, "tarantool-patches@freelists.org", Konstantin Osipov

On 01/06/2018 21:35, Ilya Markov wrote:
> This is an automated email from the git hooks/post-receive script.
>
> IlyaMarkovMipt pushed a commit to branch gh-3098-remapping-replicas
> in repository tarantool.
>
> commit 54600183d2a6e9d2ff44c92284259b50a4bc3d46
> Author: Ilya Markov
> AuthorDate: Fri Jun 1 13:08:28 2018 +0300
>
> Add mapping rfc
> ---
>  doc/rfc/3098-replicas-id-remapping.md | 120 ++++++++++++++++++++++++++++++++++
>  1 file changed, 120 insertions(+)
>
> diff --git a/doc/rfc/3098-replicas-id-remapping.md b/doc/rfc/3098-replicas-id-remapping.md
> new file mode 100644
> index 0000000..3ae1254
> --- /dev/null
> +++ b/doc/rfc/3098-replicas-id-remapping.md
> @@ -0,0 +1,120 @@
> +## Problems and ways to overcome them
> +
> +1. Problem with the primary key in _cluster. Currently the primary key in
> +_cluster is replica_id. Since we want to rewrite this field inside before
> +triggers according to our local replica id assignment, we need to update it,
> +but updating a primary key field inside a before trigger is prohibited.
> +
> +*Solution*:
> +Alter the primary index so that it covers the uuid field, and alter the
> +secondary index so that it covers replica_id. (Why not simply delete the old
> +record and insert a new one?)
> +
> +2. Problem with simultaneous appliers. When several appliers exist at the
> +same time, several triggers are set and each of them is called. When a new
> +tuple is delivered, we want to handle it exactly once, and only by the
> +trigger set by the applier the tuple came from. Therefore, we need to map
> +tuples to appliers inside the triggers.
> +
> +*Solution*:
> +The idea we decided to implement is to add a third field to _cluster tuples
> +holding the uuid of the replica the tuple was sent from. With that we can
> +decide whether the tuple was sent to the applier whose trigger was called,
> +simply by comparing the third field of the tuple with applier->uuid.
> +
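To make the remapping idea from points 1 and 2 more concrete, here is a
purely illustrative Lua sketch, not the actual implementation (which would
live in the applier's C triggers). It assumes the proposed three-field tuple
format {id, uuid, source_uuid}, a placeholder applier uuid, and the usual
'uuid' secondary index on _cluster; the counter update mirrors the formula
from point 5.

local applier_uuid = 'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa' -- placeholder
local replica_counter = 0 -- local replica id counter, see points 3 and 5

box.space._cluster:before_replace(function(old, new)
    if new == nil or new[3] ~= applier_uuid then
        -- A deletion, or a row sent by another master: in the proposed
        -- design the trigger of the corresponding applier handles it
        -- (or it is skipped, see point 6), so do nothing here.
        return
    end
    -- Reuse the local id of an already known replica, if any.
    local known = box.space._cluster.index.uuid:get{new[2]}
    if known ~= nil then
        return new:update({{'=', 1, known[1]}})
    end
    -- Otherwise assign a new id: RC = max(max_id(_cluster), RC) + 1.
    local max_tuple = box.space._cluster.index[0]:max()
    local max_id = max_tuple ~= nil and max_tuple[1] or 0
    replica_counter = math.max(max_id, replica_counter) + 1
    return new:update({{'=', 1, replica_counter}})
end)
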
> +3. Before triggers are not called on the join operation, so some of our
> +_cluster metadata is not updated. This is not a problem for the mappings,
> +because the joining replica doesn't have _cluster at all, but it is a
> +problem for the local replica id counter, which should be updated on every
> +new replica added.
> +
> +*Solution*: In the _cluster trigger (the one not assigned to any applier),
> +we check whether the local replica id counter has already been updated.
> +If yes, we use its value.
> +Otherwise, we use the maximum replica id from the _cluster table.
> +
> +Another issue here is that the third field is not updated on join.
> +But since such non-updated tuples are written into snapshots and in the
> +future can be handled only during a join again, this field is unnecessary
> +there.
> +
> +4. When should we set up the triggers? The initial data reception at the
> +join phase does not require mapping, because within that phase the node's
> +_cluster is empty. But on subscribe and on recovery the triggers are
> +required.
> +
> +*Solution*: The trigger used for the global counter is set on bootstrap;
> +the others are set either during join, after the initial data is received,
> +or in the subscribe phase.
> +
> +5. How to handle the global counter? The global counter is used to assign
> +ids to new replicas. The assigned id has to be unique so that it does not
> +overlap with the ids of other replicas, alive or disabled.
> +
> +*Solution*: Let's call the replica counter `RC`.
> +On every new replica registration we calculate `RC = max(max_id(_cluster), RC) + 1`.
> +This formula takes into account the fact that triggers are not called on
> +the initial data reception during the join phase, and the fact that
> +replicas may be deleted.
> +
> +6. Another issue is tuples whose third field (source uuid) is unknown to
> +the replica.
> +
> +In this case we would spoil _cluster, because we don't have a trigger to
> +handle such a tuple.
> +
> +*Solution*: Skip such tuples. We need these tuples from _cluster mostly for
> +tracking vclocks. But if the replica doesn't have an applier for the
> +replica with such a uuid, then that replica should not be represented in
> +the vclock anyway.
> +
> +## Alternatives
> +
> +A possible alternative was to use the uniqueness of UUIDs and store a uuid
> +instead of a replica id in vclocks and xrows. In this way there would be no
> +need for remapping, as we could easily distinguish replicas. But this
> +approach consumes much more memory and message size than the previous one:
> +a uuid is larger by an order of magnitude than a simple identifier.
>
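Just to quantify the last point: with MsgPack a small integer replica id
costs a single byte in every row header and vclock entry, while a textual
uuid costs its 36 characters plus a string header. A rough check that can
be run in a Tarantool console, using the built-in msgpack and uuid modules:

local msgpack = require('msgpack')
local uuid = require('uuid')

local id_size = #msgpack.encode(3)            -- small integer id: 1 byte
local uuid_size = #msgpack.encode(uuid.str()) -- 36-char uuid string + header
print(string.format('replica id: %d byte(s), uuid: %d bytes',
                    id_size, uuid_size))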