[tarantool-patches] Re: [commits] [tarantool] 01/04: Add mapping rfc

Tue Jul 10 18:16:37 MSK 2018

On 01/06/2018 21:35, Ilya Markov wrote:
> This is an automated email from the git hooks/post-receive script.
> 
> IlyaMarkovMipt pushed a commit to branch gh-3098-remapping-replicas
> in repository tarantool.
> 
> commit 54600183d2a6e9d2ff44c92284259b50a4bc3d46
> Author: Ilya Markov <imarkov at tarantool.org>
> AuthorDate: Fri Jun 1 13:08:28 2018 +0300
> 
>      Add mapping rfc
> ---
>   doc/rfc/3098-replicas-id-remapping.md | 120 ++++++++++++++++++++++++++++++++++
>   1 file changed, 120 insertions(+)
> 
> diff --git a/doc/rfc/3098-replicas-id-remapping.md b/doc/rfc/3098-replicas-id-remapping.md
> new file mode 100644
> index 0000000..3ae1254
> --- /dev/null
> +++ b/doc/rfc/3098-replicas-id-remapping.md
> @@ -0,0 +1,120 @@
> +## Problems and ways to overcome them
> +
> +1. Problem with the primary key in _cluster. Currently, the primary key of _cluster is replica_id.
> +But since we want before triggers to rewrite this field according to our local replica id assignment,
> + we need to update it. However, updating a primary key field inside before triggers is prohibited.
> +
> +*Solution*:
> + Therefore, we alter the primary index to index the uuid field, and the secondary index to index replica_id.
> +

Why not just delete the old record and insert a new one?

> +
> +2. Problem with simultaneous appliers. When several appliers exist at the same time, several triggers
> +are set, and each of them is called. When a new tuple is delivered,
> +we want to handle it only once, and only in the trigger set by the applier
> + through which the tuple has arrived.
> +Therefore, we need to map tuples to appliers inside triggers.
> +
> +*Solution*:
> +We decided to add a third field to the tuples, holding the uuid of the replica the tuple was sent from.
> +With it, a trigger can decide whether the tuple was sent to the applier that set this trigger
> +simply by comparing the tuple's third field with applier->uuid.
> +
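
A minimal C sketch of the ownership check in item 2. `struct uuid16`, `struct applier_stub`, `struct cluster_tuple` and `trigger_owns_tuple()` are hypothetical stand-ins, not tarantool's `struct tt_uuid` or `struct applier`. The sketch also assumes the tuple is already decoded into `{replica_id, uuid, source_uuid}`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a binary uuid. */
struct uuid16 {
	unsigned char bytes[16];
};

/* Hypothetical applier: only the uuid of the remote replica matters here. */
struct applier_stub {
	struct uuid16 uuid;
};

/* Hypothetical decoded _cluster tuple with the proposed third field. */
struct cluster_tuple {
	unsigned replica_id;       /* id in the sender's numbering */
	struct uuid16 uuid;        /* uuid of the registered replica */
	struct uuid16 source_uuid; /* uuid of the replica it was sent from */
};

static bool
uuid16_equal(const struct uuid16 *a, const struct uuid16 *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}

/* True if the trigger installed by `applier` should handle `tuple`. */
static bool
trigger_owns_tuple(const struct applier_stub *applier,
		   const struct cluster_tuple *tuple)
{
	assert(applier != NULL && tuple != NULL);
	return uuid16_equal(&tuple->source_uuid, &applier->uuid);
}
```

Each applier's trigger would run this check first and return without touching the tuple when it fails. Because appliers have distinct uuids, at most one trigger remaps a given row.
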
> +3. Before triggers are not called during the join operation, so some of our _cluster metadata is not updated.
> + This is not a problem for the mappings, because the joining replica has no _cluster data at all.
> + But it is a problem for the local replica id counter, which must be updated every time a new replica is added.
> +
> + *Solution*: When the _cluster trigger that is not assigned to any applier is called,
> +  we check whether the local replica id counter has already been updated.
> +  If so, we use its value.
> +  Otherwise, we use the maximum replica id from the _cluster table.
> +
> +  Another problem here is that the third field is not set on join.
> +  But since such tuples are written to snapshots and can later be processed only by another join,
> +  the field is not needed for them.
> +
> +4. When should we set up the triggers? Receiving the initial data in the join phase does not require mapping,
> + because during that phase the node's _cluster is still empty. But on subscribe or on recovery the triggers are required.
> +
> +*Solution*: The trigger used for the global counter is set on bootstrap;
> +the others are set either during join, after the initial data has been received, or in the subscribe phase.
> +
> +5. How to handle the global counter? The global counter is used to assign ids to new replicas.
> +Each assigned id must be unique, so that it does not overlap with the ids of other alive or disabled replicas.
> +
> +*Solution*: Let `RC` be the replica counter.
> +On new replica registration we compute `RC = max(max_id(_cluster), RC) + 1`.
> +This formula takes into account that triggers are not called on initial data reception during the join phase,
> +and that replicas may be deleted.
> +
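
The counter rule from items 3 and 5 fits in a few lines of C. This is a sketch that assumes the ids currently in _cluster are available as a plain array. `next_replica_id()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: RC = max(max_id(_cluster), RC) + 1.
 * `ids` holds the replica ids currently stored in _cluster;
 * `*rc` is the local replica counter (0 if it was never updated).
 */
static unsigned
next_replica_id(unsigned *rc, const unsigned *ids, size_t count)
{
	assert(rc != NULL);
	unsigned max_id = 0;
	for (size_t i = 0; i < count; i++) {
		if (ids[i] > max_id)
			max_id = ids[i];
	}
	/* max_id covers ids received on join; *rc covers deleted ids. */
	*rc = (max_id > *rc ? max_id : *rc) + 1;
	return *rc;
}
```

Suppose a freshly joined replica starts with `RC = 0` and its _cluster already holds ids 1, 2 and 3. The first registration then yields 4. If replica 4 is later deleted, `max_id(_cluster)` drops back to 3, but `RC` still yields 5, so the deleted replica's id is not reused. Using `max_id(_cluster) + 1` alone would hand out 4 again.
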
> +6. Another issue is tuples whose third field (source uuid) is unknown to the replica.
> +
> +In this case we would corrupt _cluster, because no trigger exists to handle such a tuple.
> +
> +*Solution*: Skip such tuples. We need these _cluster tuples mostly for tracking vclocks.
> +But if the replica has no applier for the replica with such a uuid, then that replica should not be represented in the vclock.
> +
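
A C sketch of the skip rule in item 6. It assumes the source uuid of an incoming _cluster tuple is already decoded. That uuid is looked up among the uuids of the existing appliers, and the tuple is dropped when none matches. `struct uuid16`, `enum cluster_action` and `classify_cluster_tuple()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a binary uuid. */
struct uuid16 {
	unsigned char bytes[16];
};

/* Hypothetical outcome for a _cluster tuple received from a master. */
enum cluster_action {
	CLUSTER_APPLY, /* an applier with this source uuid exists: remap it */
	CLUSTER_SKIP,  /* unknown source uuid: drop the tuple */
};

static enum cluster_action
classify_cluster_tuple(const struct uuid16 *source_uuid,
		       const struct uuid16 *applier_uuids, size_t applier_count)
{
	assert(source_uuid != NULL);
	for (size_t i = 0; i < applier_count; i++) {
		if (memcmp(applier_uuids[i].bytes, source_uuid->bytes,
			   sizeof(source_uuid->bytes)) == 0)
			return CLUSTER_APPLY;
	}
	/* No trigger could map this row; applying it would corrupt _cluster. */
	return CLUSTER_SKIP;
}
```
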
> +## Alternatives
> +
> +A possible alternative was to rely on the uniqueness of UUIDs and
> + store the uuid instead of the replica id in vclocks and xrows. This way there would be no need for remapping, as each replica could be identified directly.
> + But this approach consumes much more memory and increases message size compared with the one above:
> + a uuid is an order of magnitude larger than a simple integer identifier.
>
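
A rough C estimate of the size argument in the Alternatives section. It rests on two assumptions: vclocks are encoded as a MessagePack map keyed by replica id, and a uuid key would be sent as a 16-byte MessagePack `bin 8` value. The helper names are hypothetical. The integer sizes follow the MessagePack specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Encoded size of an unsigned integer per the MessagePack spec. */
static size_t
msgpack_uint_size(uint64_t v)
{
	if (v < 128)
		return 1; /* positive fixint */
	if (v <= UINT8_MAX)
		return 2; /* uint 8 */
	if (v <= UINT16_MAX)
		return 3; /* uint 16 */
	if (v <= UINT32_MAX)
		return 5; /* uint 32 */
	return 9;         /* uint 64 */
}

/* Assumption: a uuid key is encoded as bin 8 with a 16-byte payload. */
static size_t
msgpack_uuid_size(void)
{
	return 2 + 16; /* bin 8 header (type byte + length byte) + payload */
}

/* Bytes spent on the keys of a vclock with components numbered 1..n. */
static size_t
vclock_key_bytes_by_id(unsigned n)
{
	size_t total = 0;
	for (unsigned id = 1; id <= n; id++)
		total += msgpack_uint_size(id);
	return total;
}

/* Same vclock, but keyed by uuid instead of replica id. */
static size_t
vclock_key_bytes_by_uuid(unsigned n)
{
	return (size_t)n * msgpack_uuid_size();
}
```

For 32 components, the integer keys take 32 bytes in total, while uuid keys would take 576 bytes. Neither figure counts the LSN values, which both encodings share.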