[Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage

Oleg Babin olegrok at tarantool.org
Fri Jul 2 14:49:06 MSK 2021


Thanks for your patch. See 2 comments below.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> Storage sends NON_MASTER error when an attempt happens to make a
> read-write request on it while it is not a master. The error
> contains UUID of the instance.
>
> The patch adds to this error a new field - UUID of the master as
> it is seen on this storage. Router now uses that information to
> quickly switch its read-write requests to the new master. In fact,
> this should happen in almost all cases of master auto-discovery on
> the router if it occurs under load.
>
> Closes #75
> ---
>   test/router/master_discovery.result   | 427 ++++++++++++++++++++++++++
>   test/router/master_discovery.test.lua | 191 ++++++++++++
>   test/router/router.result             |   8 +-
>   vshard/error.lua                      |   2 +-
>   vshard/replicaset.lua                 |  65 ++++
>   vshard/router/init.lua                |  17 +-
>   vshard/storage/init.lua               |   6 +-
>   7 files changed, 706 insertions(+), 10 deletions(-)
>
>
>   ...
> @@ -1175,6 +1176,7 @@ error_messages
>     - Use replicaset:connect_replica(...) instead of replicaset.connect_replica(...)
>     - Use replicaset:down_replica_priority(...) instead of replicaset.down_replica_priority(...)
>     - Use replicaset:up_replica_priority(...) instead of replicaset.up_replica_priority(...)
> +  - Use replicaset:update_master(...) instead of replicaset.update_master(...)
>     - Use replicaset:wait_connected(...) instead of replicaset.wait_connected(...)
>   ...
>   _, replica = next(replicaset.replicas)
> diff --git a/vshard/error.lua b/vshard/error.lua
> index e2d1a31..fa7bdaa 100644
> --- a/vshard/error.lua
> +++ b/vshard/error.lua
> @@ -20,7 +20,7 @@ local error_message_template = {
>       [2] = {
>           name = 'NON_MASTER',
>           msg = 'Replica %s is not a master for replicaset %s anymore',
> -        args = {'replica_uuid', 'replicaset_uuid'}
> +        args = {'replica_uuid', 'replicaset_uuid', 'master_uuid'}
Error format string still contains only 2 arguments. Is it ok?
>       },
>       [3] = {
>           name = 'BUCKET_ALREADY_EXISTS',
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 7747258..660c786 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -570,6 +570,70 @@ local function rebind_replicasets(replicasets, old_replicasets)
>       end
>   end
>   
> +--
> +-- Let the replicaset know @a old_master_uuid is not a master anymore, should
> +-- use @a candidate_uuid instead.
> +-- Returns whether the request, which brought this information, can be retried.
> +--
> +local function replicaset_update_master(replicaset, old_master_uuid,
> +                                        candidate_uuid)
> +    local is_auto = replicaset.is_auto_master
> +    local replicaset_uuid = replicaset.uuid
> +    if old_master_uuid == candidate_uuid then
> +        -- It should not happen ever, but be ready to everything.
> +        log.warn('Replica %s in replicaset %s reports self as both master '..
> +                 'and not master', master_uuid, replicaset_uuid)
> +        return is_auto
> +    end
> +    local master = replicaset.master
> +    if not master then
> +        if not is_auto or not candidate_uuid then
> +            return is_auto
> +        end
> +        local candidate = replicaset.replicas[candidate_uuid]

AFAIU it means that candidate_uuid doesn't belong to replicaset.

Why is it true?

> +        if not candidate then
> +            return true
> +        end
> +        replicaset.master = candidate
> +        log.info('Replica %s becomes a master as reported by %s for '..
> +                 'replicaset %s', candidate_uuid, old_master_uuid,
> +                 replicaset_uuid)
> +        return true


More information about the Tarantool-patches mailing list