[Tarantool-patches] [PATCH vshard 1/1] replicaset: check URI match when rebind connection

Oleg Babin olegrok at tarantool.org
Fri Aug 7 11:07:22 MSK 2020


Hi! Thanks for your patch. LGTM.

On 07/08/2020 00:48, Vladislav Shpilevoy wrote:
> Reconfiguration in router and storage always creates new
> replica objects, and outdates old objects. Since most of the
> time reconfiguration does not change cluster topology and URIs,
> old connections are relocated to the new replicaset objects so as
> not to reconnect on each router.cfg() and storage.cfg().
>
> Connections were moved from old objects to new ones by UUID match,
> not taking into account a possible URI change.
>
> If the URI was changed, the new replica object took the old connection
> with the old URI, and kept trying to connect to the old address.
>
> The patch makes connection relocation work only if old and new URI
> match.
>
> Closes #245
> ---
> Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-245-router-reconnect
> Issue: https://github.com/tarantool/vshard/issues/245
>
>   test/router/reconnect_to_master.result   | 55 ++++++++++++++++++++++++
>   test/router/reconnect_to_master.test.lua | 26 +++++++++++
>   vshard/replicaset.lua                    |  2 +-
>   3 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/test/router/reconnect_to_master.result b/test/router/reconnect_to_master.result
> index d640f0b..aa33f9c 100644
> --- a/test/router/reconnect_to_master.result
> +++ b/test/router/reconnect_to_master.result
> @@ -169,6 +169,61 @@ is_disconnected()
>   ---
>   - false
>   ...
> +--
> +-- gh-245: dynamic uri reconfiguration didn't work - even if URI was changed in
> +-- the config for any instance, it used old connection, because reconfiguration
> +-- compared connections by UUID instead of URI.
> +--
> +util = require('util')
> +---
> +...
> +-- Firstly, clean router from storage_1_a connection.
> +rs1_uuid = util.replicasets[1]
> +---
> +...
> +rs1_cfg = cfg.sharding[rs1_uuid]
> +---
> +...
> +cfg.sharding[rs1_uuid] = nil
> +---
> +...
> +vshard.router.cfg(cfg)
> +---
> +...
> +-- Now break the URI in the config.
> +old_uri = rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri
> +---
> +...
> +rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri = 'https://bad_uri.com:123'
> +---
> +...
> +-- Apply the bad config.
> +cfg.sharding[rs1_uuid] = rs1_cfg
> +---
> +...
> +vshard.router.cfg(cfg)
> +---
> +...
> +-- Should fail - master is not available because of the bad URI.
> +res, err = vshard.router.callrw(1, 'echo', {1})
> +---
> +...
> +res == nil and err ~= nil
> +---
> +- true
> +...
> +-- Repair the config.
> +rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri = old_uri
> +---
> +...
> +vshard.router.cfg(cfg)
> +---
> +...
> +-- Should drop the old connection object and connect fine.
> +vshard.router.callrw(1, 'echo', {1})
> +---
> +- 1
> +...
>   _ = test_run:switch("default")
>   ---
>   ...
> diff --git a/test/router/reconnect_to_master.test.lua b/test/router/reconnect_to_master.test.lua
> index c315c9f..87270af 100644
> --- a/test/router/reconnect_to_master.test.lua
> +++ b/test/router/reconnect_to_master.test.lua
> @@ -70,6 +70,32 @@ while is_disconnected() and i < max_iters do i = i + 1 fiber.sleep(0.1) end
>   -- Master connection is active again.
>   is_disconnected()
>   
> +--
> +-- gh-245: dynamic uri reconfiguration didn't work - even if URI was changed in
> +-- the config for any instance, it used old connection, because reconfiguration
> +-- compared connections by UUID instead of URI.
> +--
> +util = require('util')
> +-- Firstly, clean router from storage_1_a connection.
> +rs1_uuid = util.replicasets[1]
> +rs1_cfg = cfg.sharding[rs1_uuid]
> +cfg.sharding[rs1_uuid] = nil
> +vshard.router.cfg(cfg)
> +-- Now break the URI in the config.
> +old_uri = rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri
> +rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri = 'https://bad_uri.com:123'
> +-- Apply the bad config.
> +cfg.sharding[rs1_uuid] = rs1_cfg
> +vshard.router.cfg(cfg)
> +-- Should fail - master is not available because of the bad URI.
> +res, err = vshard.router.callrw(1, 'echo', {1})
> +res == nil and err ~= nil
> +-- Repair the config.
> +rs1_cfg.replicas[util.name_to_uuid.storage_1_a].uri = old_uri
> +vshard.router.cfg(cfg)
> +-- Should drop the old connection object and connect fine.
> +vshard.router.callrw(1, 'echo', {1})
> +
>   _ = test_run:switch("default")
>   _ = test_run:cmd('stop server router_1')
>   _ = test_run:cmd('cleanup server router_1')
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 0191936..b13d05e 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -456,7 +456,7 @@ local function rebind_replicasets(replicasets, old_replicasets)
>           for replica_uuid, replica in pairs(replicaset.replicas) do
>               local old_replica = old_replicaset and
>                                   old_replicaset.replicas[replica_uuid]
> -            if old_replica then
> +            if old_replica and old_replica.uri == replica.uri then
>                   local conn = old_replica.conn
>                   replica.conn = conn
>                   replica.down_ts = old_replica.down_ts
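
For anyone reading the thread without the vshard sources at hand, the rule the
one-line change introduces can be sketched in isolation. This is only an
illustration under assumptions: plain Lua tables stand in for replica objects,
the URIs are made up, and rebind_replica() is a hypothetical helper, not a
vshard function. The real logic lives in rebind_replicasets() in
vshard/replicaset.lua.

```lua
-- Illustrative sketch, not vshard code. Tables stand in for replica objects.
-- rebind_replica() is a hypothetical helper name chosen for this example.
local function rebind_replica(replica, old_replica)
    -- Reuse the old connection only if the replica existed before the
    -- reconfiguration AND its URI is unchanged.
    if old_replica and old_replica.uri == replica.uri then
        replica.conn = old_replica.conn
        replica.down_ts = old_replica.down_ts
        return true
    end
    -- Otherwise the new replica object keeps no connection and has to
    -- connect to its own URI from scratch.
    return false
end

-- Made-up URIs; conn is a placeholder table instead of a real connection.
local old = {uri = 'storage@127.0.0.1:3301', conn = {}, down_ts = 10}
local same_uri = {uri = 'storage@127.0.0.1:3301'}
local new_uri = {uri = 'storage@127.0.0.1:3302'}

local rebound_same = rebind_replica(same_uri, old)
local rebound_new = rebind_replica(new_uri, old)
local rebound_missing = rebind_replica({uri = 'storage@127.0.0.1:3303'}, nil)
```

With the URI unchanged the connection object is carried over as before; with
a changed URI, or with no old replica under that UUID, nothing is carried over.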

