Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH vshard 0/6] Master discovery
@ 2021-07-01 22:09 Vladislav Shpilevoy via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected() Vladislav Shpilevoy via Tarantool-patches
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

The patchset implements master discovery for the router using polling in a
separate fiber + a hint from the storage when it rejects an RW request due to
being not a master.

The solution with subscriptions wasn't done due to architectural issues of
box.session.push() on the storage making it too expensive to use both on the
client and on the storage when number of connections is big.

Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-75-master-discovery
Issue: https://github.com/tarantool/vshard/issues/75

Vladislav Shpilevoy (6):
  replicaset: introduce netbox_wait_connected()
  test: sort some table prints
  storage: introduce vshard.storage._call('info')
  config: introduce master 'auto' replicaset option
  router: introduce automatic master discovery
  router: update master using a hint from storage

 test/router/master_discovery.result   | 941 ++++++++++++++++++++++++++
 test/router/master_discovery.test.lua | 439 ++++++++++++
 test/router/router.result             |  36 +-
 test/router/router.test.lua           |   2 +
 test/storage/storage.result           |  14 +
 test/storage/storage.test.lua         |   7 +
 test/unit/config.result               |  48 ++
 test/unit/config.test.lua             |  21 +
 test/upgrade/upgrade.result           |  15 +-
 vshard/cfg.lua                        |  27 +-
 vshard/consts.lua                     |   5 +
 vshard/error.lua                      |   7 +-
 vshard/replicaset.lua                 | 326 +++++++--
 vshard/router/init.lua                | 115 +++-
 vshard/storage/init.lua               |  16 +-
 15 files changed, 1939 insertions(+), 80 deletions(-)
 create mode 100644 test/router/master_discovery.result
 create mode 100644 test/router/master_discovery.test.lua

-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected()
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints Vladislav Shpilevoy via Tarantool-patches
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

It is extracted from replicaset_wait_connected(). The new
function is going to be used in the future patch with master
discovery. There it will be needed to send async requests to
several replicas at once, but before that need to ensure they are
connected. Otherwise the async request fails right away.

Needed for #75
---
 vshard/replicaset.lua | 62 ++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 56ea165..fa048c9 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -104,6 +104,38 @@ local function netbox_on_disconnect(conn)
     conn.replica.down_ts = fiber_clock()
 end
 
+--
+-- Wait until the connection is established. This is necessary at least for
+-- async requests because they fail immediately if the connection is not done.
+-- Returns the remaining timeout because is expected to be used to connect to
+-- many instances in a loop, where such return saves one clock get in the caller
+-- code and is just cleaner code.
+--
+local function netbox_wait_connected(conn, timeout)
+    -- Fast path. Usually everything is connected.
+    if conn:is_connected() then
+        return timeout
+    end
+    local deadline = fiber_clock() + timeout
+    -- Loop against spurious wakeups.
+    repeat
+        -- Netbox uses fiber_cond inside, which throws an irrelevant usage error
+        -- at negative timeout. Need to check the case manually.
+        if timeout < 0 then
+            return nil, lerror.timeout()
+        end
+        local ok, res = pcall(conn.wait_connected, conn, timeout)
+        if not ok then
+            return nil, lerror.make(res)
+        end
+        if not res then
+            return nil, lerror.timeout()
+        end
+        timeout = deadline - fiber_clock()
+    until conn:is_connected()
+    return timeout
+end
+
 --
 -- Connect to a specified replica and remember a new connection
 -- in the replica object. Note, that the function does not wait
@@ -140,36 +172,10 @@ local function replicaset_connect_master(replicaset)
 end
 
 --
--- Wait until the master instance is connected. This is necessary at least for
--- async requests because they fail immediately if the connection is not
--- established.
--- Returns the remaining timeout because is expected to be used to connect to
--- many replicasets in a loop, where such return saves one clock get in the
--- caller code and is just cleaner code.
+-- Wait until the master instance is connected.
 --
 local function replicaset_wait_connected(replicaset, timeout)
-    local deadline = fiber_clock() + timeout
-    local ok, res
-    while true do
-        local conn = replicaset_connect_master(replicaset)
-        if conn.state == 'active' then
-            return timeout
-        end
-        -- Netbox uses fiber_cond inside, which throws an irrelevant usage error
-        -- at negative timeout. Need to check the case manually.
-        if timeout < 0 then
-            return nil, lerror.timeout()
-        end
-        ok, res = pcall(conn.wait_connected, conn, timeout)
-        if not ok then
-            return nil, lerror.make(res)
-        end
-        if not res then
-            return nil, lerror.timeout()
-        end
-        timeout = deadline - fiber_clock()
-    end
-    assert(false)
+    return netbox_wait_connected(replicaset_connect_master(replicaset), timeout)
 end
 
 --
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected() Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info') Vladislav Shpilevoy via Tarantool-patches
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Router.test.lua and upgrade.test.lua printed some Lua tables which
were not sorted. Next patches are going to alter them slightly and
if the tables are not sorted, some old keys would change place and
increase the diff.

The patch makes these outputs sorted so any amendment won't touch
the existing lines.
---
 test/router/router.result   | 28 +++++++++++++++++-----------
 test/router/router.test.lua |  2 ++
 test/upgrade/upgrade.result | 14 +++++++-------
 vshard/storage/init.lua     |  3 ++-
 4 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/test/router/router.result b/test/router/router.result
index f9ee37c..98dd5b5 100644
--- a/test/router/router.result
+++ b/test/router/router.result
@@ -1158,21 +1158,24 @@ end;
 _ = test_run:cmd("setopt delimiter ''");
 ---
 ...
+table.sort(error_messages)
+---
+...
 error_messages
 ---
-- - Use replicaset:callro(...) instead of replicaset.callro(...)
-  - Use replicaset:connect_master(...) instead of replicaset.connect_master(...)
+- - Use replicaset:call(...) instead of replicaset.call(...)
+  - Use replicaset:callbre(...) instead of replicaset.callbre(...)
+  - Use replicaset:callbro(...) instead of replicaset.callbro(...)
   - Use replicaset:callre(...) instead of replicaset.callre(...)
-  - Use replicaset:down_replica_priority(...) instead of replicaset.down_replica_priority(...)
+  - Use replicaset:callro(...) instead of replicaset.callro(...)
+  - Use replicaset:callrw(...) instead of replicaset.callrw(...)
   - Use replicaset:connect(...) instead of replicaset.connect(...)
-  - Use replicaset:wait_connected(...) instead of replicaset.wait_connected(...)
-  - Use replicaset:up_replica_priority(...) instead of replicaset.up_replica_priority(...)
-  - Use replicaset:callbro(...) instead of replicaset.callbro(...)
   - Use replicaset:connect_all(...) instead of replicaset.connect_all(...)
+  - Use replicaset:connect_master(...) instead of replicaset.connect_master(...)
   - Use replicaset:connect_replica(...) instead of replicaset.connect_replica(...)
-  - Use replicaset:call(...) instead of replicaset.call(...)
-  - Use replicaset:callrw(...) instead of replicaset.callrw(...)
-  - Use replicaset:callbre(...) instead of replicaset.callbre(...)
+  - Use replicaset:down_replica_priority(...) instead of replicaset.down_replica_priority(...)
+  - Use replicaset:up_replica_priority(...) instead of replicaset.up_replica_priority(...)
+  - Use replicaset:wait_connected(...) instead of replicaset.wait_connected(...)
 ...
 _, replica = next(replicaset.replicas)
 ---
@@ -1192,11 +1195,14 @@ end;
 _ = test_run:cmd("setopt delimiter ''");
 ---
 ...
+table.sort(error_messages)
+---
+...
 error_messages
 ---
-- - Use replica:safe_uri(...) instead of replica.safe_uri(...)
-  - Use replica:detach_conn(...) instead of replica.detach_conn(...)
+- - Use replica:detach_conn(...) instead of replica.detach_conn(...)
   - Use replica:is_connected(...) instead of replica.is_connected(...)
+  - Use replica:safe_uri(...) instead of replica.safe_uri(...)
 ...
 --
 -- gh-117: Preserve route_map on router.cfg.
diff --git a/test/router/router.test.lua b/test/router/router.test.lua
index aa3eb3b..0017111 100644
--- a/test/router/router.test.lua
+++ b/test/router/router.test.lua
@@ -413,6 +413,7 @@ for _, func in pairs(getmetatable(replicaset).__index) do
     table.insert(error_messages, msg:match("Use .*"))
 end;
 _ = test_run:cmd("setopt delimiter ''");
+table.sort(error_messages)
 error_messages
 
 _, replica = next(replicaset.replicas)
@@ -424,6 +425,7 @@ for _, func in pairs(getmetatable(replica).__index) do
     table.insert(error_messages, msg:match("Use .*"))
 end;
 _ = test_run:cmd("setopt delimiter ''");
+table.sort(error_messages)
 error_messages
 
 --
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 833da3f..8280e5b 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -161,13 +161,13 @@ vshard.storage._call ~= nil
  | ...
 vshard.storage._call('test_api', 1, 2, 3)
  | ---
- | - bucket_recv: true
- |   storage_ref: true
- |   rebalancer_apply_routes: true
- |   storage_map: true
- |   rebalancer_request_state: true
- |   test_api: true
- |   storage_unref: true
+ | - - bucket_recv
+ |   - rebalancer_apply_routes
+ |   - rebalancer_request_state
+ |   - storage_map
+ |   - storage_ref
+ |   - storage_unref
+ |   - test_api
  | - 1
  | - 2
  | - 3
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 14ec42b..e13a24e 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2566,8 +2566,9 @@ service_call_api = setmetatable({
 }, {__serialize = function(api)
     local res = {}
     for k, _ in pairs(api) do
-        res[k] = true
+        table.insert(res, k)
     end
+    table.sort(res)
     return res
 end})
 
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info')
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected() Vladislav Shpilevoy via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option Vladislav Shpilevoy via Tarantool-patches
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Internal info function returns data needed by the routers. Now it
is only is_master flag. This is required for master discovery.

In future the function is supposed to get more attributes. For instance,
instance's own part of its config, bucket generation or bucket sequence number.

Needed for #75
---
 test/storage/storage.result   | 14 ++++++++++++++
 test/storage/storage.test.lua |  7 +++++++
 test/upgrade/upgrade.result   |  1 +
 vshard/storage/init.lua       |  7 +++++++
 4 files changed, 29 insertions(+)

diff --git a/test/storage/storage.result b/test/storage/storage.result
index 5372059..acae98f 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -938,6 +938,20 @@ assert(lstorage.bucket_are_all_rw())
 vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
 ---
 ...
+--
+-- Internal info function.
+--
+vshard.storage._call('info')
+---
+- is_master: true
+...
+_ = test_run:switch('storage_1_b')
+---
+...
+vshard.storage._call('info')
+---
+- is_master: false
+...
 _ = test_run:switch("default")
 ---
 ...
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index d1f3f50..99ef2c4 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -292,6 +292,13 @@ box.space._bucket:update(10, {{'=', 2, vshard.consts.BUCKET.ACTIVE}})
 assert(lstorage.bucket_are_all_rw())
 vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
 
+--
+-- Internal info function.
+--
+vshard.storage._call('info')
+_ = test_run:switch('storage_1_b')
+vshard.storage._call('info')
+
 _ = test_run:switch("default")
 test_run:drop_cluster(REPLICASET_2)
 test_run:drop_cluster(REPLICASET_1)
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 8280e5b..8e59807 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -162,6 +162,7 @@ vshard.storage._call ~= nil
 vshard.storage._call('test_api', 1, 2, 3)
  | ---
  | - - bucket_recv
+ |   - info
  |   - rebalancer_apply_routes
  |   - rebalancer_request_state
  |   - storage_map
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index e13a24e..465abc5 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2549,6 +2549,12 @@ local function storage_map(rid, name, args)
     return true, res
 end
 
+local function storage_service_info()
+    return {
+        is_master = this_is_master(),
+    }
+end
+
 local service_call_api
 
 local function service_call_test_api(...)
@@ -2562,6 +2568,7 @@ service_call_api = setmetatable({
     storage_ref = storage_ref,
     storage_unref = storage_unref,
     storage_map = storage_map,
+    info = storage_service_info,
     test_api = service_call_test_api,
 }, {__serialize = function(api)
     local res = {}
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (2 preceding siblings ...)
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info') Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:47   ` Oleg Babin via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

The new option will enable auto-search for master on the marked
replicasets.

Part of #75
---
 test/unit/config.result   | 48 +++++++++++++++++++++++++++++++++++++++
 test/unit/config.test.lua | 21 +++++++++++++++++
 vshard/cfg.lua            | 27 +++++++++++++++++++---
 3 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/test/unit/config.result b/test/unit/config.result
index 9df3bf1..676ee4d 100644
--- a/test/unit/config.result
+++ b/test/unit/config.result
@@ -656,3 +656,51 @@ util.check_error(lcfg.check, cfg)
 cfg.sched_move_quota = nil
 ---
 ...
+--
+-- gh-75: auto master discovery.
+--
+replicaset = {replicas = {uuid = replica}}
+---
+...
+replicaset.master = 'auto'
+---
+...
+cfg.sharding = {rsid = replicaset}
+---
+...
+_ = lcfg.check(cfg)
+---
+...
+replicaset.master = 'non-auto'
+---
+...
+util.check_error(lcfg.check, cfg)
+---
+- Only "auto" master is supported
+...
+replicaset.master = 123
+---
+...
+util.check_error(lcfg.check, cfg)
+---
+- Master search mode must be string
+...
+replica.master = true
+---
+...
+replicaset.master = 'auto'
+---
+...
+util.check_error(lcfg.check, cfg)
+---
+- Can not specify master nodes when master search is enabled, but found master flag
+  in replica uuid uuid
+...
+replica.master = false
+---
+...
+util.check_error(lcfg.check, cfg)
+---
+- Can not specify master nodes when master search is enabled, but found master flag
+  in replica uuid uuid
+...
diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
index 473e460..4dfbd77 100644
--- a/test/unit/config.test.lua
+++ b/test/unit/config.test.lua
@@ -264,3 +264,24 @@ _ = lcfg.check(cfg)
 cfg.sched_move_quota = -1
 util.check_error(lcfg.check, cfg)
 cfg.sched_move_quota = nil
+
+--
+-- gh-75: auto master discovery.
+--
+replicaset = {replicas = {uuid = replica}}
+replicaset.master = 'auto'
+cfg.sharding = {rsid = replicaset}
+_ = lcfg.check(cfg)
+
+replicaset.master = 'non-auto'
+util.check_error(lcfg.check, cfg)
+
+replicaset.master = 123
+util.check_error(lcfg.check, cfg)
+
+replica.master = true
+replicaset.master = 'auto'
+util.check_error(lcfg.check, cfg)
+
+replica.master = false
+util.check_error(lcfg.check, cfg)
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index 30f8794..401d969 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -2,6 +2,7 @@
 
 local log = require('log')
 local luri = require('uri')
+local lutil = require('vshard.util')
 local consts = require('vshard.consts')
 
 local function check_uri(uri)
@@ -10,7 +11,7 @@ local function check_uri(uri)
     end
 end
 
-local function check_master(master, ctx)
+local function check_replica_master(master, ctx)
     if master then
         if ctx.master then
             error('Only one master is allowed per replicaset')
@@ -20,6 +21,15 @@ local function check_master(master, ctx)
     end
 end
 
+local function check_replicaset_master(master, ctx)
+    if not lutil.version_is_at_least(1, 10, 0) then
+        error('Only versions >= 1.10 support master discovery')
+    end
+    if master ~= 'auto' then
+        error('Only "auto" master is supported')
+    end
+end
+
 local function is_number(v)
     return type(v) == 'number' and v == v
 end
@@ -111,8 +121,8 @@ local replica_template = {
     name = {type = 'string', name = "Name", is_optional = true},
     zone = {type = {'string', 'number'}, name = "Zone", is_optional = true},
     master = {
-        type = 'boolean', name = "Master", is_optional = true, default = false,
-        check = check_master
+        type = 'boolean', name = "Master", is_optional = true,
+        check = check_replica_master
     },
 }
 
@@ -130,6 +140,10 @@ local replicaset_template = {
         default = 1,
     },
     lock = {type = 'boolean', name = 'Lock', is_optional = true},
+    master = {
+        type = 'string', name = 'Master search mode', is_optional = true,
+        check = check_replicaset_master
+    },
 }
 
 --
@@ -185,6 +199,7 @@ local function check_sharding(sharding)
             error('Replicaset weight can not be Inf')
         end
         validate_config(replicaset, replicaset_template)
+        local is_auto_master = replicaset.master == 'auto'
         for replica_uuid, replica in pairs(replicaset.replicas) do
             if uris[replica.uri] then
                 error(string.format('Duplicate uri %s', replica.uri))
@@ -194,6 +209,12 @@ local function check_sharding(sharding)
                 error(string.format('Duplicate uuid %s', replica_uuid))
             end
             uuids[replica_uuid] = true
+            if is_auto_master and replica.master ~= nil then
+                error(string.format('Can not specify master nodes when '..
+                                    'master search is enabled, but found '..
+                                    'master flag in replica uuid %s',
+                                    replica_uuid))
+            end
             -- Log warning in case replica.name duplicate is
             -- found. Message appears once for each unique
             -- duplicate.
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (3 preceding siblings ...)
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:48   ` Oleg Babin via Tarantool-patches
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage Vladislav Shpilevoy via Tarantool-patches
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Part of #75

@TarantoolBot document
Title: VShard router master discovery

Router used not to be able to find master nodes in the configured
replicasets on its own. It relied only on how they were specified
in the config.

This becomes a problem when master changes and the change is not
delivered to the router's config somewhy. For instance, the router
does not rely on a central config provider. Or it does rely, but
the provider can't deliver a new config due to any reason.

This is getting especially tricky with built-in automatic master
elections which are not supported by vshard yet, but they will be,
they won't depend on any config. Master role won't be pinned to
one node then.

Now there is a new feature to overcome the master search problem -
configurable automatic master discovery on the router.

Router goes to the replicasets, marked as having an auto master,
finds a master in them, and periodically checks if the master is
still a master.

When a master in a replicaset stops being a master, the router
walks all nodes of the replicaset and finds who is the new master.

To turn the feature on there is a new option in router's config:
`master = 'auto'`. It should be specified per-replicaset, and is
not compatible with specifying a master manually.

This is how a good config looks:
```
config = {
    sharding = {
        <replicaset uuid> = {
            master = 'auto',
            replicas = {...},
        },
        ...
    },
    ...
}
```

This is how a bad config looks:
```
config = {
    sharding = {
        <replicaset uuid> = {
            master = 'auto',
            replicas = {
                <replica uuid1> = {
                    master = true,
                    ...
                },
                <replica uuid2> = {
                    master = false,
                    ...
                },
            },
        },
        ...
    },
    ...
}
```
It will not work, because either `master = 'auto'` can be
specified, or the master is assigned manually. Not both at the
same time.
---
 test/router/master_discovery.result   | 514 ++++++++++++++++++++++++++
 test/router/master_discovery.test.lua | 248 +++++++++++++
 vshard/consts.lua                     |   5 +
 vshard/error.lua                      |   5 +
 vshard/replicaset.lua                 | 203 +++++++++-
 vshard/router/init.lua                |  98 ++++-
 6 files changed, 1051 insertions(+), 22 deletions(-)
 create mode 100644 test/router/master_discovery.result
 create mode 100644 test/router/master_discovery.test.lua

diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
new file mode 100644
index 0000000..2fca1e4
--- /dev/null
+++ b/test/router/master_discovery.result
@@ -0,0 +1,514 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+ | ---
+ | ...
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_1, 'router')
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_2, 'router')
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+ | ---
+ | ...
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+ | ---
+ | ...
+util.push_rs_filters(test_run)
+ | ---
+ | ...
+_ = test_run:cmd("create server router_1 with script='router/router_1.lua'")
+ | ---
+ | ...
+_ = test_run:cmd("start server router_1")
+ | ---
+ | ...
+
+--
+-- gh-75: automatic master discovery on router.
+--
+
+_ = test_run:switch("router_1")
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+vshard.router.bootstrap()
+ | ---
+ | - true
+ | ...
+
+for _, rs in pairs(cfg.sharding) do                                             \
+    for _, r in pairs(rs.replicas) do                                           \
+        r.master = nil                                                          \
+    end                                                                         \
+end                                                                             \
+
+function enable_auto_masters()                                                  \
+    for _, rs in pairs(cfg.sharding) do                                         \
+        rs.master = 'auto'                                                      \
+    end                                                                         \
+    vshard.router.cfg(cfg)                                                      \
+end
+ | ---
+ | ...
+
+function disable_auto_masters()                                                 \
+    for _, rs in pairs(cfg.sharding) do                                         \
+        rs.master = nil                                                         \
+    end                                                                         \
+    vshard.router.cfg(cfg)                                                      \
+end
+ | ---
+ | ...
+
+-- But do not forget the buckets. Otherwise bucket discovery will establish
+-- the connections instead of external requests.
+function forget_masters()                                                       \
+    disable_auto_masters()                                                      \
+    enable_auto_masters()                                                       \
+end
+ | ---
+ | ...
+
+function check_all_masters_found()                                              \
+    for _, rs in pairs(vshard.router.static.replicasets) do                     \
+        if not rs.master then                                                   \
+            vshard.router.static.master_search_fiber:wakeup()                   \
+            return false                                                        \
+        end                                                                     \
+    end                                                                         \
+    return true                                                                 \
+end
+ | ---
+ | ...
+
+function check_master_for_replicaset(rs_id, master_name)                        \
+    local rs_uuid = util.replicasets[rs_id]                                     \
+    local master_uuid = util.name_to_uuid[master_name]                          \
+    local master = vshard.router.static.replicasets[rs_uuid].master             \
+    if not master or master.uuid ~= master_uuid then                            \
+        vshard.router.static.master_search_fiber:wakeup()                       \
+        return false                                                            \
+    end                                                                         \
+    return true                                                                 \
+end
+ | ---
+ | ...
+
+function check_all_buckets_found()                                              \
+    if vshard.router.info().bucket.unknown == 0 then                            \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.discovery_wakeup()                                            \
+    return false                                                                \
+end
+ | ---
+ | ...
+
+master_search_helper_f = nil
+ | ---
+ | ...
+function aggressive_master_search_f()                                           \
+    while true do                                                               \
+        vshard.router.static.master_search_fiber:wakeup()                       \
+        fiber.sleep(0.001)                                                      \
+    end                                                                         \
+end
+ | ---
+ | ...
+
+function start_aggressive_master_search()                                       \
+    assert(master_search_helper_f == nil)                                       \
+    master_search_helper_f = fiber.new(aggressive_master_search_f)              \
+    master_search_helper_f:set_joinable(true)                                   \
+end
+ | ---
+ | ...
+
+function stop_aggressive_master_search()                                        \
+    assert(master_search_helper_f ~= nil)                                       \
+    master_search_helper_f:cancel()                                             \
+    master_search_helper_f:join()                                               \
+    master_search_helper_f = nil                                                \
+end
+ | ---
+ | ...
+
+--
+-- Simulate the first cfg when no masters are known.
+--
+forget_masters()
+ | ---
+ | ...
+assert(vshard.router.static.master_search_fiber ~= nil)
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(check_all_masters_found)
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(check_all_buckets_found)
+ | ---
+ | - true
+ | ...
+
+--
+-- Change master and see how router finds it again.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+big_timeout = 1000000
+ | ---
+ | ...
+opts_big_timeout = {timeout = big_timeout}
+ | ---
+ | ...
+test_run:wait_cond(function()                                                   \
+    return check_master_for_replicaset(1, 'storage_1_b')                        \
+end)
+ | ---
+ | - true
+ | ...
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 1)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- Revert the master back.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function()                                                   \
+    return check_master_for_replicaset(1, 'storage_1_a')                        \
+end)
+ | ---
+ | - true
+ | ...
+
+--
+-- Call tries to wait for master if has enough time left.
+--
+start_aggressive_master_search()
+ | ---
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+-- XXX: this should not depend on master so much. RO requests should be able to
+-- go to replicas.
+vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | - null
+ | - null
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+-- XXX: the same as above - should not really wait for master. Regardless of it
+-- being auto or not.
+vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | - null
+ | - null
+ | ...
+
+stop_aggressive_master_search()
+ | ---
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 4)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- Old replicaset objects stop waiting for master when search is disabled.
+--
+
+-- Turn off masters on the first replicaset.
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+-- Try to make RW and RO requests but then turn of the auto search.
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+f1 = fiber.create(function()                                                    \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+ | ---
+ | ...
+-- XXX: should not really wait for master since this is an RO request. It could
+-- use a replica.
+f2 = fiber.create(function()                                                    \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+ | ---
+ | ...
+fiber.sleep(0.01)
+ | ---
+ | ...
+disable_auto_masters()
+ | ---
+ | ...
+f1:join()
+ | ---
+ | - true
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+f2:join()
+ | ---
+ | - true
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+
+--
+-- Multiple masters logging.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+start_aggressive_master_search()
+ | ---
+ | ...
+test_run:wait_log('router_1', 'Found more than one master', nil, 10)
+ | ---
+ | - Found more than one master
+ | ...
+stop_aggressive_master_search()
+ | ---
+ | ...
+
+--
+-- Async request won't wait for master. Otherwise it would need to wait, which
+-- is not async behaviour. The timeout should be ignored.
+--
+do                                                                              \
+    forget_masters()                                                            \
+    return vshard.router.callrw(1501, 'echo', {1}, {                            \
+        is_async = true, timeout = big_timeout                                  \
+    })                                                                          \
+end
+ | ---
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+
+_ = test_run:switch("default")
+ | ---
+ | ...
+_ = test_run:cmd("stop server router_1")
+ | ---
+ | ...
+_ = test_run:cmd("cleanup server router_1")
+ | ---
+ | ...
+test_run:drop_cluster(REPLICASET_1)
+ | ---
+ | ...
+test_run:drop_cluster(REPLICASET_2)
+ | ---
+ | ...
+_ = test_run:cmd('clear filter')
+ | ---
+ | ...
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
new file mode 100644
index 0000000..e087f58
--- /dev/null
+++ b/test/router/master_discovery.test.lua
@@ -0,0 +1,248 @@
+test_run = require('test_run').new()
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+test_run:create_cluster(REPLICASET_1, 'router')
+test_run:create_cluster(REPLICASET_2, 'router')
+util = require('util')
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+util.push_rs_filters(test_run)
+_ = test_run:cmd("create server router_1 with script='router/router_1.lua'")
+_ = test_run:cmd("start server router_1")
+
+--
+-- gh-75: automatic master discovery on router.
+--
+
+_ = test_run:switch("router_1")
+util = require('util')
+vshard.router.bootstrap()
+
+for _, rs in pairs(cfg.sharding) do                                             \
+    for _, r in pairs(rs.replicas) do                                           \
+        r.master = nil                                                          \
+    end                                                                         \
+end                                                                             \
+
+function enable_auto_masters()                                                  \
+    for _, rs in pairs(cfg.sharding) do                                         \
+        rs.master = 'auto'                                                      \
+    end                                                                         \
+    vshard.router.cfg(cfg)                                                      \
+end
+
+function disable_auto_masters()                                                 \
+    for _, rs in pairs(cfg.sharding) do                                         \
+        rs.master = nil                                                         \
+    end                                                                         \
+    vshard.router.cfg(cfg)                                                      \
+end
+
+-- But do not forget the buckets. Otherwise bucket discovery will establish
+-- the connections instead of external requests.
+function forget_masters()                                                       \
+    disable_auto_masters()                                                      \
+    enable_auto_masters()                                                       \
+end
+
+function check_all_masters_found()                                              \
+    for _, rs in pairs(vshard.router.static.replicasets) do                     \
+        if not rs.master then                                                   \
+            vshard.router.static.master_search_fiber:wakeup()                   \
+            return false                                                        \
+        end                                                                     \
+    end                                                                         \
+    return true                                                                 \
+end
+
+function check_master_for_replicaset(rs_id, master_name)                        \
+    local rs_uuid = util.replicasets[rs_id]                                     \
+    local master_uuid = util.name_to_uuid[master_name]                          \
+    local master = vshard.router.static.replicasets[rs_uuid].master             \
+    if not master or master.uuid ~= master_uuid then                            \
+        vshard.router.static.master_search_fiber:wakeup()                       \
+        return false                                                            \
+    end                                                                         \
+    return true                                                                 \
+end
+
+function check_all_buckets_found()                                              \
+    if vshard.router.info().bucket.unknown == 0 then                            \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.discovery_wakeup()                                            \
+    return false                                                                \
+end
+
+master_search_helper_f = nil
+function aggressive_master_search_f()                                           \
+    while true do                                                               \
+        vshard.router.static.master_search_fiber:wakeup()                       \
+        fiber.sleep(0.001)                                                      \
+    end                                                                         \
+end
+
+function start_aggressive_master_search()                                       \
+    assert(master_search_helper_f == nil)                                       \
+    master_search_helper_f = fiber.new(aggressive_master_search_f)              \
+    master_search_helper_f:set_joinable(true)                                   \
+end
+
+function stop_aggressive_master_search()                                        \
+    assert(master_search_helper_f ~= nil)                                       \
+    master_search_helper_f:cancel()                                             \
+    master_search_helper_f:join()                                               \
+    master_search_helper_f = nil                                                \
+end
+
+--
+-- Simulate the first cfg when no masters are known.
+--
+forget_masters()
+assert(vshard.router.static.master_search_fiber ~= nil)
+test_run:wait_cond(check_all_masters_found)
+test_run:wait_cond(check_all_buckets_found)
+
+--
+-- Change master and see how router finds it again.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+big_timeout = 1000000
+opts_big_timeout = {timeout = big_timeout}
+test_run:wait_cond(function()                                                   \
+    return check_master_for_replicaset(1, 'storage_1_b')                        \
+end)
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+
+test_run:switch('storage_1_b')
+assert(echo_count == 1)
+echo_count = 0
+
+--
+-- Revert the master back.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+test_run:wait_cond(function()                                                   \
+    return check_master_for_replicaset(1, 'storage_1_a')                        \
+end)
+
+--
+-- Call tries to wait for master if has enough time left.
+--
+start_aggressive_master_search()
+
+forget_masters()
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+
+forget_masters()
+-- XXX: this should not depend on master so much. RO requests should be able to
+-- go to replicas.
+vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
+
+forget_masters()
+vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
+
+forget_masters()
+-- XXX: the same as above - should not really wait for master. Regardless of it
+-- being auto or not.
+vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
+
+stop_aggressive_master_search()
+
+test_run:switch('storage_1_a')
+assert(echo_count == 4)
+echo_count = 0
+
+--
+-- Old replicaset objects stop waiting for master when search is disabled.
+--
+
+-- Turn off masters on the first replicaset.
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+-- Try to make RW and RO requests but then turn of the auto search.
+test_run:switch('router_1')
+forget_masters()
+f1 = fiber.create(function()                                                    \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+-- XXX: should not really wait for master since this is an RO request. It could
+-- use a replica.
+f2 = fiber.create(function()                                                    \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+fiber.sleep(0.01)
+disable_auto_masters()
+f1:join()
+f2:join()
+
+--
+-- Multiple masters logging.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+forget_masters()
+start_aggressive_master_search()
+test_run:wait_log('router_1', 'Found more than one master', nil, 10)
+stop_aggressive_master_search()
+
+--
+-- Async request won't wait for master. Otherwise it would need to wait, which
+-- is not async behaviour. The timeout should be ignored.
+--
+do                                                                              \
+    forget_masters()                                                            \
+    return vshard.router.callrw(1501, 'echo', {1}, {                            \
+        is_async = true, timeout = big_timeout                                  \
+    })                                                                          \
+end
+
+_ = test_run:switch("default")
+_ = test_run:cmd("stop server router_1")
+_ = test_run:cmd("cleanup server router_1")
+test_run:drop_cluster(REPLICASET_1)
+test_run:drop_cluster(REPLICASET_2)
+_ = test_run:cmd('clear filter')
diff --git a/vshard/consts.lua b/vshard/consts.lua
index 47a893b..66a09ae 100644
--- a/vshard/consts.lua
+++ b/vshard/consts.lua
@@ -52,6 +52,11 @@ return {
     DISCOVERY_WORK_STEP = 0.01,
     DISCOVERY_TIMEOUT = 10,
 
+    MASTER_SEARCH_IDLE_INTERVAL = 5,
+    MASTER_SEARCH_WORK_INTERVAL = 0.5,
+    MASTER_SEARCH_BACKOFF_INTERVAL = 5,
+    MASTER_SEARCH_TIEMOUT = 5,
+
     TIMEOUT_INFINITY = 500 * 365 * 86400,
     DEADLINE_INFINITY = math.huge,
 }
diff --git a/vshard/error.lua b/vshard/error.lua
index bcbcd71..e2d1a31 100644
--- a/vshard/error.lua
+++ b/vshard/error.lua
@@ -153,6 +153,11 @@ local error_message_template = {
         name = 'BUCKET_RECV_DATA_ERROR',
         msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
         args = {'bucket_id', 'space', 'tuple', 'reason'},
+    },
+    [31] = {
+        name = 'MULTIPLE_MASTERS_FOUND',
+        msg = 'Found more than one master in replicaset %s on nodes %s and %s',
+        args = {'replicaset_uuid', 'master1', 'master2'},
     }
 }
 
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index fa048c9..7747258 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -25,6 +25,10 @@
 --          }
 --      },
 --      master = <master server from the array above>,
+--      master_cond = <condition variable signaled when the replicaset finds or
+--                     changes its master>,
+--      is_auto_master = <true when is configured to find the master on
+--                        its own>,
 --      replica = <nearest available replica object>,
 --      balance_i = <index of a next replica in priority_list to
 --                   use for a load-balanced request>,
@@ -55,6 +59,8 @@ local luuid = require('uuid')
 local ffi = require('ffi')
 local util = require('vshard.util')
 local fiber_clock = fiber.clock
+local fiber_yield = fiber.yield
+local fiber_cond_wait = util.fiber_cond_wait
 local gsc = util.generate_self_checker
 
 --
@@ -159,6 +165,26 @@ local function replicaset_connect_to_replica(replicaset, replica)
     return conn
 end
 
+local function replicaset_wait_master(replicaset, timeout)
+    local master = replicaset.master
+    -- Fast path - master is almost always known.
+    if master then
+        return master, timeout
+    end
+    -- Slow path.
+    local deadline = fiber_clock() + timeout
+    repeat
+        if not replicaset.is_auto_master or
+           not fiber_cond_wait(replicaset.master_cond, timeout) then
+            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
+                                      replicaset.uuid)
+        end
+        timeout = deadline - fiber_clock()
+        master = replicaset.master
+    until master
+    return master, timeout
+end
+
 --
 -- Create net.box connection to master.
 --
@@ -175,7 +201,13 @@ end
 -- Wait until the master instance is connected.
 --
 local function replicaset_wait_connected(replicaset, timeout)
-    return netbox_wait_connected(replicaset_connect_master(replicaset), timeout)
+    local master
+    master, timeout = replicaset_wait_master(replicaset, timeout)
+    if not master then
+        return nil, timeout
+    end
+    local conn = replicaset_connect_to_replica(replicaset, master)
+    return netbox_wait_connected(conn, timeout)
 end
 
 --
@@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
     assert(opts == nil or type(opts) == 'table')
     assert(type(func) == 'string', 'function name')
     assert(args == nil or type(args) == 'table', 'function arguments')
-    local conn, err = replicaset_connect_master(replicaset)
-    if not conn then
-        return nil, err
-    end
-    if not opts then
-        opts = {timeout = replicaset.master.net_timeout}
-    elseif not opts.timeout then
-        opts = table.copy(opts)
-        opts.timeout = replicaset.master.net_timeout
+    local master = replicaset.master
+    if not master then
+        opts = opts and table.copy(opts) or {}
+        if opts.is_async then
+            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
+                                      replicaset.uuid)
+        end
+        local timeout = opts.timeout or consts.MASTER_SEARCH_TIEMOUT
+        master, timeout = replicaset_wait_master(replicaset, timeout)
+        if not master then
+            return nil, timeout
+        end
+        opts.timeout = timeout
+    else
+        if not opts then
+            opts = {timeout = master.net_timeout}
+        elseif not opts.timeout then
+            opts = table.copy(opts)
+            opts.timeout = master.net_timeout
+        end
     end
+    replicaset_connect_to_replica(replicaset, master)
     local net_status, storage_status, retval, error_object =
-        replica_call(replicaset.master, func, args, opts)
+        replica_call(master, func, args, opts)
     -- Ignore net_status - master does not retry requests.
     return storage_status, retval, error_object
 end
@@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance)
                 return r
             end
         end
-        local conn, err = replicaset_connect_master(replicaset)
-        if not conn then
-            return nil, err
-        end
-        return master
     end
 
     return function(replicaset, func, args, opts)
@@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance)
         end
         local end_time = fiber_clock() + timeout
         while not net_status and timeout > 0 do
-            replica, err = pick_next_replica(replicaset)
+            replica = pick_next_replica(replicaset)
             if not replica then
-                return nil, err
+                replica, timeout = replicaset_wait_master(replicaset, timeout)
+                if not replica then
+                    return nil, timeout
+                end
+                replicaset_connect_to_replica(replicaset, replica)
             end
             opts.timeout = timeout
             net_status, storage_status, retval, err =
@@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets)
                 end
             end
         end
+        if old_replicaset then
+            -- Take a hint from the old replicaset who is the master now.
+            if replicaset.is_auto_master then
+                local master = old_replicaset.master
+                if master then
+                    replicaset.master = replicaset.replicas[master.uuid]
+                end
+            end
+            -- Stop waiting for master in the old replicaset. Its running
+            -- requests won't find it anyway. Auto search works only for the
+            -- most actual replicaset objects.
+            if old_replicaset.is_auto_master then
+                old_replicaset.is_auto_master = false
+                old_replicaset.master_cond:broadcast()
+            end
+        end
+    end
+end
+
+--
+-- Check if the master is still master, and find a new master if there is no a
+-- known one.
+--
+local function replicaset_locate_master(replicaset)
+    local is_done = true
+    local is_nop = true
+    if not replicaset.is_auto_master then
+        return is_done, is_nop
+    end
+    local func = 'vshard.storage._call'
+    local args = {'info'}
+    local const_timeout = consts.MASTER_SEARCH_TIEMOUT
+    local sync_opts = {timeout = const_timeout}
+    local ok, res, err, f
+    local master = replicaset.master
+    if master then
+        ok, res, err = replica_call(master, func, args, sync_opts)
+        if not ok then
+            return is_done, is_nop, err
+        end
+        if res.is_master then
+            return is_done, is_nop
+        end
+        log.info('Master of replicaset %s, node %s, has resigned. Trying '..
+                 'to find a new one', replicaset.uuid, master.uuid)
+        replicaset.master = nil
+    end
+    is_nop = false
+
+    local last_err
+    local futures = {}
+    local timeout = const_timeout
+    local deadline = fiber_clock() + timeout
+    local async_opts = {is_async = true}
+    local replicaset_uuid = replicaset.uuid
+    for replica_uuid, replica in pairs(replicaset.replicas) do
+        local conn = replica.conn
+        timeout, err = netbox_wait_connected(conn, timeout)
+        if not timeout then
+            last_err = err
+            timeout = deadline - fiber_clock()
+        else
+            ok, f = pcall(conn.call, conn, func, args, async_opts)
+            if not ok then
+                last_err = lerror.make(f)
+            else
+                futures[replica_uuid] = f
+            end
+        end
+    end
+    local master_uuid
+    for replica_uuid, f in pairs(futures) do
+        if timeout < 0 then
+            -- Netbox uses cond var inside, whose wait throws an error when gets
+            -- a negative timeout. Avoid that.
+            timeout = 0
+        end
+        res, err = f:wait_result(timeout)
+        timeout = deadline - fiber_clock()
+        if not res then
+            f:discard()
+            last_err = err
+            goto next_wait
+        end
+        res = res[1]
+        if not res.is_master then
+            goto next_wait
+        end
+        if not master_uuid then
+            master_uuid = replica_uuid
+            goto next_wait
+        end
+        is_done = false
+        last_err = lerror.vshard(lerror.code.MULTIPLE_MASTERS_FOUND,
+                                 replicaset_uuid, master_uuid, replica_uuid)
+        do return is_done, is_nop, last_err end
+        ::next_wait::
+    end
+    master = replicaset.replicas[master_uuid]
+    if master then
+        log.info('Found master for replicaset %s: %s', replicaset_uuid,
+                 master_uuid)
+        replicaset.master = master
+        replicaset.master_cond:broadcast()
+    else
+        is_done = false
+    end
+    return is_done, is_nop, last_err
+end
+
+local function locate_masters(replicasets)
+    local is_all_done = true
+    local is_all_nop = true
+    local last_err
+    for _, replicaset in pairs(replicasets) do
+        local is_done, is_nop, err = replicaset_locate_master(replicaset)
+        is_all_done = is_all_done and is_done
+        is_all_nop = is_all_nop and is_nop
+        last_err = err or last_err
+        fiber_yield()
     end
+    return is_all_done, is_all_nop, last_err
 end
 
 --
@@ -729,6 +893,8 @@ local function buildall(sharding_cfg)
             bucket_count = 0,
             lock = replicaset.lock,
             balance_i = 1,
+            is_auto_master = replicaset.master == 'auto',
+            master_cond = fiber.cond(),
         }, replicaset_mt)
         local priority_list = {}
         for replica_uuid, replica in pairs(replicaset.replicas) do
@@ -802,4 +968,5 @@ return {
     wait_masters_connect = wait_masters_connect,
     rebind_replicasets = rebind_replicasets,
     outdate_replicasets = outdate_replicasets,
+    locate_masters = locate_masters,
 }
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index 5e2a96b..9407ccd 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -68,6 +68,8 @@ local ROUTER_TEMPLATE = {
         replicasets = nil,
         -- Fiber to maintain replica connections.
         failover_fiber = nil,
+        -- Fiber to watch for master changes and find new masters.
+        master_search_fiber = nil,
         -- Fiber to discovery buckets in background.
         discovery_fiber = nil,
         -- How discovery works. On - work infinitely. Off - no
@@ -1030,6 +1032,93 @@ local function failover_f(router)
     end
 end
 
+--------------------------------------------------------------------------------
+-- Master search
+--------------------------------------------------------------------------------
+
+local function master_search_step(router)
+    local ok, is_done, is_nop, err = pcall(lreplicaset.locate_masters,
+                                           router.replicasets)
+    if not ok then
+        err = is_done
+        is_done = false
+        is_nop = false
+    end
+    return is_done, is_nop, err
+end
+
+--
+-- Master discovery background function. It is supposed to notice master changes
+-- and find new masters in the replicasets, which are configured for that.
+--
+-- XXX: due to polling the search might notice master change not right when it
+-- happens. In future it makes sense to rewrite master search using
+-- subscriptions. The problem is that at the moment of writing the subscriptions
+-- are not working well in all Tarantool versions.
+--
+local function master_search_f(router)
+    local module_version = M.module_version
+    local is_in_progress = false
+    while module_version == M.module_version do
+        local timeout
+        local start_time = fiber_clock()
+        local is_done, is_nop, err = master_search_step(router)
+        if err then
+            log.error('Error during master search: %s', lerror.make(err))
+        end
+        if is_done then
+            timeout = consts.MASTER_SEARCH_IDLE_INTERVAL
+        elseif err then
+            timeout = consts.MASTER_SEARCH_BACKOFF_INTERVAL
+        else
+            timeout = consts.MASTER_SEARCH_WORK_INTERVAL
+        end
+        if not is_in_progress then
+            if not is_nop and is_done then
+                log.info('Master search happened')
+            elseif not is_done then
+                log.info('Master search is started')
+                is_in_progress = true
+            end
+        elseif is_done then
+            log.info('Master search is finished')
+            is_in_progress = false
+        end
+        local end_time = fiber_clock()
+        local duration = end_time - start_time
+        if not is_nop then
+            log.verbose('Master search step took %s seconds. Next in %s '..
+                        'seconds', duration, timeout)
+        end
+        lfiber.sleep(timeout)
+    end
+end
+
+local function master_search_set(router)
+    local enable = false
+    for _, rs in pairs(router.replicasets) do
+        if rs.is_auto_master then
+            enable = true
+            break
+        end
+    end
+    local search_fiber = router.master_search_fiber
+    if enable and search_fiber == nil then
+        log.info('Master auto search is enabled')
+        router.master_search_fiber = util.reloadable_fiber_create(
+            'vshard.master_search.' .. router.name, M, 'master_search_f',
+            router)
+    elseif not enable and search_fiber ~= nil then
+        -- Do not make users pay for what they do not use - when the search is
+        -- disabled for all replicasets, there should not be any fiber.
+        log.info('Master auto search is disabled')
+        if search_fiber:status() ~= 'dead' then
+            search_fiber:cancel()
+        end
+        router.master_search_fiber = nil
+    end
+end
+
 --------------------------------------------------------------------------------
 -- Configuration
 --------------------------------------------------------------------------------
@@ -1100,6 +1189,7 @@ local function router_cfg(router, cfg, is_reload)
             'vshard.failover.' .. router.name, M, 'failover_f', router)
     end
     discovery_set(router, vshard_cfg.discovery_mode)
+    master_search_set(router)
 end
 
 --------------------------------------------------------------------------------
@@ -1535,6 +1625,10 @@ end
 --------------------------------------------------------------------------------
 -- Module definition
 --------------------------------------------------------------------------------
+M.discovery_f = discovery_f
+M.failover_f = failover_f
+M.master_search_f = master_search_f
+M.router_mt = router_mt
 --
 -- About functions, saved in M, and reloading see comment in
 -- storage/init.lua.
@@ -1556,10 +1650,6 @@ else
     M.module_version = M.module_version + 1
 end
 
-M.discovery_f = discovery_f
-M.failover_f = failover_f
-M.router_mt = router_mt
-
 module.cfg = legacy_cfg
 module.new = router_new
 module.internal = M
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (4 preceding siblings ...)
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-01 22:09 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-02 11:49   ` Oleg Babin via Tarantool-patches
  2021-07-02 21:36 ` [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name Vladislav Shpilevoy via Tarantool-patches
  2021-08-03 21:55 ` [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-01 22:09 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Storage sends NON_MASTER error when an attempt happens to make a
read-write request on it while it is not a master. The error
contains UUID of the instance.

The patch adds to this error a new field - UUID of the master as
it is seen on this storage. Router now uses that information to
quickly switch its read-write requests to the new master. In fact,
this should happen in almost all cases of master auto-discovery on
the router if it occurs under load.

Closes #75
---
 test/router/master_discovery.result   | 427 ++++++++++++++++++++++++++
 test/router/master_discovery.test.lua | 191 ++++++++++++
 test/router/router.result             |   8 +-
 vshard/error.lua                      |   2 +-
 vshard/replicaset.lua                 |  65 ++++
 vshard/router/init.lua                |  17 +-
 vshard/storage/init.lua               |   6 +-
 7 files changed, 706 insertions(+), 10 deletions(-)

diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
index 2fca1e4..8afeccb 100644
--- a/test/router/master_discovery.result
+++ b/test/router/master_discovery.result
@@ -109,6 +109,18 @@ end
  | ---
  | ...
 
+function check_no_master_for_replicaset(rs_id)                                  \
+    local rs_uuid = util.replicasets[rs_id]                                     \
+    local master = vshard.router.static.replicasets[rs_uuid].master             \
+    if not master then                                                          \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.static.master_search_fiber:wakeup()                           \
+    return false                                                                \
+end
+ | ---
+ | ...
+
 function check_all_buckets_found()                                              \
     if vshard.router.info().bucket.unknown == 0 then                            \
         return true                                                             \
@@ -148,6 +160,28 @@ end
  | ---
  | ...
 
+function master_discovery_block()                                               \
+    vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY = true             \
+end
+ | ---
+ | ...
+
+function check_master_discovery_block()                                         \
+    if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.static.master_search_fiber:wakeup()                           \
+    return false                                                                \
+end
+ | ---
+ | ...
+
+function master_discovery_unblock()                                             \
+    vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY = false            \
+end
+ | ---
+ | ...
+
 --
 -- Simulate the first cfg when no masters are known.
 --
@@ -494,6 +528,399 @@ end
  |   message: Master is not configured for replicaset <replicaset_1>
  | ...
 
+--
+-- Restore the old master back.
+--
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+--
+-- RW call uses a hint from the old master about who is the new master.
+--
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+test_run:wait_cond(check_all_masters_found)
+ | ---
+ | - true
+ | ...
+master_discovery_block()
+ | ---
+ | ...
+test_run:wait_cond(check_master_discovery_block)
+ | ---
+ | - true
+ | ...
+
+-- Change master while discovery is asleep.
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+-- First request fails and tells where is the master. The second attempt works.
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 1)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- A non master error might contain no information about a new master.
+--
+-- Make the replicaset read-only.
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+-- A request should return no info about a new master. The router will wait for
+-- a new master discovery.
+f = fiber.create(function()                                                     \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+ | ---
+ | ...
+test_run:wait_cond(function()                                                   \
+    return check_no_master_for_replicaset(1)                                    \
+end)
+ | ---
+ | - true
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+master_discovery_unblock()
+ | ---
+ | ...
+test_run:wait_cond(check_all_masters_found)
+ | ---
+ | - true
+ | ...
+f:join()
+ | ---
+ | - true
+ | - 1
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 1)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- Unit tests for master change in a replicaset object. Normally it can only
+-- happen in quite complicated cases. Hence the tests prefer to use the internal
+-- replicaset object instead.
+-- Disable the master search fiber so as it wouldn't interfere.
+--
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+master_discovery_block()
+ | ---
+ | ...
+test_run:wait_cond(check_master_discovery_block)
+ | ---
+ | - true
+ | ...
+rs = vshard.router.static.replicasets[util.replicasets[1]]
+ | ---
+ | ...
+storage_a_uuid = util.name_to_uuid.storage_1_a
+ | ---
+ | ...
+storage_b_uuid = util.name_to_uuid.storage_1_b
+ | ---
+ | ...
+
+assert(rs.master.uuid == storage_a_uuid)
+ | ---
+ | - true
+ | ...
+rs.master = nil
+ | ---
+ | ...
+rs.is_auto_master = false
+ | ---
+ | ...
+
+-- When auto-search is disabled and master is not known, nothing will make it
+-- known. It is up to the config.
+assert(not rs:update_master(storage_a_uuid, storage_b_uuid))
+ | ---
+ | - true
+ | ...
+assert(not rs.master)
+ | ---
+ | - true
+ | ...
+-- New master might be not reported.
+assert(not rs:update_master(storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(not rs.master)
+ | ---
+ | - true
+ | ...
+
+-- With auto-search and not known master it is not assigned if a new master is
+-- not reported.
+rs.is_auto_master = true
+ | ---
+ | ...
+-- But update returns true, because it makes sense to try a next request later
+-- when the master is found.
+assert(rs:update_master(storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(not rs.master)
+ | ---
+ | - true
+ | ...
+
+-- Report of a not known UUID won't assign the master.
+assert(rs:update_master(storage_a_uuid, util.name_to_uuid.storage_2_a))
+ | ---
+ | - true
+ | ...
+assert(not rs.master)
+ | ---
+ | - true
+ | ...
+
+-- Report of a known UUID assigns the master.
+assert(rs:update_master(storage_a_uuid, storage_b_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_b_uuid)
+ | ---
+ | - true
+ | ...
+
+-- Master could change while the request's error was being received. Then the
+-- error should not change anything because it is outdated.
+assert(rs:update_master(storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_b_uuid)
+ | ---
+ | - true
+ | ...
+-- It does not depend on auto-search. Still returns true, because if the master
+-- was changed since the request was sent, it means it could be retried and
+-- might succeed.
+rs.is_auto_master = false
+ | ---
+ | ...
+assert(rs:update_master(storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_b_uuid)
+ | ---
+ | - true
+ | ...
+
+-- If the current master is reported as not a master and auto-search is
+-- disabled, update should fail. Because makes no sense to retry until a new
+-- config is applied externally.
+assert(not rs:update_master(storage_b_uuid, storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_b_uuid)
+ | ---
+ | - true
+ | ...
+
+-- With auto-search, if the node is not a master and no new master is reported,
+-- the current master should be reset. Because makes no sense to send more RW
+-- requests to him. But update returns true, because the current request could
+-- be retried after waiting for a new master discovery.
+rs.is_auto_master = true
+ | ---
+ | ...
+assert(rs:update_master(storage_b_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master == nil)
+ | ---
+ | - true
+ | ...
+
+-- When candidate is reported, and is known, it is used. But restore the master
+-- first to test its change.
+assert(rs:update_master(storage_b_uuid, storage_a_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_a_uuid)
+ | ---
+ | - true
+ | ...
+-- Now update.
+assert(rs:update_master(storage_a_uuid, storage_b_uuid))
+ | ---
+ | - true
+ | ...
+assert(rs.master.uuid == storage_b_uuid)
+ | ---
+ | - true
+ | ...
+
+-- Candidate UUID might be not known in case the topology config is different on
+-- the router and on the storage. Then the master is simply reset.
+assert(rs:update_master(storage_b_uuid, util.name_to_uuid.storage_2_a))
+ | ---
+ | - true
+ | ...
+assert(rs.master == nil)
+ | ---
+ | - true
+ | ...
+
+master_discovery_unblock()
+ | ---
+ | ...
+test_run:wait_cond(check_all_masters_found)
+ | ---
+ | - true
+ | ...
+
 _ = test_run:switch("default")
  | ---
  | ...
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
index e087f58..70e5a72 100644
--- a/test/router/master_discovery.test.lua
+++ b/test/router/master_discovery.test.lua
@@ -67,6 +67,16 @@ function check_master_for_replicaset(rs_id, master_name)
     return true                                                                 \
 end
 
+function check_no_master_for_replicaset(rs_id)                                  \
+    local rs_uuid = util.replicasets[rs_id]                                     \
+    local master = vshard.router.static.replicasets[rs_uuid].master             \
+    if not master then                                                          \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.static.master_search_fiber:wakeup()                           \
+    return false                                                                \
+end
+
 function check_all_buckets_found()                                              \
     if vshard.router.info().bucket.unknown == 0 then                            \
         return true                                                             \
@@ -96,6 +106,22 @@ function stop_aggressive_master_search()
     master_search_helper_f = nil                                                \
 end
 
+function master_discovery_block()                                               \
+    vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY = true             \
+end
+
+function check_master_discovery_block()                                         \
+    if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
+        return true                                                             \
+    end                                                                         \
+    vshard.router.static.master_search_fiber:wakeup()                           \
+    return false                                                                \
+end
+
+function master_discovery_unblock()                                             \
+    vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY = false            \
+end
+
 --
 -- Simulate the first cfg when no masters are known.
 --
@@ -240,6 +266,171 @@ do
     })                                                                          \
 end
 
+--
+-- Restore the old master back.
+--
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_b].master = false
+replicas[util.name_to_uuid.storage_1_a].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+--
+-- RW call uses a hint from the old master about who is the new master.
+--
+test_run:switch('router_1')
+forget_masters()
+test_run:wait_cond(check_all_masters_found)
+master_discovery_block()
+test_run:wait_cond(check_master_discovery_block)
+
+-- Change master while discovery is asleep.
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+-- First request fails and tells where is the master. The second attempt works.
+test_run:switch('router_1')
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+
+test_run:switch('storage_1_b')
+assert(echo_count == 1)
+echo_count = 0
+
+--
+-- A non master error might contain no information about a new master.
+--
+-- Make the replicaset read-only.
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+-- A request should return no info about a new master. The router will wait for
+-- a new master discovery.
+f = fiber.create(function()                                                     \
+    fiber.self():set_joinable(true)                                             \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)            \
+end)
+test_run:wait_cond(function()                                                   \
+    return check_no_master_for_replicaset(1)                                    \
+end)
+
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+master_discovery_unblock()
+test_run:wait_cond(check_all_masters_found)
+f:join()
+
+test_run:switch('storage_1_a')
+assert(echo_count == 1)
+echo_count = 0
+
+--
+-- Unit tests for master change in a replicaset object. Normally it can only
+-- happen in quite complicated cases. Hence the tests prefer to use the internal
+-- replicaset object instead.
+-- Disable the master search fiber so as it wouldn't interfere.
+--
+test_run:switch('router_1')
+master_discovery_block()
+test_run:wait_cond(check_master_discovery_block)
+rs = vshard.router.static.replicasets[util.replicasets[1]]
+storage_a_uuid = util.name_to_uuid.storage_1_a
+storage_b_uuid = util.name_to_uuid.storage_1_b
+
+assert(rs.master.uuid == storage_a_uuid)
+rs.master = nil
+rs.is_auto_master = false
+
+-- When auto-search is disabled and master is not known, nothing will make it
+-- known. It is up to the config.
+assert(not rs:update_master(storage_a_uuid, storage_b_uuid))
+assert(not rs.master)
+-- New master might be not reported.
+assert(not rs:update_master(storage_a_uuid))
+assert(not rs.master)
+
+-- With auto-search and not known master it is not assigned if a new master is
+-- not reported.
+rs.is_auto_master = true
+-- But update returns true, because it makes sense to try a next request later
+-- when the master is found.
+assert(rs:update_master(storage_a_uuid))
+assert(not rs.master)
+
+-- Report of a not known UUID won't assign the master.
+assert(rs:update_master(storage_a_uuid, util.name_to_uuid.storage_2_a))
+assert(not rs.master)
+
+-- Report of a known UUID assigns the master.
+assert(rs:update_master(storage_a_uuid, storage_b_uuid))
+assert(rs.master.uuid == storage_b_uuid)
+
+-- Master could change while the request's error was being received. Then the
+-- error should not change anything because it is outdated.
+assert(rs:update_master(storage_a_uuid))
+assert(rs.master.uuid == storage_b_uuid)
+-- It does not depend on auto-search. Still returns true, because if the master
+-- was changed since the request was sent, it means it could be retried and
+-- might succeed.
+rs.is_auto_master = false
+assert(rs:update_master(storage_a_uuid))
+assert(rs.master.uuid == storage_b_uuid)
+
+-- If the current master is reported as not a master and auto-search is
+-- disabled, update should fail. Because makes no sense to retry until a new
+-- config is applied externally.
+assert(not rs:update_master(storage_b_uuid, storage_a_uuid))
+assert(rs.master.uuid == storage_b_uuid)
+
+-- With auto-search, if the node is not a master and no new master is reported,
+-- the current master should be reset. Because makes no sense to send more RW
+-- requests to him. But update returns true, because the current request could
+-- be retried after waiting for a new master discovery.
+rs.is_auto_master = true
+assert(rs:update_master(storage_b_uuid))
+assert(rs.master == nil)
+
+-- When candidate is reported, and is known, it is used. But restore the master
+-- first to test its change.
+assert(rs:update_master(storage_b_uuid, storage_a_uuid))
+assert(rs.master.uuid == storage_a_uuid)
+-- Now update.
+assert(rs:update_master(storage_a_uuid, storage_b_uuid))
+assert(rs.master.uuid == storage_b_uuid)
+
+-- Candidate UUID might be not known in case the topology config is different on
+-- the router and on the storage. Then the master is simply reset.
+assert(rs:update_master(storage_b_uuid, util.name_to_uuid.storage_2_a))
+assert(rs.master == nil)
+
+master_discovery_unblock()
+test_run:wait_cond(check_all_masters_found)
+
 _ = test_run:switch("default")
 _ = test_run:cmd("stop server router_1")
 _ = test_run:cmd("cleanup server router_1")
diff --git a/test/router/router.result b/test/router/router.result
index 98dd5b5..8a0e30d 100644
--- a/test/router/router.result
+++ b/test/router/router.result
@@ -1105,11 +1105,12 @@ _ = test_run:switch("router_1")
 util.check_error(vshard.router.call, 1, 'write', 'echo', { 'hello world' })
 ---
 - null
-- replica_uuid: <storage_2_a>
-  replicaset_uuid: <replicaset_2>
-  type: ShardingError
+- master_uuid: <storage_2_b>
+  replica_uuid: <storage_2_a>
   message: Replica <storage_2_a> is not a master for replicaset
     <replicaset_2> anymore
+  type: ShardingError
+  replicaset_uuid: <replicaset_2>
   name: NON_MASTER
   code: 2
 ...
@@ -1175,6 +1176,7 @@ error_messages
   - Use replicaset:connect_replica(...) instead of replicaset.connect_replica(...)
   - Use replicaset:down_replica_priority(...) instead of replicaset.down_replica_priority(...)
   - Use replicaset:up_replica_priority(...) instead of replicaset.up_replica_priority(...)
+  - Use replicaset:update_master(...) instead of replicaset.update_master(...)
   - Use replicaset:wait_connected(...) instead of replicaset.wait_connected(...)
 ...
 _, replica = next(replicaset.replicas)
diff --git a/vshard/error.lua b/vshard/error.lua
index e2d1a31..fa7bdaa 100644
--- a/vshard/error.lua
+++ b/vshard/error.lua
@@ -20,7 +20,7 @@ local error_message_template = {
     [2] = {
         name = 'NON_MASTER',
         msg = 'Replica %s is not a master for replicaset %s anymore',
-        args = {'replica_uuid', 'replicaset_uuid'}
+        args = {'replica_uuid', 'replicaset_uuid', 'master_uuid'}
     },
     [3] = {
         name = 'BUCKET_ALREADY_EXISTS',
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index 7747258..660c786 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -570,6 +570,70 @@ local function rebind_replicasets(replicasets, old_replicasets)
     end
 end
 
+--
+-- Let the replicaset know @a old_master_uuid is not a master anymore, should
+-- use @a candidate_uuid instead.
+-- Returns whether the request, which brought this information, can be retried.
+--
+local function replicaset_update_master(replicaset, old_master_uuid,
+                                        candidate_uuid)
+    local is_auto = replicaset.is_auto_master
+    local replicaset_uuid = replicaset.uuid
+    if old_master_uuid == candidate_uuid then
+        -- It should not happen ever, but be ready to everything.
+        log.warn('Replica %s in replicaset %s reports self as both master '..
+                 'and not master', master_uuid, replicaset_uuid)
+        return is_auto
+    end
+    local master = replicaset.master
+    if not master then
+        if not is_auto or not candidate_uuid then
+            return is_auto
+        end
+        local candidate = replicaset.replicas[candidate_uuid]
+        if not candidate then
+            return true
+        end
+        replicaset.master = candidate
+        log.info('Replica %s becomes a master as reported by %s for '..
+                 'replicaset %s', candidate_uuid, old_master_uuid,
+                 replicaset_uuid)
+        return true
+    end
+    local master_uuid = master.uuid
+    if master_uuid ~= old_master_uuid then
+        -- Master was changed while the master update information was coming.
+        -- It means it is outdated and should be ignored.
+        -- Return true regardless of the auto-master config. Because the master
+        -- change means the caller's request has a chance to succeed on the new
+        -- master on retry.
+        return true
+    end
+    if not is_auto then
+        log.warn('Replica %s is not master for replicaset %s anymore, please '..
+                 'update configuration', master_uuid, replicaset_uuid)
+        return false
+    end
+    if not candidate_uuid then
+        replicaset.master = nil
+        log.warn('Replica %s is not a master anymore for replicaset %s, no '..
+                 'candidate was reported', master_uuid, replicaset_uuid)
+        return true
+    end
+    local candidate = replicaset.replicas[candidate_uuid]
+    if candidate then
+        replicaset.master = candidate
+        log.info('Replica %s becomes master instead of %s for replicaset %s',
+                 candidate_uuid, master_uuid, replicaset_uuid)
+    else
+        replicaset.master = nil
+        log.warn('Replica %s is not a master anymore for replicaset %s, new '..
+                 'master %s could not be found in the config',
+                 master_uuid, replicaset_uuid, candidate_uuid)
+    end
+    return true
+end
+
 --
 -- Check if the master is still master, and find a new master if there is no a
 -- known one.
@@ -693,6 +757,7 @@ local replicaset_mt = {
         callbro = replicaset_template_multicallro(false, true);
         callre = replicaset_template_multicallro(true, false);
         callbre = replicaset_template_multicallro(true, true);
+        update_master = replicaset_update_master,
     };
     __tostring = replicaset_tostring;
 }
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index 9407ccd..b36ee8e 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -33,6 +33,7 @@ if not M then
             ERRINJ_FAILOVER_CHANGE_CFG = false,
             ERRINJ_RELOAD = false,
             ERRINJ_LONG_DISCOVERY = false,
+            ERRINJ_MASTER_SEARCH_DELAY = false,
         },
         -- Dictionary, key is router name, value is a router.
         routers = {},
@@ -620,12 +621,11 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
                 bucket_reset(router, bucket_id)
                 return nil, err
             elseif err.code == lerror.code.NON_MASTER then
-                -- Same, as above - do not wait and repeat.
                 assert(mode == 'write')
-                log.warn("Replica %s is not master for replicaset %s anymore,"..
-                         "please update configuration!",
-                          replicaset.master.uuid, replicaset.uuid)
-                return nil, err
+                if not replicaset:update_master(err.replica_uuid,
+                                                err.master_uuid) then
+                    return nil, err
+                end
             else
                 return nil, err
             end
@@ -1059,7 +1059,14 @@ end
 local function master_search_f(router)
     local module_version = M.module_version
     local is_in_progress = false
+    local errinj = M.errinj
     while module_version == M.module_version do
+        if errinj.ERRINJ_MASTER_SEARCH_DELAY then
+            errinj.ERRINJ_MASTER_SEARCH_DELAY = 'in'
+            repeat
+                lfiber.sleep(0.001)
+            until not errinj.ERRINJ_MASTER_SEARCH_DELAY
+        end
         local timeout
         local start_time = fiber_clock()
         local is_done, is_nop, err = master_search_step(router)
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 465abc5..3158f3c 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -971,9 +971,13 @@ local function bucket_check_state(bucket_id, mode)
         end
         reason = 'write is prohibited'
     elseif M.this_replicaset.master ~= M.this_replica then
+        local master_uuid = M.this_replicaset.master
+        if master_uuid then
+            master_uuid = master_uuid.uuid
+        end
         return bucket, lerror.vshard(lerror.code.NON_MASTER,
                                      M.this_replica.uuid,
-                                     M.this_replicaset.uuid)
+                                     M.this_replicaset.uuid, master_uuid)
     else
         return bucket
     end
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected()
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected() Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. LGTM.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> It is extracted from replicaset_wait_connected(). The new
> function is going to be used in the future patch with master
> discovery. There it will be needed to send async requests to
> several replicas at once, but before that need to ensure they are
> connected. Otherwise the async request fails right away.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. LGTM.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> Router.test.lua and upgrade.test.lua printed some Lua tables which
> were not sorted. Next patches are going to alter them slightly and
> if the tables are not sorted, some old keys would change place and
> increase the diff.
>
> The patch makes these outputs sorted so any amendment won't touch
> the existing lines.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info')
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info') Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:46 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. LGTM

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> Internal info function returns data needed by the routers. Now it
> is only is_master flag. This is required for master discovery.
>
> In future the function is supposed to get more attributes. For instance,
> instance's own part of its config, bucket generation or bucket sequence number.
>
> Needed for #75

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:47   ` Oleg Babin via Tarantool-patches
  2021-07-02 21:32     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:47 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. LGTM. One nit below.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> The new option will enable auto-search for master on the marked
> replicasets.
>
> Part of #75
> ---
>
> +local function check_replicaset_master(master, ctx)
> +    if not lutil.version_is_at_least(1, 10, 0) then
> +        error('Only versions >= 1.10 support master discovery')
> +    end

If we talk about "is_async" netbox option. We should check 1.10.1:

https://github.com/tarantool/tarantool/commit/0f686829a89b87d4f8d10fd25d4acfdea9b3dc60#diff-1e545a3af3d3592423eff523367e962e18f7eda45c2f9bef6994a254f6044d70

There is a gap between 1.10.0 and 1.10.1 when there is no is_async.

Feel free to ignore. I don't believe it will happen ever.

> +    if master ~= 'auto' then
> +        error('Only "auto" master is supported')
> +    end
> +end
> +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:48   ` Oleg Babin via Tarantool-patches
  2021-07-02 21:35     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. See 8 comments below.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> Part of #75
>
> @TarantoolBot document
> Title: VShard router master discovery
>
> Router used not to be able to find master nodes in the configured
> replicasets on its own. It relied only on how they were specified
> in the config.
>
> This becomes a problem when master changes and the change is not
> delivered to the router's config somewhy. For instance, the router
> does not rely on a central config provider. Or it does rely, but
> the provider can't deliver a new config due to any reason.
>
> This is getting especially tricky with built-in automatic master
> elections which are not supported by vshard yet, but they will be,
> they won't depend on any config. Master role won't be pinned to
> one node then.


Is it possible to implement some simple test that shows if RAFT changes

a leader, vshard catches this changes? Of course, it will be test for 
new Tarantool versions.

> Now there is a new feature to overcome the master search problem -
> configurable automatic master discovery on the router.
>
> Router goes to the replicasets, marked as having an auto master,
> finds a master in them, and periodically checks if the master is
> still a master.
>
> When a master in a replicaset stops being a master, the router
> walks all nodes of the replicaset and finds who is the new master.
>
> To turn the feature on there is a new option in router's config:
> `master = 'auto'`. It should be specified per-replicaset, and is
> not compatible with specifying a master manually.
>
> This is how a good config looks:
> ```
> config = {
>      sharding = {
>          <replicaset uuid> = {
>              master = 'auto',
>              replicas = {...},
>          },
>          ...
>      },
>      ...
> }
> ```
>
> This is how a bad config looks:
> ```
> config = {
>      sharding = {
>          <replicaset uuid> = {
>              master = 'auto',
>              replicas = {
>                  <replica uuid1> = {
>                      master = true,
>                      ...
>                  },
>                  <replica uuid2> = {
>                      master = false,
>                      ...
>                  },
>              },
>          },
>          ...
>      },
>      ...
> }
> ```
> It will not work, because either `master = 'auto'` can be
> specified, or the master is assigned manually. Not both at the
> same time.
> ---
>   test/router/master_discovery.result   | 514 ++++++++++++++++++++++++++
>   test/router/master_discovery.test.lua | 248 +++++++++++++
>   vshard/consts.lua                     |   5 +
>   vshard/error.lua                      |   5 +
>   vshard/replicaset.lua                 | 203 +++++++++-
>   vshard/router/init.lua                |  98 ++++-
>   6 files changed, 1051 insertions(+), 22 deletions(-)
>   create mode 100644 test/router/master_discovery.result
>   create mode 100644 test/router/master_discovery.test.lua
>
>
>
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 47a893b..66a09ae 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -52,6 +52,11 @@ return {
>       DISCOVERY_WORK_STEP = 0.01,
>       DISCOVERY_TIMEOUT = 10,
>   
> +    MASTER_SEARCH_IDLE_INTERVAL = 5,
> +    MASTER_SEARCH_WORK_INTERVAL = 0.5,
> +    MASTER_SEARCH_BACKOFF_INTERVAL = 5,
> +    MASTER_SEARCH_TIEMOUT = 5,
> +
>       TIMEOUT_INFINITY = 500 * 365 * 86400,
>       DEADLINE_INFINITY = math.huge,
>   }
> diff --git a/vshard/error.lua b/vshard/error.lua
> index bcbcd71..e2d1a31 100644
> --- a/vshard/error.lua
> +++ b/vshard/error.lua
> @@ -153,6 +153,11 @@ local error_message_template = {
>           name = 'BUCKET_RECV_DATA_ERROR',
>           msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
>           args = {'bucket_id', 'space', 'tuple', 'reason'},
> +    },
> +    [31] = {
> +        name = 'MULTIPLE_MASTERS_FOUND',
> +        msg = 'Found more than one master in replicaset %s on nodes %s and %s',
> +        args = {'replicaset_uuid', 'master1', 'master2'},
>       }
>   }

Is it possible to test this error?

>   
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index fa048c9..7747258 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -25,6 +25,10 @@
>   --          }
>   --      },
>   --      master = <master server from the array above>,
> +--      master_cond = <condition variable signaled when the replicaset finds or
> +--                     changes its master>,
> +--      is_auto_master = <true when is configured to find the master on
> +--                        its own>,
>   --      replica = <nearest available replica object>,
>   --      balance_i = <index of a next replica in priority_list to
>   --                   use for a load-balanced request>,
> @@ -55,6 +59,8 @@ local luuid = require('uuid')
>   local ffi = require('ffi')
>   local util = require('vshard.util')
>   local fiber_clock = fiber.clock
> +local fiber_yield = fiber.yield
> +local fiber_cond_wait = util.fiber_cond_wait
>   local gsc = util.generate_self_checker
>   
>   --
> @@ -159,6 +165,26 @@ local function replicaset_connect_to_replica(replicaset, replica)
>       return conn
>   end
>   
> +local function replicaset_wait_master(replicaset, timeout)
> +    local master = replicaset.master
> +    -- Fast path - master is almost always known.
> +    if master then
> +        return master, timeout
> +    end
> +    -- Slow path.
> +    local deadline = fiber_clock() + timeout
> +    repeat
> +        if not replicaset.is_auto_master or
> +           not fiber_cond_wait(replicaset.master_cond, timeout) then
> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
> +                                      replicaset.uuid)
> +        end
> +        timeout = deadline - fiber_clock()
> +        master = replicaset.master
> +    until master
> +    return master, timeout
> +end
> +
>   --
>   -- Create net.box connection to master.
>   --
> @@ -175,7 +201,13 @@ end
>   -- Wait until the master instance is connected.
>   --
>   local function replicaset_wait_connected(replicaset, timeout)
> -    return netbox_wait_connected(replicaset_connect_master(replicaset), timeout)
> +    local master
> +    master, timeout = replicaset_wait_master(replicaset, timeout)
> +    if not master then
> +        return nil, timeout
> +    end
> +    local conn = replicaset_connect_to_replica(replicaset, master)
> +    return netbox_wait_connected(conn, timeout)
>   end
>   
>   --
> @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
>       assert(opts == nil or type(opts) == 'table')
>       assert(type(func) == 'string', 'function name')
>       assert(args == nil or type(args) == 'table', 'function arguments')
> -    local conn, err = replicaset_connect_master(replicaset)
> -    if not conn then
> -        return nil, err
> -    end
> -    if not opts then
> -        opts = {timeout = replicaset.master.net_timeout}
> -    elseif not opts.timeout then
> -        opts = table.copy(opts)
> -        opts.timeout = replicaset.master.net_timeout
> +    local master = replicaset.master
> +    if not master then
> +        opts = opts and table.copy(opts) or {}
> +        if opts.is_async then
> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
> +                                      replicaset.uuid)
> +        end

Could / should we here wakeup master discovery if "auto" is specified?

> +        local timeout = opts.timeout or consts.MASTER_SEARCH_TIEMOUT
> +        master, timeout = replicaset_wait_master(replicaset, timeout)
> +        if not master then
> +            return nil, timeout
> +        end
> +        opts.timeout = timeout
> +    else
> +        if not opts then
> +            opts = {timeout = master.net_timeout}
> +        elseif not opts.timeout then
> +            opts = table.copy(opts)
> +            opts.timeout = master.net_timeout
> +        end
>       end
> +    replicaset_connect_to_replica(replicaset, master)
>       local net_status, storage_status, retval, error_object =
> -        replica_call(replicaset.master, func, args, opts)
> +        replica_call(master, func, args, opts)
>       -- Ignore net_status - master does not retry requests.
>       return storage_status, retval, error_object
>   end
> @@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>                   return r
>               end
>           end
> -        local conn, err = replicaset_connect_master(replicaset)
> -        if not conn then
> -            return nil, err
> -        end
> -        return master
>       end

Why don't we need this part anymore?

>       return function(replicaset, func, args, opts)
> @@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>           end
>           local end_time = fiber_clock() + timeout
>           while not net_status and timeout > 0 do
> -            replica, err = pick_next_replica(replicaset)
> +            replica = pick_next_replica(replicaset)
>               if not replica then
> -                return nil, err
> +                replica, timeout = replicaset_wait_master(replicaset, timeout)
> +                if not replica then
> +                    return nil, timeout
> +                end
> +                replicaset_connect_to_replica(replicaset, replica)
>               end
>               opts.timeout = timeout
>               net_status, storage_status, retval, err =
> @@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets)
>                   end
>               end
>           end
> +        if old_replicaset then
> +            -- Take a hint from the old replicaset who is the master now.
> +            if replicaset.is_auto_master then
> +                local master = old_replicaset.master
> +                if master then
> +                    replicaset.master = replicaset.replicas[master.uuid]
> +                end
> +            end
> +            -- Stop waiting for master in the old replicaset. Its running
> +            -- requests won't find it anyway. Auto search works only for the
> +            -- most actual replicaset objects.
> +            if old_replicaset.is_auto_master then
> +                old_replicaset.is_auto_master = false
> +                old_replicaset.master_cond:broadcast()
> +            end
> +        end
> +    end
> +end
> +
> +--
> +-- Check if the master is still master, and find a new master if there is no a
> +-- known one.
> +--
> +local function replicaset_locate_master(replicaset)
> +    local is_done = true
> +    local is_nop = true
> +    if not replicaset.is_auto_master then
> +        return is_done, is_nop
> +    end
> +    local func = 'vshard.storage._call'
> +    local args = {'info'}
> +    local const_timeout = consts.MASTER_SEARCH_TIEMOUT
> +    local sync_opts = {timeout = const_timeout}
> +    local ok, res, err, f
> +    local master = replicaset.master
> +    if master then
> +        ok, res, err = replica_call(master, func, args, sync_opts)
> +        if not ok then
> +            return is_done, is_nop, err
> +        end
> +        if res.is_master then
> +            return is_done, is_nop
> +        end
> +        log.info('Master of replicaset %s, node %s, has resigned. Trying '..
> +                 'to find a new one', replicaset.uuid, master.uuid)
> +        replicaset.master = nil
> +    end
> +    is_nop = false
> +
> +    local last_err
> +    local futures = {}
> +    local timeout = const_timeout
> +    local deadline = fiber_clock() + timeout
> +    local async_opts = {is_async = true}
> +    local replicaset_uuid = replicaset.uuid
> +    for replica_uuid, replica in pairs(replicaset.replicas) do
> +        local conn = replica.conn
> +        timeout, err = netbox_wait_connected(conn, timeout)
> +        if not timeout then
> +            last_err = err
> +            timeout = deadline - fiber_clock()
> +        else
> +            ok, f = pcall(conn.call, conn, func, args, async_opts)
> +            if not ok then
> +                last_err = lerror.make(f)
> +            else
> +                futures[replica_uuid] = f
> +            end
> +        end
> +    end
> +    local master_uuid
> +    for replica_uuid, f in pairs(futures) do
> +        if timeout < 0 then
> +            -- Netbox uses cond var inside, whose wait throws an error when gets
> +            -- a negative timeout. Avoid that.
> +            timeout = 0
> +        end
> +        res, err = f:wait_result(timeout)
> +        timeout = deadline - fiber_clock()
> +        if not res then
> +            f:discard()
> +            last_err = err
> +            goto next_wait

If timeout will be negative we anyway go to next_wait and turn it to 0 
at the next iteration.

> +        end
> +        res = res[1]
> +        if not res.is_master then
> +            goto next_wait
> +        end
> +        if not master_uuid then
> +            master_uuid = replica_uuid
> +            goto next_wait
> +        end
> +        is_done = false
> +        last_err = lerror.vshard(lerror.code.MULTIPLE_MASTERS_FOUND,
> +                                 replicaset_uuid, master_uuid, replica_uuid)
> +        do return is_done, is_nop, last_err end
> +        ::next_wait::
> +    end
> +    master = replicaset.replicas[master_uuid]
> +    if master then
> +        log.info('Found master for replicaset %s: %s', replicaset_uuid,
> +                 master_uuid)
> +        replicaset.master = master
> +        replicaset.master_cond:broadcast()
> +    else
> +        is_done = false
> +    end
> +    return is_done, is_nop, last_err
> +end
> +
> +local function locate_masters(replicasets)
> +    local is_all_done = true
> +    local is_all_nop = true
> +    local last_err
> +    for _, replicaset in pairs(replicasets) do
> +        local is_done, is_nop, err = replicaset_locate_master(replicaset)

I think we should log result of master discovery. Especially if error 
occurred.

> +        is_all_done = is_all_done and is_done
> +        is_all_nop = is_all_nop and is_nop
> +        last_err = err or last_err
> +        fiber_yield()
>       end
> +    return is_all_done, is_all_nop, last_err
>   end
>   
>   --
> @@ -729,6 +893,8 @@ local function buildall(sharding_cfg)
>               bucket_count = 0,
>               lock = replicaset.lock,
>               balance_i = 1,
> +            is_auto_master = replicaset.master == 'auto',
> +            master_cond = fiber.cond(),
>           }, replicaset_mt)
>           local priority_list = {}
>           for replica_uuid, replica in pairs(replicaset.replicas) do
> @@ -802,4 +968,5 @@ return {
>       wait_masters_connect = wait_masters_connect,
>       rebind_replicasets = rebind_replicasets,
>       outdate_replicasets = outdate_replicasets,
> +    locate_masters = locate_masters,
>   }
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 5e2a96b..9407ccd 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -68,6 +68,8 @@ local ROUTER_TEMPLATE = {
>           replicasets = nil,
>           -- Fiber to maintain replica connections.
>           failover_fiber = nil,
> +        -- Fiber to watch for master changes and find new masters.
> +        master_search_fiber = nil,
>           -- Fiber to discovery buckets in background.
>           discovery_fiber = nil,
>           -- How discovery works. On - work infinitely. Off - no
> @@ -1030,6 +1032,93 @@ local function failover_f(router)
>       end
>   end
>   
> +--------------------------------------------------------------------------------
> +-- Master search
> +--------------------------------------------------------------------------------
> +
> +local function master_search_step(router)
> +    local ok, is_done, is_nop, err = pcall(lreplicaset.locate_masters,
> +                                           router.replicasets)
> +    if not ok then
> +        err = is_done
> +        is_done = false
> +        is_nop = false
> +    end
> +    return is_done, is_nop, err
> +end
> +
> +--
> +-- Master discovery background function. It is supposed to notice master changes
> +-- and find new masters in the replicasets, which are configured for that.
> +--
> +-- XXX: due to polling the search might notice master change not right when it
Is this TODO related to https://github.com/tarantool/vshard/issues/284 ?
> +-- happens. In future it makes sense to rewrite master search using
> +-- subscriptions. The problem is that at the moment of writing the subscriptions
> +-- are not working well in all Tarantool versions.
> +--
> +local function master_search_f(router)
> +    local module_version = M.module_version
> +    local is_in_progress = false
> +    while module_version == M.module_version do
> +        local timeout
> +        local start_time = fiber_clock()
> +        local is_done, is_nop, err = master_search_step(router)
> +        if err then
> +            log.error('Error during master search: %s', lerror.make(err))
> +        end
> +        if is_done then
> +            timeout = consts.MASTER_SEARCH_IDLE_INTERVAL
> +        elseif err then
> +            timeout = consts.MASTER_SEARCH_BACKOFF_INTERVAL
> +        else
> +            timeout = consts.MASTER_SEARCH_WORK_INTERVAL
> +        end
> +        if not is_in_progress then
> +            if not is_nop and is_done then
> +                log.info('Master search happened')
> +            elseif not is_done then
> +                log.info('Master search is started')
> +                is_in_progress = true
> +            end
> +        elseif is_done then
> +            log.info('Master search is finished')
> +            is_in_progress = false
> +        end
> +        local end_time = fiber_clock()
> +        local duration = end_time - start_time
> +        if not is_nop then
> +            log.verbose('Master search step took %s seconds. Next in %s '..
> +                        'seconds', duration, timeout)
> +        end
> +        lfiber.sleep(timeout)
> +    end
> +end
> +
> +local function master_search_set(router)
> +    local enable = false
> +    for _, rs in pairs(router.replicasets) do
> +        if rs.is_auto_master then
> +            enable = true
> +            break
> +        end
> +    end
> +    local search_fiber = router.master_search_fiber
> +    if enable and search_fiber == nil then
> +        log.info('Master auto search is enabled')
> +        router.master_search_fiber = util.reloadable_fiber_create(
> +            'vshard.master_search.' .. router.name, M, 'master_search_f',
> +            router)

On 1.10 fiber name is limited with 32 symbols.

tarantool> #'vshard.master_search.'
---
- 21
...

That mean that router name is limited with 11 chars.

It could be a problem. As I see you don't use {truncate = true} when 
assign a name

https://github.com/tarantool/vshard/blob/a394e3f4008a7436f011c81028dcaadc270eebf6/vshard/util.lua#L93


Also it's possible to wakeup discovery and rebalanser fibers. Probably 
we should give a way to

wakeup master_search_fiber.

> +    elseif not enable and search_fiber ~= nil then
> +        -- Do not make users pay for what they do not use - when the search is
> +        -- disabled for all replicasets, there should not be any fiber.
> +        log.info('Master auto search is disabled')
> +        if search_fiber:status() ~= 'dead' then
> +            search_fiber:cancel()
> +        end
> +        router.master_search_fiber = nil
> +    end
> +end
> +
>   --------------------------------------------------------------------------------
>   -- Configuration
>   --------------------------------------------------------------------------------
> @@ -1100,6 +1189,7 @@ local function router_cfg(router, cfg, is_reload)
>               'vshard.failover.' .. router.name, M, 'failover_f', router)
>       end
>       discovery_set(router, vshard_cfg.discovery_mode)
> +    master_search_set(router)
>   end
>   
>   --------------------------------------------------------------------------------
> @@ -1535,6 +1625,10 @@ end
>   --------------------------------------------------------------------------------
>   -- Module definition
>   --------------------------------------------------------------------------------
> +M.discovery_f = discovery_f
> +M.failover_f = failover_f
> +M.master_search_f = master_search_f
> +M.router_mt = router_mt
>   --
>   -- About functions, saved in M, and reloading see comment in
>   -- storage/init.lua.
> @@ -1556,10 +1650,6 @@ else
>       M.module_version = M.module_version + 1
>   end
>   
> -M.discovery_f = discovery_f
> -M.failover_f = failover_f
> -M.router_mt = router_mt
> -
>   module.cfg = legacy_cfg
>   module.new = router_new
>   module.internal = M

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 11:49   ` Oleg Babin via Tarantool-patches
  2021-07-02 21:36     ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-02 11:49 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch. See 2 comments below.

On 02.07.2021 01:09, Vladislav Shpilevoy wrote:
> Storage sends NON_MASTER error when an attempt happens to make a
> read-write request on it while it is not a master. The error
> contains UUID of the instance.
>
> The patch adds to this error a new field - UUID of the master as
> it is seen on this storage. Router now uses that information to
> quickly switch its read-write requests to the new master. In fact,
> this should happen in almost all cases of master auto-discovery on
> the router if it occurs under load.
>
> Closes #75
> ---
>   test/router/master_discovery.result   | 427 ++++++++++++++++++++++++++
>   test/router/master_discovery.test.lua | 191 ++++++++++++
>   test/router/router.result             |   8 +-
>   vshard/error.lua                      |   2 +-
>   vshard/replicaset.lua                 |  65 ++++
>   vshard/router/init.lua                |  17 +-
>   vshard/storage/init.lua               |   6 +-
>   7 files changed, 706 insertions(+), 10 deletions(-)
>
>
>   ...
> @@ -1175,6 +1176,7 @@ error_messages
>     - Use replicaset:connect_replica(...) instead of replicaset.connect_replica(...)
>     - Use replicaset:down_replica_priority(...) instead of replicaset.down_replica_priority(...)
>     - Use replicaset:up_replica_priority(...) instead of replicaset.up_replica_priority(...)
> +  - Use replicaset:update_master(...) instead of replicaset.update_master(...)
>     - Use replicaset:wait_connected(...) instead of replicaset.wait_connected(...)
>   ...
>   _, replica = next(replicaset.replicas)
> diff --git a/vshard/error.lua b/vshard/error.lua
> index e2d1a31..fa7bdaa 100644
> --- a/vshard/error.lua
> +++ b/vshard/error.lua
> @@ -20,7 +20,7 @@ local error_message_template = {
>       [2] = {
>           name = 'NON_MASTER',
>           msg = 'Replica %s is not a master for replicaset %s anymore',
> -        args = {'replica_uuid', 'replicaset_uuid'}
> +        args = {'replica_uuid', 'replicaset_uuid', 'master_uuid'}
Error format string still contains only 2 arguments. Is it ok?
>       },
>       [3] = {
>           name = 'BUCKET_ALREADY_EXISTS',
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index 7747258..660c786 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -570,6 +570,70 @@ local function rebind_replicasets(replicasets, old_replicasets)
>       end
>   end
>   
> +--
> +-- Let the replicaset know @a old_master_uuid is not a master anymore, should
> +-- use @a candidate_uuid instead.
> +-- Returns whether the request, which brought this information, can be retried.
> +--
> +local function replicaset_update_master(replicaset, old_master_uuid,
> +                                        candidate_uuid)
> +    local is_auto = replicaset.is_auto_master
> +    local replicaset_uuid = replicaset.uuid
> +    if old_master_uuid == candidate_uuid then
> +        -- It should not happen ever, but be ready to everything.
> +        log.warn('Replica %s in replicaset %s reports self as both master '..
> +                 'and not master', master_uuid, replicaset_uuid)
> +        return is_auto
> +    end
> +    local master = replicaset.master
> +    if not master then
> +        if not is_auto or not candidate_uuid then
> +            return is_auto
> +        end
> +        local candidate = replicaset.replicas[candidate_uuid]

AFAIU it means that candidate_uuid doesn't belong to replicaset.

Why is it true?

> +        if not candidate then
> +            return true
> +        end
> +        replicaset.master = candidate
> +        log.info('Replica %s becomes a master as reported by %s for '..
> +                 'replicaset %s', candidate_uuid, old_master_uuid,
> +                 replicaset_uuid)
> +        return true

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option
  2021-07-02 11:47   ` Oleg Babin via Tarantool-patches
@ 2021-07-02 21:32     ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-05  9:23       ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-02 21:32 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

Hi! Thanks for the extensive review!

>> +local function check_replicaset_master(master, ctx)
>> +    if not lutil.version_is_at_least(1, 10, 0) then
>> +        error('Only versions >= 1.10 support master discovery')
>> +    end
> 
> If we talk about "is_async" netbox option. We should check 1.10.1:
> 
> https://github.com/tarantool/tarantool/commit/0f686829a89b87d4f8d10fd25d4acfdea9b3dc60#diff-1e545a3af3d3592423eff523367e962e18f7eda45c2f9bef6994a254f6044d70
> 
> There is a gap between 1.10.0 and 1.10.1 when there is no is_async.
Good call. It is easy to fix so I did it:

====================
@@ -22,8 +22,9 @@ local function check_replica_master(master, ctx)
 end
 
 local function check_replicaset_master(master, ctx)
-    if not lutil.version_is_at_least(1, 10, 0) then
-        error('Only versions >= 1.10 support master discovery')
+    -- Limit the version due to extensive usage of netbox is_async feature.
+    if not lutil.version_is_at_least(1, 10, 1) then
+        error('Only versions >= 1.10.1 support master discovery')
     end
     if master ~= 'auto' then
         error('Only "auto" master is supported')

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-02 11:48   ` Oleg Babin via Tarantool-patches
@ 2021-07-02 21:35     ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-02 21:35 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

Thanks for the review!

>> This is getting especially tricky with built-in automatic master
>> elections which are not supported by vshard yet, but they will be,
>> they won't depend on any config. Master role won't be pinned to
>> one node then.
> 
> Is it possible to implement some simple test that shows if RAFT changes
> 
> a leader, vshard catches this changes? Of course, it will be test for new Tarantool versions.

Nope. Because the storages do not support it yet. 'Master' attribute
can only be set via the config now, unfortunately. This will change
in the future, but it is blocked by one of the bugs (about bucket refs
being useless on replicas).

>> diff --git a/vshard/error.lua b/vshard/error.lua
>> index bcbcd71..e2d1a31 100644
>> --- a/vshard/error.lua
>> +++ b/vshard/error.lua
>> @@ -153,6 +153,11 @@ local error_message_template = {
>>           name = 'BUCKET_RECV_DATA_ERROR',
>>           msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
>>           args = {'bucket_id', 'space', 'tuple', 'reason'},
>> +    },
>> +    [31] = {
>> +        name = 'MULTIPLE_MASTERS_FOUND',
>> +        msg = 'Found more than one master in replicaset %s on nodes %s and %s',
>> +        args = {'replicaset_uuid', 'master1', 'master2'},
>>       }
>>   }
> 
> Is it possible to test this error?

It is tested, yes. See master_discovery.test.lua. Section
"-- Multiple masters logging". Although I couldn't find an
easy way to expose it into vshard.router.info(). So it is
simply logged now.

>>   diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>> index fa048c9..7747258 100644
>> --- a/vshard/replicaset.lua
>> +++ b/vshard/replicaset.lua
>> @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
>>       assert(opts == nil or type(opts) == 'table')
>>       assert(type(func) == 'string', 'function name')
>>       assert(args == nil or type(args) == 'table', 'function arguments')
>> -    local conn, err = replicaset_connect_master(replicaset)
>> -    if not conn then
>> -        return nil, err
>> -    end
>> -    if not opts then
>> -        opts = {timeout = replicaset.master.net_timeout}
>> -    elseif not opts.timeout then
>> -        opts = table.copy(opts)
>> -        opts.timeout = replicaset.master.net_timeout
>> +    local master = replicaset.master
>> +    if not master then
>> +        opts = opts and table.copy(opts) or {}
>> +        if opts.is_async then
>> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
>> +                                      replicaset.uuid)
>> +        end
> 
> Could / should we here wakeup master discovery if "auto" is specified?

We can. But I am afraid that under considerable RW requests load we
might wakeup it too often. And if there are many routers, they might
create a storm of `vshard.storage._call('info')` requests putting a
notable load on the storages. So I decided it is not safe until the
subscriptions are implemented.

>> @@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>>                   return r
>>               end
>>           end
>> -        local conn, err = replicaset_connect_master(replicaset)
>> -        if not conn then
>> -            return nil, err
>> -        end
>> -        return master
>>       end
> 
> Why don't we need this part anymore?

Now the caller handles the case when couldn't find any node. See
the next diff hunk. I moved it outside, because I didn't want
pick_next_replica() to take a timeout argument. I wanted it to
be as simple as possible.

>>       return function(replicaset, func, args, opts)
>> @@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>>           end
>>           local end_time = fiber_clock() + timeout
>>           while not net_status and timeout > 0 do
>> -            replica, err = pick_next_replica(replicaset)
>> +            replica = pick_next_replica(replicaset)
>>               if not replica then
>> -                return nil, err
>> +                replica, timeout = replicaset_wait_master(replicaset, timeout)
>> +                if not replica then
>> +                    return nil, timeout
>> +                end
>> +                replicaset_connect_to_replica(replicaset, replica)
>>               end
>>               opts.timeout = timeout
>>               net_status, storage_status, retval, err =
>> @@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets)

<...>

>> +    local master_uuid
>> +    for replica_uuid, f in pairs(futures) do
>> +        if timeout < 0 then
>> +            -- Netbox uses cond var inside, whose wait throws an error when gets
>> +            -- a negative timeout. Avoid that.
>> +            timeout = 0
>> +        end
>> +        res, err = f:wait_result(timeout)
>> +        timeout = deadline - fiber_clock()
>> +        if not res then
>> +            f:discard()
>> +            last_err = err
>> +            goto next_wait
> 
> If timeout will be negative we anyway go to next_wait and turn it to 0 at the next iteration.

Yup. This is intentional. With zero timeout the worst what can happen is a
yield. But on the other hand, while we were waiting for one request, the
others could complete and one of them could be the master. This is why I
continue the iteration - don't want to miss any arrived response.

<...>

>> +
>> +local function locate_masters(replicasets)
>> +    local is_all_done = true
>> +    local is_all_nop = true
>> +    local last_err
>> +    for _, replicaset in pairs(replicasets) do
>> +        local is_done, is_nop, err = replicaset_locate_master(replicaset)
> 
> I think we should log result of master discovery. Especially if error occurred.

It is logged by the caller. See master_search_f(), init.lua:1067.

>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>> index 5e2a96b..9407ccd 100644
>> --- a/vshard/router/init.lua
>> +++ b/vshard/router/init.lua
>> @@ -1030,6 +1032,93 @@ local function failover_f(router)

<...>

>> +--
>> +-- Master discovery background function. It is supposed to notice master changes
>> +-- and find new masters in the replicasets, which are configured for that.
>> +--
>> +-- XXX: due to polling the search might notice master change not right when it
> Is this TODO related to https://github.com/tarantool/vshard/issues/284 ?

Yes, that is one of the "solutions", although I would rather call it a crutch.
I prefer to fix it by doing fair subscriptions. And for that I at least need a
way to get netbox call result asynchronously, without having to call wait_result()
anywhere.

<...>

>> +local function master_search_set(router)
>> +    local enable = false
>> +    for _, rs in pairs(router.replicasets) do
>> +        if rs.is_auto_master then
>> +            enable = true
>> +            break
>> +        end
>> +    end
>> +    local search_fiber = router.master_search_fiber
>> +    if enable and search_fiber == nil then
>> +        log.info('Master auto search is enabled')
>> +        router.master_search_fiber = util.reloadable_fiber_create(
>> +            'vshard.master_search.' .. router.name, M, 'master_search_f',
>> +            router)
> 
> On 1.10 fiber name is limited with 32 symbols.
> 
> tarantool> #'vshard.master_search.'
> ---
> - 21
> ...
> 
> That mean that router name is limited with 11 chars.
> 
> It could be a problem. As I see you don't use {truncate = true} when assign a name
> 
> https://github.com/tarantool/vshard/blob/a394e3f4008a7436f011c81028dcaadc270eebf6/vshard/util.lua#L93

Holy shit, that is a good find. I fixed it in a new commit on top of the
branch. For all the background fibers at once.

> Also it's possible to wakeup discovery and rebalanser fibers. Probably we should give a way to
> 
> wakeup master_search_fiber.

I didn't want people to rely on that. Because when the subscriptions
are implemented, there won't be a fiber. But I could make it NOP later
probably.

====================
diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
index 2fca1e4..6dcdf45 100644
--- a/test/router/master_discovery.result
+++ b/test/router/master_discovery.result
@@ -87,7 +87,7 @@ end
 function check_all_masters_found()                                              \
     for _, rs in pairs(vshard.router.static.replicasets) do                     \
         if not rs.master then                                                   \
-            vshard.router.static.master_search_fiber:wakeup()                   \
+            vshard.router.master_search_wakeup()                                \
             return false                                                        \
         end                                                                     \
     end                                                                         \
@@ -101,7 +101,7 @@ function check_master_for_replicaset(rs_id, master_name)
     local master_uuid = util.name_to_uuid[master_name]                          \
     local master = vshard.router.static.replicasets[rs_uuid].master             \
     if not master or master.uuid ~= master_uuid then                            \
-        vshard.router.static.master_search_fiber:wakeup()                       \
+        vshard.router.master_search_wakeup()                                    \
         return false                                                            \
     end                                                                         \
     return true                                                                 \
@@ -124,7 +124,7 @@ master_search_helper_f = nil
  | ...
 function aggressive_master_search_f()                                           \
     while true do                                                               \
-        vshard.router.static.master_search_fiber:wakeup()                       \
+        vshard.router.master_search_wakeup()                                    \
         fiber.sleep(0.001)                                                      \
     end                                                                         \
 end
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
index e087f58..653725a 100644
--- a/test/router/master_discovery.test.lua
+++ b/test/router/master_discovery.test.lua
@@ -49,7 +49,7 @@ end
 function check_all_masters_found()                                              \
     for _, rs in pairs(vshard.router.static.replicasets) do                     \
         if not rs.master then                                                   \
-            vshard.router.static.master_search_fiber:wakeup()                   \
+            vshard.router.master_search_wakeup()                                \
             return false                                                        \
         end                                                                     \
     end                                                                         \
@@ -61,7 +61,7 @@ function check_master_for_replicaset(rs_id, master_name)
     local master_uuid = util.name_to_uuid[master_name]                          \
     local master = vshard.router.static.replicasets[rs_uuid].master             \
     if not master or master.uuid ~= master_uuid then                            \
-        vshard.router.static.master_search_fiber:wakeup()                       \
+        vshard.router.master_search_wakeup()                                    \
         return false                                                            \
     end                                                                         \
     return true                                                                 \
@@ -78,7 +78,7 @@ end
 master_search_helper_f = nil
 function aggressive_master_search_f()                                           \
     while true do                                                               \
-        vshard.router.static.master_search_fiber:wakeup()                       \
+        vshard.router.master_search_wakeup()                                    \
         fiber.sleep(0.001)                                                      \
     end                                                                         \
 end
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index 9407ccd..baaeee4 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -1119,6 +1119,13 @@ local function master_search_set(router)
     end
 end
 
+local function master_search_wakeup(router)
+    local f = router.master_search_fiber
+    if f then
+        f:wakeup()
+    end
+end
+
 --------------------------------------------------------------------------------
 -- Configuration
 --------------------------------------------------------------------------------
@@ -1545,6 +1552,7 @@ local router_mt = {
         bootstrap = cluster_bootstrap;
         bucket_discovery = bucket_discovery;
         discovery_wakeup = discovery_wakeup;
+        master_search_wakeup = master_search_wakeup,
         discovery_set = discovery_set,
         _route_map_clear = route_map_clear,
         _bucket_reset = bucket_reset,
====================

To the commit message I appended the following text:

    Master discovery works in its own fiber on the router, which is
    activated only if at least one replicaset is configured to look
    for the master. It wakes up with a certain period. But it is
    possible to wake it up on demand using
    `vshard.router.master_search_wakeup()` function. It does not do
    anything if master search is not configured for any replicaset.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-02 11:49   ` Oleg Babin via Tarantool-patches
@ 2021-07-02 21:36     ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-02 21:36 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

Thanks for the review!

>> diff --git a/vshard/error.lua b/vshard/error.lua
>> index e2d1a31..fa7bdaa 100644
>> --- a/vshard/error.lua
>> +++ b/vshard/error.lua
>> @@ -20,7 +20,7 @@ local error_message_template = {
>>       [2] = {
>>           name = 'NON_MASTER',
>>           msg = 'Replica %s is not a master for replicaset %s anymore',
>> -        args = {'replica_uuid', 'replicaset_uuid'}
>> +        args = {'replica_uuid', 'replicaset_uuid', 'master_uuid'}
> Error format string still contains only 2 arguments. Is it ok?

Yes. All the 'args' are remembered as the error object attributes.
I can use it to attach information which I don't want to add to
the message or simply can't.

Here I can't, because master_uuid might be nil. If it would be in
the format, it would be displayed as 'nil' sometimes, which would be
confusing.

>> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>> index 7747258..660c786 100644
>> --- a/vshard/replicaset.lua
>> +++ b/vshard/replicaset.lua
>> @@ -570,6 +570,70 @@ local function rebind_replicasets(replicasets, old_replicasets)
>>       end
>>   end
>>   +--
>> +-- Let the replicaset know @a old_master_uuid is not a master anymore, should
>> +-- use @a candidate_uuid instead.
>> +-- Returns whether the request, which brought this information, can be retried.
>> +--
>> +local function replicaset_update_master(replicaset, old_master_uuid,
>> +                                        candidate_uuid)
>> +    local is_auto = replicaset.is_auto_master
>> +    local replicaset_uuid = replicaset.uuid
>> +    if old_master_uuid == candidate_uuid then
>> +        -- It should not happen ever, but be ready to everything.
>> +        log.warn('Replica %s in replicaset %s reports self as both master '..
>> +                 'and not master', master_uuid, replicaset_uuid)
>> +        return is_auto
>> +    end
>> +    local master = replicaset.master
>> +    if not master then
>> +        if not is_auto or not candidate_uuid then
>> +            return is_auto
>> +        end
>> +        local candidate = replicaset.replicas[candidate_uuid]
> 
> AFAIU it means that candidate_uuid doesn't belong to replicaset.
> 
> Why is it true?

It can happen if router config is different from the storage's due
to any reason. If the master is not in router's config, it won't be
able to find it by UUID.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (5 preceding siblings ...)
  2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-02 21:36 ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-05  9:23   ` Oleg Babin via Tarantool-patches
  2021-08-03 21:55 ` [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-02 21:36 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Since tarantool's commit 008658b349264538f76327f8d60f2db5451854ea
("fiber: throw an error on too long fiber name") in version 1.7.6
fiber:name() fails if the name is too long. But the error can be
suppressed by using {truncate = true} option.

The patch makes vshard truncate too long names instead of throwing
an error.
---
 test/unit/util.result   | 10 ++++++++++
 test/unit/util.test.lua |  5 +++++
 vshard/util.lua         |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/test/unit/util.result b/test/unit/util.result
index c83e80c..ec61e8a 100644
--- a/test/unit/util.result
+++ b/test/unit/util.result
@@ -71,6 +71,16 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
 fib:cancel()
 ---
 ...
+-- Re-loadable fiber must truncate too long name.
+name = string.rep('a', 512)
+---
+...
+fib = util.reloadable_fiber_create(name, fake_M, 'reloadable_function')
+---
+...
+fib:cancel()
+---
+...
 -- Yielding table minus.
 minus_yield = util.table_minus_yield
 ---
diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
index 881feb4..5432c0d 100644
--- a/test/unit/util.test.lua
+++ b/test/unit/util.test.lua
@@ -28,6 +28,11 @@ while not test_run:grep_log('default', 'module is reloaded, restarting') do fibe
 test_run:grep_log('default', 'reloadable_function has been started', 1000)
 fib:cancel()
 
+-- Re-loadable fiber must truncate too long name.
+name = string.rep('a', 512)
+fib = util.reloadable_fiber_create(name, fake_M, 'reloadable_function')
+fib:cancel()
+
 -- Yielding table minus.
 minus_yield = util.table_minus_yield
 minus_yield({}, {}, 1)
diff --git a/vshard/util.lua b/vshard/util.lua
index 4d2c04e..afa658b 100644
--- a/vshard/util.lua
+++ b/vshard/util.lua
@@ -90,7 +90,7 @@ local function reloadable_fiber_create(fiber_name, module, func_name, data)
     assert(type(fiber_name) == 'string')
     local xfiber = fiber.create(reloadable_fiber_main_loop, module, func_name,
                                 data)
-    xfiber:name(fiber_name)
+    xfiber:name(fiber_name, {truncate = true})
     return xfiber
 end
 
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option
  2021-07-02 21:32     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-05  9:23       ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-05  9:23 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your change. LGTM.

On 03.07.2021 00:32, Vladislav Shpilevoy wrote:
> Hi! Thanks for the extensive review!
>
>>> +local function check_replicaset_master(master, ctx)
>>> +    if not lutil.version_is_at_least(1, 10, 0) then
>>> +        error('Only versions >= 1.10 support master discovery')
>>> +    end
>> If we talk about "is_async" netbox option. We should check 1.10.1:
>>
>> https://github.com/tarantool/tarantool/commit/0f686829a89b87d4f8d10fd25d4acfdea9b3dc60#diff-1e545a3af3d3592423eff523367e962e18f7eda45c2f9bef6994a254f6044d70
>>
>> There is a gap between 1.10.0 and 1.10.1 when there is no is_async.
> Good call. It is easy to fix so I did it:
>
> ====================
> @@ -22,8 +22,9 @@ local function check_replica_master(master, ctx)
>   end
>   
>   local function check_replicaset_master(master, ctx)
> -    if not lutil.version_is_at_least(1, 10, 0) then
> -        error('Only versions >= 1.10 support master discovery')
> +    -- Limit the version due to extensive usage of netbox is_async feature.
> +    if not lutil.version_is_at_least(1, 10, 1) then
> +        error('Only versions >= 1.10.1 support master discovery')
>       end
>       if master ~= 'auto' then
>           error('Only "auto" master is supported')

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name
  2021-07-02 21:36 ` [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-05  9:23   ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-05  9:23 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch! LGTM.

On 03.07.2021 00:36, Vladislav Shpilevoy wrote:
> Since tarantool's commit 008658b349264538f76327f8d60f2db5451854ea
> ("fiber: throw an error on too long fiber name") in version 1.7.6
> fiber:name() fails if the name is too long. But the error can be
> suppressed by using {truncate = true} option.
>
> The patch makes vshard truncate too long names instead of throwing
> an error.
> ---
>   test/unit/util.result   | 10 ++++++++++
>   test/unit/util.test.lua |  5 +++++
>   vshard/util.lua         |  2 +-
>   3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/test/unit/util.result b/test/unit/util.result
> index c83e80c..ec61e8a 100644
> --- a/test/unit/util.result
> +++ b/test/unit/util.result
> @@ -71,6 +71,16 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
>   fib:cancel()
>   ---
>   ...
> +-- Re-loadable fiber must truncate too long name.
> +name = string.rep('a', 512)
> +---
> +...
> +fib = util.reloadable_fiber_create(name, fake_M, 'reloadable_function')
> +---
> +...
> +fib:cancel()
> +---
> +...
>   -- Yielding table minus.
>   minus_yield = util.table_minus_yield
>   ---
> diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
> index 881feb4..5432c0d 100644
> --- a/test/unit/util.test.lua
> +++ b/test/unit/util.test.lua
> @@ -28,6 +28,11 @@ while not test_run:grep_log('default', 'module is reloaded, restarting') do fibe
>   test_run:grep_log('default', 'reloadable_function has been started', 1000)
>   fib:cancel()
>   
> +-- Re-loadable fiber must truncate too long name.
> +name = string.rep('a', 512)
> +fib = util.reloadable_fiber_create(name, fake_M, 'reloadable_function')
> +fib:cancel()
> +
>   -- Yielding table minus.
>   minus_yield = util.table_minus_yield
>   minus_yield({}, {}, 1)
> diff --git a/vshard/util.lua b/vshard/util.lua
> index 4d2c04e..afa658b 100644
> --- a/vshard/util.lua
> +++ b/vshard/util.lua
> @@ -90,7 +90,7 @@ local function reloadable_fiber_create(fiber_name, module, func_name, data)
>       assert(type(fiber_name) == 'string')
>       local xfiber = fiber.create(reloadable_fiber_main_loop, module, func_name,
>                                   data)
> -    xfiber:name(fiber_name)
> +    xfiber:name(fiber_name, {truncate = true})
>       return xfiber
>   end
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-02 21:35     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
  2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-05  9:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your answers. See my answers below.

On 03.07.2021 00:35, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
>>> This is getting especially tricky with built-in automatic master
>>> elections which are not supported by vshard yet, but they will be,
>>> they won't depend on any config. Master role won't be pinned to
>>> one node then.
>> Is it possible to implement some simple test that shows if RAFT changes
>>
>> a leader, vshard catches this changes? Of course, it will be test for new Tarantool versions.
> Nope. Because the storages do not support it yet. 'Master' attribute
> can only be set via the config now, unfortunately. This will change
> in the future, but it is blocked by one of the bugs (about bucket refs
> being useless on replicas).

Thanks for answer.

>>> diff --git a/vshard/error.lua b/vshard/error.lua
>>> index bcbcd71..e2d1a31 100644
>>> --- a/vshard/error.lua
>>> +++ b/vshard/error.lua
>>> @@ -153,6 +153,11 @@ local error_message_template = {
>>>            name = 'BUCKET_RECV_DATA_ERROR',
>>>            msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
>>>            args = {'bucket_id', 'space', 'tuple', 'reason'},
>>> +    },
>>> +    [31] = {
>>> +        name = 'MULTIPLE_MASTERS_FOUND',
>>> +        msg = 'Found more than one master in replicaset %s on nodes %s and %s',
>>> +        args = {'replicaset_uuid', 'master1', 'master2'},
>>>        }
>>>    }
>> Is it possible to test this error?
> It is tested, yes. See master_discovery.test.lua. Section
> "-- Multiple masters logging". Although I couldn't find an
> easy way to expose it into vshard.router.info(). So it is
> simply logged now.

Yes, thanks. I've missed it.


>>>    diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>>> index fa048c9..7747258 100644
>>> --- a/vshard/replicaset.lua
>>> +++ b/vshard/replicaset.lua
>>> @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
>>>        assert(opts == nil or type(opts) == 'table')
>>>        assert(type(func) == 'string', 'function name')
>>>        assert(args == nil or type(args) == 'table', 'function arguments')
>>> -    local conn, err = replicaset_connect_master(replicaset)
>>> -    if not conn then
>>> -        return nil, err
>>> -    end
>>> -    if not opts then
>>> -        opts = {timeout = replicaset.master.net_timeout}
>>> -    elseif not opts.timeout then
>>> -        opts = table.copy(opts)
>>> -        opts.timeout = replicaset.master.net_timeout
>>> +    local master = replicaset.master
>>> +    if not master then
>>> +        opts = opts and table.copy(opts) or {}
>>> +        if opts.is_async then
>>> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
>>> +                                      replicaset.uuid)
>>> +        end
>> Could / should we here wakeup master discovery if "auto" is specified?
> We can. But I am afraid that under considerable RW requests load we
> might wakeup it too often. And if there are many routers, they might
> create a storm of `vshard.storage._call('info')` requests putting a
> notable load on the storages. So I decided it is not safe until the
> subscriptions are implemented.

But we can't simply fail async requests when master is unknown.

And we even don't have exposed 
replicaset_wait_connected/replicaset_wait_master

to call it before master will be known. As result several seconds we 
will get failed requests.

Also I saw TODO in test that "RO requests should be able to go to 
replicas" and I expect questions

about RTO 0 on read requests from our customers.


I don't know will this feature be used by cartridge but it could create 
small inconvenience for our customers

in test because they should inject timeouts until master will be 
discovered (it's too artifical example but there were

some similar questions when discovery started to work by chunks).

>>> @@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>>>                    return r
>>>                end
>>>            end
>>> -        local conn, err = replicaset_connect_master(replicaset)
>>> -        if not conn then
>>> -            return nil, err
>>> -        end
>>> -        return master
>>>        end
>> Why don't we need this part anymore?
> Now the caller handles the case when couldn't find any node. See
> the next diff hunk. I moved it outside, because I didn't want
> pick_next_replica() to take a timeout argument. I wanted it to
> be as simple as possible.

Ok. Thanks for answer.


>>>        return function(replicaset, func, args, opts)
>>> @@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance)
>>>            end
>>>            local end_time = fiber_clock() + timeout
>>>            while not net_status and timeout > 0 do
>>> -            replica, err = pick_next_replica(replicaset)
>>> +            replica = pick_next_replica(replicaset)
>>>                if not replica then
>>> -                return nil, err
>>> +                replica, timeout = replicaset_wait_master(replicaset, timeout)
>>> +                if not replica then
>>> +                    return nil, timeout
>>> +                end
>>> +                replicaset_connect_to_replica(replicaset, replica)
>>>                end
>>>                opts.timeout = timeout
>>>                net_status, storage_status, retval, err =
>>> @@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets)
> <...>
>
>>> +    local master_uuid
>>> +    for replica_uuid, f in pairs(futures) do
>>> +        if timeout < 0 then
>>> +            -- Netbox uses cond var inside, whose wait throws an error when gets
>>> +            -- a negative timeout. Avoid that.
>>> +            timeout = 0
>>> +        end
>>> +        res, err = f:wait_result(timeout)
>>> +        timeout = deadline - fiber_clock()
>>> +        if not res then
>>> +            f:discard()
>>> +            last_err = err
>>> +            goto next_wait
>> If timeout will be negative we anyway go to next_wait and turn it to 0 at the next iteration.
> Yup. This is intentional. With zero timeout the worst what can happen is a
> yield. But on the other hand, while we were waiting for one request, the
> others could complete and one of them could be the master. This is why I
> continue the iteration - don't want to miss any arrived response.
Ok. Thanks for answer.
> <...>
>
>>> +
>>> +local function locate_masters(replicasets)
>>> +    local is_all_done = true
>>> +    local is_all_nop = true
>>> +    local last_err
>>> +    for _, replicaset in pairs(replicasets) do
>>> +        local is_done, is_nop, err = replicaset_locate_master(replicaset)
>> I think we should log result of master discovery. Especially if error occurred.
> It is logged by the caller. See master_search_f(), init.lua:1067.

Hmm... Yes, but I think such approach with last_err could lose some errors.

E.g. replicaset.lua:616 gets an error, saves it to `last_err` then if 
some error

happened on furure `wait_result` we override `conn.call` error and 
externally we have

a case when there was no any conn.call error.

>>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>>> index 5e2a96b..9407ccd 100644
>>> --- a/vshard/router/init.lua
>>> +++ b/vshard/router/init.lua
>>> @@ -1030,6 +1032,93 @@ local function failover_f(router)
> <...>
>
>>> +--
>>> +-- Master discovery background function. It is supposed to notice master changes
>>> +-- and find new masters in the replicasets, which are configured for that.
>>> +--
>>> +-- XXX: due to polling the search might notice master change not right when it
>> Is this TODO related tohttps://github.com/tarantool/vshard/issues/284  ?
> Yes, that is one of the "solutions", although I would rather call it a crutch.
> I prefer to fix it by doing fair subscriptions. And for that I at least need a
> way to get netbox call result asynchronously, without having to call wait_result()
> anywhere.

Does it request some changes in netbox/iproto? I mean 
https://github.com/tarantool/tarantool/issues/6107


> <...>
>
>>> +local function master_search_set(router)
>>> +    local enable = false
>>> +    for _, rs in pairs(router.replicasets) do
>>> +        if rs.is_auto_master then
>>> +            enable = true
>>> +            break
>>> +        end
>>> +    end
>>> +    local search_fiber = router.master_search_fiber
>>> +    if enable and search_fiber == nil then
>>> +        log.info('Master auto search is enabled')
>>> +        router.master_search_fiber = util.reloadable_fiber_create(
>>> +            'vshard.master_search.' .. router.name, M, 'master_search_f',
>>> +            router)
>> On 1.10 fiber name is limited with 32 symbols.
>>
>> tarantool> #'vshard.master_search.'
>> ---
>> - 21
>> ...
>>
>> That mean that router name is limited with 11 chars.
>>
>> It could be a problem. As I see you don't use {truncate = true} when assign a name
>>
>> https://github.com/tarantool/vshard/blob/a394e3f4008a7436f011c81028dcaadc270eebf6/vshard/util.lua#L93
> Holy shit, that is a good find. I fixed it in a new commit on top of the
> branch. For all the background fibers at once.
>
>> Also it's possible to wakeup discovery and rebalanser fibers. Probably we should give a way to
>>
>> wakeup master_search_fiber.
> I didn't want people to rely on that. Because when the subscriptions
> are implemented, there won't be a fiber. But I could make it NOP later
> probably.
>
> ====================
> diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
> index 2fca1e4..6dcdf45 100644
> --- a/test/router/master_discovery.result
> +++ b/test/router/master_discovery.result
> @@ -87,7 +87,7 @@ end
>   function check_all_masters_found()                                              \
>       for _, rs in pairs(vshard.router.static.replicasets) do                     \
>           if not rs.master then                                                   \
> -            vshard.router.static.master_search_fiber:wakeup()                   \
> +            vshard.router.master_search_wakeup()                                \
>               return false                                                        \
>           end                                                                     \
>       end                                                                         \
> @@ -101,7 +101,7 @@ function check_master_for_replicaset(rs_id, master_name)
>       local master_uuid = util.name_to_uuid[master_name]                          \
>       local master = vshard.router.static.replicasets[rs_uuid].master             \
>       if not master or master.uuid ~= master_uuid then                            \
> -        vshard.router.static.master_search_fiber:wakeup()                       \
> +        vshard.router.master_search_wakeup()                                    \
>           return false                                                            \
>       end                                                                         \
>       return true                                                                 \
> @@ -124,7 +124,7 @@ master_search_helper_f = nil
>    | ...
>   function aggressive_master_search_f()                                           \
>       while true do                                                               \
> -        vshard.router.static.master_search_fiber:wakeup()                       \
> +        vshard.router.master_search_wakeup()                                    \
>           fiber.sleep(0.001)                                                      \
>       end                                                                         \
>   end
> diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
> index e087f58..653725a 100644
> --- a/test/router/master_discovery.test.lua
> +++ b/test/router/master_discovery.test.lua
> @@ -49,7 +49,7 @@ end
>   function check_all_masters_found()                                              \
>       for _, rs in pairs(vshard.router.static.replicasets) do                     \
>           if not rs.master then                                                   \
> -            vshard.router.static.master_search_fiber:wakeup()                   \
> +            vshard.router.master_search_wakeup()                                \
>               return false                                                        \
>           end                                                                     \
>       end                                                                         \
> @@ -61,7 +61,7 @@ function check_master_for_replicaset(rs_id, master_name)
>       local master_uuid = util.name_to_uuid[master_name]                          \
>       local master = vshard.router.static.replicasets[rs_uuid].master             \
>       if not master or master.uuid ~= master_uuid then                            \
> -        vshard.router.static.master_search_fiber:wakeup()                       \
> +        vshard.router.master_search_wakeup()                                    \
>           return false                                                            \
>       end                                                                         \
>       return true                                                                 \
> @@ -78,7 +78,7 @@ end
>   master_search_helper_f = nil
>   function aggressive_master_search_f()                                           \
>       while true do                                                               \
> -        vshard.router.static.master_search_fiber:wakeup()                       \
> +        vshard.router.master_search_wakeup()                                    \
>           fiber.sleep(0.001)                                                      \
>       end                                                                         \
>   end
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 9407ccd..baaeee4 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -1119,6 +1119,13 @@ local function master_search_set(router)
>       end
>   end
>   
> +local function master_search_wakeup(router)
> +    local f = router.master_search_fiber
> +    if f then
> +        f:wakeup()
> +    end
> +end
> +
>   --------------------------------------------------------------------------------
>   -- Configuration
>   --------------------------------------------------------------------------------
> @@ -1545,6 +1552,7 @@ local router_mt = {
>           bootstrap = cluster_bootstrap;
>           bucket_discovery = bucket_discovery;
>           discovery_wakeup = discovery_wakeup;
> +        master_search_wakeup = master_search_wakeup,
>           discovery_set = discovery_set,
>           _route_map_clear = route_map_clear,
>           _bucket_reset = bucket_reset,
> ====================
>
> To the commit message I appended the following text:
>
>      Master discovery works in its own fiber on the router, which is
>      activated only if at least one replicaset is configured to look
>      for the master. It wakes up with a certain period. But it is
>      possible to wake it up on demand using
>      `vshard.router.master_search_wakeup()` function. It does not do
>      anything if master search is not configured for any replicaset.

Thanks for a fix.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-02 21:36     ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
  2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-05  9:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your answers.

I have following questions:

   - After changes in previous patch I think it could be better to use 
"master_search_wakeup" instead of master_search_fiber:wakeup()

   - Should we document somehow `update_master` method for replicaset 
object? Or maybe it even shouldn't be a part of public API?

On 03.07.2021 00:36, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
>>> diff --git a/vshard/error.lua b/vshard/error.lua
>>> index e2d1a31..fa7bdaa 100644
>>> --- a/vshard/error.lua
>>> +++ b/vshard/error.lua
>>> @@ -20,7 +20,7 @@ local error_message_template = {
>>>        [2] = {
>>>            name = 'NON_MASTER',
>>>            msg = 'Replica %s is not a master for replicaset %s anymore',
>>> -        args = {'replica_uuid', 'replicaset_uuid'}
>>> +        args = {'replica_uuid', 'replicaset_uuid', 'master_uuid'}
>> Error format string still contains only 2 arguments. Is it ok?
> Yes. All the 'args' are remembered as the error object attributes.
> I can use it to attach information which I don't want to add to
> the message or simply can't.
>
> Here I can't, because master_uuid might be nil. If it would be in
> the format, it would be displayed as 'nil' sometimes, which would be
> confusing.
>
>>> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>>> index 7747258..660c786 100644
>>> --- a/vshard/replicaset.lua
>>> +++ b/vshard/replicaset.lua
>>> @@ -570,6 +570,70 @@ local function rebind_replicasets(replicasets, old_replicasets)
>>>        end
>>>    end
>>>    +--
>>> +-- Let the replicaset know @a old_master_uuid is not a master anymore, should
>>> +-- use @a candidate_uuid instead.
>>> +-- Returns whether the request, which brought this information, can be retried.
>>> +--
>>> +local function replicaset_update_master(replicaset, old_master_uuid,
>>> +                                        candidate_uuid)
>>> +    local is_auto = replicaset.is_auto_master
>>> +    local replicaset_uuid = replicaset.uuid
>>> +    if old_master_uuid == candidate_uuid then
>>> +        -- It should not happen ever, but be ready to everything.
>>> +        log.warn('Replica %s in replicaset %s reports self as both master '..
>>> +                 'and not master', master_uuid, replicaset_uuid)
>>> +        return is_auto
>>> +    end
>>> +    local master = replicaset.master
>>> +    if not master then
>>> +        if not is_auto or not candidate_uuid then
>>> +            return is_auto
>>> +        end
>>> +        local candidate = replicaset.replicas[candidate_uuid]
>> AFAIU it means that candidate_uuid doesn't belong to replicaset.
>>
>> Why is it true?
> It can happen if router config is different from the storage's due
> to any reason. If the master is not in router's config, it won't be
> able to find it by UUID.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
@ 2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-06  8:54           ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-05 20:53 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

>>>>    diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>>>> index fa048c9..7747258 100644
>>>> --- a/vshard/replicaset.lua
>>>> +++ b/vshard/replicaset.lua
>>>> @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
>>>>        assert(opts == nil or type(opts) == 'table')
>>>>        assert(type(func) == 'string', 'function name')
>>>>        assert(args == nil or type(args) == 'table', 'function arguments')
>>>> -    local conn, err = replicaset_connect_master(replicaset)
>>>> -    if not conn then
>>>> -        return nil, err
>>>> -    end
>>>> -    if not opts then
>>>> -        opts = {timeout = replicaset.master.net_timeout}
>>>> -    elseif not opts.timeout then
>>>> -        opts = table.copy(opts)
>>>> -        opts.timeout = replicaset.master.net_timeout
>>>> +    local master = replicaset.master
>>>> +    if not master then
>>>> +        opts = opts and table.copy(opts) or {}
>>>> +        if opts.is_async then
>>>> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
>>>> +                                      replicaset.uuid)
>>>> +        end
>>> Could / should we here wakeup master discovery if "auto" is specified?
>> We can. But I am afraid that under considerable RW requests load we
>> might wakeup it too often. And if there are many routers, they might
>> create a storm of `vshard.storage._call('info')` requests putting a
>> notable load on the storages. So I decided it is not safe until the
>> subscriptions are implemented.
> 
> But we can't simply fail async requests when master is unknown.

There is not much of a choice for now. You will fail them even if the master
is here, but disconnected. Because netbox fails is_async immediately if the
connection is not in active nor in fetch_schema states.

If it must be solved, I would better do it in a generic way for both
situations, separately. But it is not trivial. Because you start getting a
concept of is_async + timeout so as the requests wouldn't hang there for too
long time overflowing the queue of them. This means you would need a heap of
requests sorted by their deadlines, and you would flush them in on_connect
trigger, which AFAIR not suitable for making calls somewhy which is a next
issue (but I might be wrong).

I wouldn't want to do that all in scope of master discovery ticket.

> And we even don't have exposed replicaset_wait_connected/replicaset_wait_master
> 
> to call it before master will be known. As result several seconds we will get failed requests.

I didn't expose them yet because I am very reluctant towards exposing
anything until there is an explicit request from a user. Don't want to
overload the API.

> Also I saw TODO in test that "RO requests should be able to go to replicas" and I expect questions
> 
> about RTO 0 on read requests from our customers.

What do you mean as "RTO 0 on read requests"?

> I don't know will this feature be used by cartridge but it could create small inconvenience for our customers
> 
> in test because they should inject timeouts until master will be discovered (it's too artifical example but there were
> 
> some similar questions when discovery started to work by chunks).
> >>>> +local function locate_masters(replicasets)
>>>> +    local is_all_done = true
>>>> +    local is_all_nop = true
>>>> +    local last_err
>>>> +    for _, replicaset in pairs(replicasets) do
>>>> +        local is_done, is_nop, err = replicaset_locate_master(replicaset)
>>> I think we should log result of master discovery. Especially if error occurred.
>> It is logged by the caller. See master_search_f(), init.lua:1067.
> 
> Hmm... Yes, but I think such approach with last_err could lose some errors.

Yes, that is also done on purpose. I don't want to clog the logs with too
many errors. For instance, imagine you have 100 replicasets with 3 nodes in
each, each is polled with 0.5s. And there is no network. Then it can produce
about 600 errors per second. I tried to make the clogging less severe by showing
only the last error. And the idea is that you fix these errors one by one until
the log has no errors and discovery works fine.

For example, firstly you fix the network if none of the futures was created. Then
you fix storage.call('info') being not existing due to the old version, and etc.
In the log you will see the next error to fix.

I also wanted not to log the same error too often. For example, check that the
message didn't change in the master search fiber, and log it only on change
or once per, say, 10 seconds. But decided to wait if it explodes anywhere before
I complicate the code even more.

Do you think it is not needed? I am not sure, and can switch to logging of
everything.

>>>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>>>> index 5e2a96b..9407ccd 100644
>>>> --- a/vshard/router/init.lua
>>>> +++ b/vshard/router/init.lua
>>>> @@ -1030,6 +1032,93 @@ local function failover_f(router)
>> <...>
>>
>>>> +--
>>>> +-- Master discovery background function. It is supposed to notice master changes
>>>> +-- and find new masters in the replicasets, which are configured for that.
>>>> +--
>>>> +-- XXX: due to polling the search might notice master change not right when it
>>> Is this TODO related tohttps://github.com/tarantool/vshard/issues/284  ?
>> Yes, that is one of the "solutions", although I would rather call it a crutch.
>> I prefer to fix it by doing fair subscriptions. And for that I at least need a
>> way to get netbox call result asynchronously, without having to call wait_result()
>> anywhere.
> 
> Does it request some changes in netbox/iproto? I mean https://github.com/tarantool/tarantool/issues/6107

6107 is not about the protocol changes. It is only about
the interface on the server and in netbox. For vshard it
would be enough to patch netbox. I want to be able to do
something like that:

	local subscribe_complete

	local function subscribe_push(conn, msg)
		log.info('Push: %s', msg)
		-- ...
	end

	local function subscribe_start(conn)
		conn:call('subscribe', {
			is_async = true,
			on_complete = subscribe_complete,
			on_complete_ctx = conn,
			on_push = subscribe_push,
			on_push_ctx = conn,
		})
	end

	local function subscribe_complete(conn, res)
		log.info('Subscription is done: %s, retry', res)
		-- ...
		subscribe_start(conn)
	end

	subscribe_start(conn)

'Subscribe' in vshard would be a function on the storage
which returns when the storage's state changes anyhow. I could
manage these subscriptions on the client in a very cheap way
without fibers. Then I could make multiple subscriptions or
one big sub for all storage's details. Wouldn't matter. And
wouldn't cost almost anything on the storage too if 6107
would be done.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
@ 2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
  2021-07-06  8:55           ` Oleg Babin via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-05 20:53 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

>   - After changes in previous patch I think it could be better to use "master_search_wakeup" instead of master_search_fiber:wakeup()

Yes, I forgot to update this commit. Done now:

====================
diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
index 8416c8f..7ebb67d 100644
--- a/test/router/master_discovery.result
+++ b/test/router/master_discovery.result
@@ -115,7 +115,7 @@ function check_no_master_for_replicaset(rs_id)
     if not master then                                                          \
         return true                                                             \
     end                                                                         \
-    vshard.router.static.master_search_fiber:wakeup()                           \
+    vshard.router.master_search_wakeup()                                        \
     return false                                                                \
 end
  | ---
@@ -170,7 +170,7 @@ function check_master_discovery_block()
     if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
         return true                                                             \
     end                                                                         \
-    vshard.router.static.master_search_fiber:wakeup()                           \
+    vshard.router.master_search_wakeup()                                        \
     return false                                                                \
 end
  | ---
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
index 9a9e60d..6276dc9 100644
--- a/test/router/master_discovery.test.lua
+++ b/test/router/master_discovery.test.lua
@@ -73,7 +73,7 @@ function check_no_master_for_replicaset(rs_id)
     if not master then                                                          \
         return true                                                             \
     end                                                                         \
-    vshard.router.static.master_search_fiber:wakeup()                           \
+    vshard.router.master_search_wakeup()                                        \
     return false                                                                \
 end
 
@@ -114,7 +114,7 @@ function check_master_discovery_block()
     if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
         return true                                                             \
     end                                                                         \
-    vshard.router.static.master_search_fiber:wakeup()                           \
+    vshard.router.master_search_wakeup()                                        \
     return false                                                                \
 end
 
====================

>   - Should we document somehow `update_master` method for replicaset object? Or maybe it even shouldn't be a part of public API?

Nope. It is a private method. If something is not documented on the
site, it is private. I couldn't come up with a better scheme and yet
not harm vshard's own code readability.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-06  8:54           ` Oleg Babin via Tarantool-patches
  2021-07-06 21:19             ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-06  8:54 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your answers. Patch LGTM. I also asked Yaroslav and I hope he 
will take a look.

On 05.07.2021 23:53, Vladislav Shpilevoy wrote:
>>>>>     diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
>>>>> index fa048c9..7747258 100644
>>>>> --- a/vshard/replicaset.lua
>>>>> +++ b/vshard/replicaset.lua
>>>>> @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
>>>>>         assert(opts == nil or type(opts) == 'table')
>>>>>         assert(type(func) == 'string', 'function name')
>>>>>         assert(args == nil or type(args) == 'table', 'function arguments')
>>>>> -    local conn, err = replicaset_connect_master(replicaset)
>>>>> -    if not conn then
>>>>> -        return nil, err
>>>>> -    end
>>>>> -    if not opts then
>>>>> -        opts = {timeout = replicaset.master.net_timeout}
>>>>> -    elseif not opts.timeout then
>>>>> -        opts = table.copy(opts)
>>>>> -        opts.timeout = replicaset.master.net_timeout
>>>>> +    local master = replicaset.master
>>>>> +    if not master then
>>>>> +        opts = opts and table.copy(opts) or {}
>>>>> +        if opts.is_async then
>>>>> +            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
>>>>> +                                      replicaset.uuid)
>>>>> +        end
>>>> Could / should we here wakeup master discovery if "auto" is specified?
>>> We can. But I am afraid that under considerable RW requests load we
>>> might wakeup it too often. And if there are many routers, they might
>>> create a storm of `vshard.storage._call('info')` requests putting a
>>> notable load on the storages. So I decided it is not safe until the
>>> subscriptions are implemented.
>> But we can't simply fail async requests when master is unknown.
> There is not much of a choice for now. You will fail them even if the master
> is here, but disconnected. Because netbox fails is_async immediately if the
> connection is not in active nor in fetch_schema states.
>
> If it must be solved, I would better do it in a generic way for both
> situations, separately. But it is not trivial. Because you start getting a
> concept of is_async + timeout so as the requests wouldn't hang there for too
> long time overflowing the queue of them. This means you would need a heap of
> requests sorted by their deadlines, and you would flush them in on_connect
> trigger, which AFAIR not suitable for making calls somewhy which is a next
> issue (but I might be wrong).
>
> I wouldn't want to do that all in scope of master discovery ticket.

Yes, I agree. I know about error in case when netbox is not connected.

Ok. Let's it work in the same way as netbox async call.


>
>> And we even don't have exposed replicaset_wait_connected/replicaset_wait_master
>>
>> to call it before master will be known. As result several seconds we will get failed requests.
> I didn't expose them yet because I am very reluctant towards exposing
> anything until there is an explicit request from a user. Don't want to
> overload the API.
>
>> Also I saw TODO in test that "RO requests should be able to go to replicas" and I expect questions
>>
>> about RTO 0 on read requests from our customers.
> What do you mean as "RTO 0 on read requests"?
>
We faced it when developed CRUD module. To perform requests we need to 
know space schema.

But all remote schema changes are visible only after request that 
triggers schema refetch.

And we had working cluster but the first portion of requests failed 
because of outdated schema.


Will we have something similar when we doesn't discover master yet but 
have replicas and our

read requests will fail? If it's so please file at least an issue to fix it.

Considering definition itself RTO means "recovery time objective".


>> I don't know will this feature be used by cartridge but it could create small inconvenience for our customers
>>
>> in test because they should inject timeouts until master will be discovered (it's too artifical example but there were
>>
>> some similar questions when discovery started to work by chunks).
>>>>>> +local function locate_masters(replicasets)
>>>>> +    local is_all_done = true
>>>>> +    local is_all_nop = true
>>>>> +    local last_err
>>>>> +    for _, replicaset in pairs(replicasets) do
>>>>> +        local is_done, is_nop, err = replicaset_locate_master(replicaset)
>>>> I think we should log result of master discovery. Especially if error occurred.
>>> It is logged by the caller. See master_search_f(), init.lua:1067.
>> Hmm... Yes, but I think such approach with last_err could lose some errors.
> Yes, that is also done on purpose. I don't want to clog the logs with too
> many errors. For instance, imagine you have 100 replicasets with 3 nodes in
> each, each is polled with 0.5s. And there is no network. Then it can produce
> about 600 errors per second. I tried to make the clogging less severe by showing
> only the last error. And the idea is that you fix these errors one by one until
> the log has no errors and discovery works fine.
>
> For example, firstly you fix the network if none of the futures was created. Then
> you fix storage.call('info') being not existing due to the old version, and etc.
> In the log you will see the next error to fix.
>
> I also wanted not to log the same error too often. For example, check that the
> message didn't change in the master search fiber, and log it only on change
> or once per, say, 10 seconds. But decided to wait if it explodes anywhere before
> I complicate the code even more.
>
> Do you think it is not needed? I am not sure, and can switch to logging of
> everything.

Yes, sounds reasonable. Feel free to ignore following thoughts I just 
try to say my idea.

t's not good to spam logs with error. But maybe we could aggregate

error messages and print after each step. I mean e.g. we have a step 
where we call

`pcall(conn.call, conn, func, args, async_opts)` we can save uris or 
uuids of bad hosts and

print them in single message.


>>>>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>>>>> index 5e2a96b..9407ccd 100644
>>>>> --- a/vshard/router/init.lua
>>>>> +++ b/vshard/router/init.lua
>>>>> @@ -1030,6 +1032,93 @@ local function failover_f(router)
>>> <...>
>>>
>>>>> +--
>>>>> +-- Master discovery background function. It is supposed to notice master changes
>>>>> +-- and find new masters in the replicasets, which are configured for that.
>>>>> +--
>>>>> +-- XXX: due to polling the search might notice master change not right when it
>>>> Is this TODO related tohttps://github.com/tarantool/vshard/issues/284  ?
>>> Yes, that is one of the "solutions", although I would rather call it a crutch.
>>> I prefer to fix it by doing fair subscriptions. And for that I at least need a
>>> way to get netbox call result asynchronously, without having to call wait_result()
>>> anywhere.
>> Does it request some changes in netbox/iproto? I mean https://github.com/tarantool/tarantool/issues/6107
> 6107 is not about the protocol changes. It is only about
> the interface on the server and in netbox. For vshard it
> would be enough to patch netbox. I want to be able to do
> something like that:
>
> 	local subscribe_complete
>
> 	local function subscribe_push(conn, msg)
> 		log.info('Push: %s', msg)
> 		-- ...
> 	end
>
> 	local function subscribe_start(conn)
> 		conn:call('subscribe', {
> 			is_async = true,
> 			on_complete = subscribe_complete,
> 			on_complete_ctx = conn,
> 			on_push = subscribe_push,
> 			on_push_ctx = conn,
> 		})
> 	end
>
> 	local function subscribe_complete(conn, res)
> 		log.info('Subscription is done: %s, retry', res)
> 		-- ...
> 		subscribe_start(conn)
> 	end
>
> 	subscribe_start(conn)
>
> 'Subscribe' in vshard would be a function on the storage
> which returns when the storage's state changes anyhow. I could
> manage these subscriptions on the client in a very cheap way
> without fibers. Then I could make multiple subscriptions or
> one big sub for all storage's details. Wouldn't matter. And
> wouldn't cost almost anything on the storage too if 6107
> would be done.

I did something similar for one project. "subscribe" reads from special 
channel and

performs box.session.push(). Ok, I hope it will be done later.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage
  2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-07-06  8:55           ` Oleg Babin via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-07-06  8:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your changes. LGTM.

On 05.07.2021 23:53, Vladislav Shpilevoy wrote:
>>    - After changes in previous patch I think it could be better to use "master_search_wakeup" instead of master_search_fiber:wakeup()
> Yes, I forgot to update this commit. Done now:
>
> ====================
> diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
> index 8416c8f..7ebb67d 100644
> --- a/test/router/master_discovery.result
> +++ b/test/router/master_discovery.result
> @@ -115,7 +115,7 @@ function check_no_master_for_replicaset(rs_id)
>       if not master then                                                          \
>           return true                                                             \
>       end                                                                         \
> -    vshard.router.static.master_search_fiber:wakeup()                           \
> +    vshard.router.master_search_wakeup()                                        \
>       return false                                                                \
>   end
>    | ---
> @@ -170,7 +170,7 @@ function check_master_discovery_block()
>       if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
>           return true                                                             \
>       end                                                                         \
> -    vshard.router.static.master_search_fiber:wakeup()                           \
> +    vshard.router.master_search_wakeup()                                        \
>       return false                                                                \
>   end
>    | ---
> diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
> index 9a9e60d..6276dc9 100644
> --- a/test/router/master_discovery.test.lua
> +++ b/test/router/master_discovery.test.lua
> @@ -73,7 +73,7 @@ function check_no_master_for_replicaset(rs_id)
>       if not master then                                                          \
>           return true                                                             \
>       end                                                                         \
> -    vshard.router.static.master_search_fiber:wakeup()                           \
> +    vshard.router.master_search_wakeup()                                        \
>       return false                                                                \
>   end
>   
> @@ -114,7 +114,7 @@ function check_master_discovery_block()
>       if vshard.router.internal.errinj.ERRINJ_MASTER_SEARCH_DELAY == 'in' then    \
>           return true                                                             \
>       end                                                                         \
> -    vshard.router.static.master_search_fiber:wakeup()                           \
> +    vshard.router.master_search_wakeup()                                        \
>       return false                                                                \
>   end
>   
> ====================
>
>>    - Should we document somehow `update_master` method for replicaset object? Or maybe it even shouldn't be a part of public API?
> Nope. It is a private method. If something is not documented on the
> site, it is private. I couldn't come up with a better scheme and yet
> not harm vshard's own code readability.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery
  2021-07-06  8:54           ` Oleg Babin via Tarantool-patches
@ 2021-07-06 21:19             ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 0 replies; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-07-06 21:19 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

>>> And we even don't have exposed replicaset_wait_connected/replicaset_wait_master
>>>
>>> to call it before master will be known. As result several seconds we will get failed requests.
>> I didn't expose them yet because I am very reluctant towards exposing
>> anything until there is an explicit request from a user. Don't want to
>> overload the API.
>>
>>> Also I saw TODO in test that "RO requests should be able to go to replicas" and I expect questions
>>>
>>> about RTO 0 on read requests from our customers.
>> What do you mean as "RTO 0 on read requests"?
>>
> We faced it when developed CRUD module. To perform requests we need to know space schema.
> 
> But all remote schema changes are visible only after request that triggers schema refetch.
> 
> And we had working cluster but the first portion of requests failed because of outdated schema.
> 
> 
> Will we have something similar when we doesn't discover master yet but have replicas and our
> 
> read requests will fail? If it's so please file at least an issue to fix it.

Yes, I understand now. Right, you might have failures, I am afraid. Depending on
your timeout. If there is no master, your request will try to wait for a master's
appearance during the given timeout. It won't fail immediately.

But sometimes it does not even need to wait - it can simply go to a working replica
right away if there is one. I mentioned it in XXX comments in the patch. I created
a ticket: https://github.com/tarantool/vshard/issues/288.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 0/6] Master discovery
  2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
                   ` (6 preceding siblings ...)
  2021-07-02 21:36 ` [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name Vladislav Shpilevoy via Tarantool-patches
@ 2021-08-03 21:55 ` Vladislav Shpilevoy via Tarantool-patches
  7 siblings, 0 replies; 27+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-08-03 21:55 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Pushed to master.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-08-03 21:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-01 22:09 [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 1/6] replicaset: introduce netbox_wait_connected() Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 2/6] test: sort some table prints Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 3/6] storage: introduce vshard.storage._call('info') Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:46   ` Oleg Babin via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 4/6] config: introduce master 'auto' replicaset option Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:47   ` Oleg Babin via Tarantool-patches
2021-07-02 21:32     ` Vladislav Shpilevoy via Tarantool-patches
2021-07-05  9:23       ` Oleg Babin via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:48   ` Oleg Babin via Tarantool-patches
2021-07-02 21:35     ` Vladislav Shpilevoy via Tarantool-patches
2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
2021-07-06  8:54           ` Oleg Babin via Tarantool-patches
2021-07-06 21:19             ` Vladislav Shpilevoy via Tarantool-patches
2021-07-01 22:09 ` [Tarantool-patches] [PATCH vshard 6/6] router: update master using a hint from storage Vladislav Shpilevoy via Tarantool-patches
2021-07-02 11:49   ` Oleg Babin via Tarantool-patches
2021-07-02 21:36     ` Vladislav Shpilevoy via Tarantool-patches
2021-07-05  9:24       ` Oleg Babin via Tarantool-patches
2021-07-05 20:53         ` Vladislav Shpilevoy via Tarantool-patches
2021-07-06  8:55           ` Oleg Babin via Tarantool-patches
2021-07-02 21:36 ` [Tarantool-patches] [PATCH vshard 7/6] util: truncate too long fiber name Vladislav Shpilevoy via Tarantool-patches
2021-07-05  9:23   ` Oleg Babin via Tarantool-patches
2021-08-03 21:55 ` [Tarantool-patches] [PATCH vshard 0/6] Master discovery Vladislav Shpilevoy via Tarantool-patches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox