From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id 2D9B86EC40; Fri, 2 Jul 2021 14:48:23 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 2D9B86EC40 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1625226503; bh=vYa+ypBJvm8LptjP48XkRp7VXcuzvH5rCaFeD+b7mls=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=YbKAAmgV123PVSbmrHgAVfzLkF0kj24D9v6HSPAfBggGLQZRDwKai6eWdIvNXZAoJ yOka8MYPsnX91R+D7sJIpuKsmxxuCr/PnqjzgU9dm3oPeeKuBn29WZvvsHUMVtpdlx hAAnfFc00MGjF0YIS6WOERjbsDubftE68xX1XgmQ= Received: from smtp52.i.mail.ru (smtp52.i.mail.ru [94.100.177.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id A288A6EC40 for ; Fri, 2 Jul 2021 14:48:21 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org A288A6EC40 Received: by smtp52.i.mail.ru with esmtpa (envelope-from ) id 1lzHee-0006kv-FY; Fri, 02 Jul 2021 14:48:21 +0300 To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, yaroslav.dynnikov@tarantool.org References: <6d8c2a728366edf5b0d208aeed9e027f870aa699.1625177222.git.v.shpilevoy@tarantool.org> Message-ID: Date: Fri, 2 Jul 2021 14:48:20 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <6d8c2a728366edf5b0d208aeed9e027f870aa699.1625177222.git.v.shpilevoy@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB X-4EC0790: 10 X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD954DFF1DC42D673FB703477AD6D36A6E3C7EF2853D6A1C7C1182A05F5380850406A0CDEFAAD67FD179CEBD0DF914787B677E5E039CED2C51E85B23B497A4CF4DC X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7444CB0504BAF4550EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F7900637995C329E4A8C93938638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D8969AAE4262F173E2D4EA0B7C1EB0D3E4117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCF1175FABE1C0F9B6A471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F446042972877693876707352033AC447995A7AD18C26CFBAC0749D213D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EE7B96B19DC4093321A9539A8B24243104D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE3A6C7FFFE744CA7FB03F1AB874ED89028C4224003CC836476EA7A3FFF5B025636E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637B8F435DEDE9E76EBEFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A24209795067102C07E8F7B195E1C97831F92DC2230794572D5E2DD01FEC9EDF0A X-C1DE0DAB: 0D63561A33F958A559D49673CA676B41E22E7CF712EAD1194C6B44CC55A9807ED59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA753753CEE10E4ED4A7410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D34829444FF2D8CB89B1380A7FAA937712C8F043FBAE26C32490CDC5EE7BC62D82D68B4472294769F511D7E09C32AA3244CD6733C27E10FEEE0604AD1857AAA77649CA7333006C390A0FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2bioj/yEXjM+E11CgtsvkVxwWqQ== X-Mailru-Sender: 583F1D7ACE8F49BD07526C4546A62CBF89A3E83B36E098E038EB2613D7508009B3FAD1186888A28C23E75C7104EB1B885DEE61814008E47C7013064206BFB89F93956FB04BA385BE9437F6177E88F7363CDA0F3B3F5B9367 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic master discovery X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Oleg Babin via Tarantool-patches Reply-To: Oleg Babin Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Thanks for your patch. See 8 comments below. On 02.07.2021 01:09, Vladislav Shpilevoy wrote: > Part of #75 > > @TarantoolBot document > Title: VShard router master discovery > > Router used not to be able to find master nodes in the configured > replicasets on its own. It relied only on how they were specified > in the config. > > This becomes a problem when master changes and the change is not > delivered to the router's config somewhy. For instance, the router > does not rely on a central config provider. Or it does rely, but > the provider can't deliver a new config due to any reason. > > This is getting especially tricky with built-in automatic master > elections which are not supported by vshard yet, but they will be, > they won't depend on any config. Master role won't be pinned to > one node then. Is it possible to implement some simple test that shows if RAFT changes a leader, vshard catches this changes? Of course, it will be test for new Tarantool versions. > Now there is a new feature to overcome the master search problem - > configurable automatic master discovery on the router. > > Router goes to the replicasets, marked as having an auto master, > finds a master in them, and periodically checks if the master is > still a master. > > When a master in a replicaset stops being a master, the router > walks all nodes of the replicaset and finds who is the new master. > > To turn the feature on there is a new option in router's config: > `master = 'auto'`. It should be specified per-replicaset, and is > not compatible with specifying a master manually. > > This is how a good config looks: > ``` > config = { > sharding = { > = { > master = 'auto', > replicas = {...}, > }, > ... > }, > ... > } > ``` > > This is how a bad config looks: > ``` > config = { > sharding = { > = { > master = 'auto', > replicas = { > = { > master = true, > ... > }, > = { > master = false, > ... > }, > }, > }, > ... > }, > ... > } > ``` > It will not work, because either `master = 'auto'` can be > specified, or the master is assigned manually. Not both at the > same time. > --- > test/router/master_discovery.result | 514 ++++++++++++++++++++++++++ > test/router/master_discovery.test.lua | 248 +++++++++++++ > vshard/consts.lua | 5 + > vshard/error.lua | 5 + > vshard/replicaset.lua | 203 +++++++++- > vshard/router/init.lua | 98 ++++- > 6 files changed, 1051 insertions(+), 22 deletions(-) > create mode 100644 test/router/master_discovery.result > create mode 100644 test/router/master_discovery.test.lua > > > > diff --git a/vshard/consts.lua b/vshard/consts.lua > index 47a893b..66a09ae 100644 > --- a/vshard/consts.lua > +++ b/vshard/consts.lua > @@ -52,6 +52,11 @@ return { > DISCOVERY_WORK_STEP = 0.01, > DISCOVERY_TIMEOUT = 10, > > + MASTER_SEARCH_IDLE_INTERVAL = 5, > + MASTER_SEARCH_WORK_INTERVAL = 0.5, > + MASTER_SEARCH_BACKOFF_INTERVAL = 5, > + MASTER_SEARCH_TIEMOUT = 5, > + > TIMEOUT_INFINITY = 500 * 365 * 86400, > DEADLINE_INFINITY = math.huge, > } > diff --git a/vshard/error.lua b/vshard/error.lua > index bcbcd71..e2d1a31 100644 > --- a/vshard/error.lua > +++ b/vshard/error.lua > @@ -153,6 +153,11 @@ local error_message_template = { > name = 'BUCKET_RECV_DATA_ERROR', > msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s', > args = {'bucket_id', 'space', 'tuple', 'reason'}, > + }, > + [31] = { > + name = 'MULTIPLE_MASTERS_FOUND', > + msg = 'Found more than one master in replicaset %s on nodes %s and %s', > + args = {'replicaset_uuid', 'master1', 'master2'}, > } > } Is it possible to test this error? > > diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua > index fa048c9..7747258 100644 > --- a/vshard/replicaset.lua > +++ b/vshard/replicaset.lua > @@ -25,6 +25,10 @@ > -- } > -- }, > -- master = , > +-- master_cond = +-- changes its master>, > +-- is_auto_master = +-- its own>, > -- replica = , > -- balance_i = -- use for a load-balanced request>, > @@ -55,6 +59,8 @@ local luuid = require('uuid') > local ffi = require('ffi') > local util = require('vshard.util') > local fiber_clock = fiber.clock > +local fiber_yield = fiber.yield > +local fiber_cond_wait = util.fiber_cond_wait > local gsc = util.generate_self_checker > > -- > @@ -159,6 +165,26 @@ local function replicaset_connect_to_replica(replicaset, replica) > return conn > end > > +local function replicaset_wait_master(replicaset, timeout) > + local master = replicaset.master > + -- Fast path - master is almost always known. > + if master then > + return master, timeout > + end > + -- Slow path. > + local deadline = fiber_clock() + timeout > + repeat > + if not replicaset.is_auto_master or > + not fiber_cond_wait(replicaset.master_cond, timeout) then > + return nil, lerror.vshard(lerror.code.MISSING_MASTER, > + replicaset.uuid) > + end > + timeout = deadline - fiber_clock() > + master = replicaset.master > + until master > + return master, timeout > +end > + > -- > -- Create net.box connection to master. > -- > @@ -175,7 +201,13 @@ end > -- Wait until the master instance is connected. > -- > local function replicaset_wait_connected(replicaset, timeout) > - return netbox_wait_connected(replicaset_connect_master(replicaset), timeout) > + local master > + master, timeout = replicaset_wait_master(replicaset, timeout) > + if not master then > + return nil, timeout > + end > + local conn = replicaset_connect_to_replica(replicaset, master) > + return netbox_wait_connected(conn, timeout) > end > > -- > @@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts) > assert(opts == nil or type(opts) == 'table') > assert(type(func) == 'string', 'function name') > assert(args == nil or type(args) == 'table', 'function arguments') > - local conn, err = replicaset_connect_master(replicaset) > - if not conn then > - return nil, err > - end > - if not opts then > - opts = {timeout = replicaset.master.net_timeout} > - elseif not opts.timeout then > - opts = table.copy(opts) > - opts.timeout = replicaset.master.net_timeout > + local master = replicaset.master > + if not master then > + opts = opts and table.copy(opts) or {} > + if opts.is_async then > + return nil, lerror.vshard(lerror.code.MISSING_MASTER, > + replicaset.uuid) > + end Could / should we here wakeup master discovery if "auto" is specified? > + local timeout = opts.timeout or consts.MASTER_SEARCH_TIEMOUT > + master, timeout = replicaset_wait_master(replicaset, timeout) > + if not master then > + return nil, timeout > + end > + opts.timeout = timeout > + else > + if not opts then > + opts = {timeout = master.net_timeout} > + elseif not opts.timeout then > + opts = table.copy(opts) > + opts.timeout = master.net_timeout > + end > end > + replicaset_connect_to_replica(replicaset, master) > local net_status, storage_status, retval, error_object = > - replica_call(replicaset.master, func, args, opts) > + replica_call(master, func, args, opts) > -- Ignore net_status - master does not retry requests. > return storage_status, retval, error_object > end > @@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance) > return r > end > end > - local conn, err = replicaset_connect_master(replicaset) > - if not conn then > - return nil, err > - end > - return master > end Why don't we need this part anymore? > return function(replicaset, func, args, opts) > @@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance) > end > local end_time = fiber_clock() + timeout > while not net_status and timeout > 0 do > - replica, err = pick_next_replica(replicaset) > + replica = pick_next_replica(replicaset) > if not replica then > - return nil, err > + replica, timeout = replicaset_wait_master(replicaset, timeout) > + if not replica then > + return nil, timeout > + end > + replicaset_connect_to_replica(replicaset, replica) > end > opts.timeout = timeout > net_status, storage_status, retval, err = > @@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets) > end > end > end > + if old_replicaset then > + -- Take a hint from the old replicaset who is the master now. > + if replicaset.is_auto_master then > + local master = old_replicaset.master > + if master then > + replicaset.master = replicaset.replicas[master.uuid] > + end > + end > + -- Stop waiting for master in the old replicaset. Its running > + -- requests won't find it anyway. Auto search works only for the > + -- most actual replicaset objects. > + if old_replicaset.is_auto_master then > + old_replicaset.is_auto_master = false > + old_replicaset.master_cond:broadcast() > + end > + end > + end > +end > + > +-- > +-- Check if the master is still master, and find a new master if there is no a > +-- known one. > +-- > +local function replicaset_locate_master(replicaset) > + local is_done = true > + local is_nop = true > + if not replicaset.is_auto_master then > + return is_done, is_nop > + end > + local func = 'vshard.storage._call' > + local args = {'info'} > + local const_timeout = consts.MASTER_SEARCH_TIEMOUT > + local sync_opts = {timeout = const_timeout} > + local ok, res, err, f > + local master = replicaset.master > + if master then > + ok, res, err = replica_call(master, func, args, sync_opts) > + if not ok then > + return is_done, is_nop, err > + end > + if res.is_master then > + return is_done, is_nop > + end > + log.info('Master of replicaset %s, node %s, has resigned. Trying '.. > + 'to find a new one', replicaset.uuid, master.uuid) > + replicaset.master = nil > + end > + is_nop = false > + > + local last_err > + local futures = {} > + local timeout = const_timeout > + local deadline = fiber_clock() + timeout > + local async_opts = {is_async = true} > + local replicaset_uuid = replicaset.uuid > + for replica_uuid, replica in pairs(replicaset.replicas) do > + local conn = replica.conn > + timeout, err = netbox_wait_connected(conn, timeout) > + if not timeout then > + last_err = err > + timeout = deadline - fiber_clock() > + else > + ok, f = pcall(conn.call, conn, func, args, async_opts) > + if not ok then > + last_err = lerror.make(f) > + else > + futures[replica_uuid] = f > + end > + end > + end > + local master_uuid > + for replica_uuid, f in pairs(futures) do > + if timeout < 0 then > + -- Netbox uses cond var inside, whose wait throws an error when gets > + -- a negative timeout. Avoid that. > + timeout = 0 > + end > + res, err = f:wait_result(timeout) > + timeout = deadline - fiber_clock() > + if not res then > + f:discard() > + last_err = err > + goto next_wait If timeout will be negative we anyway go to next_wait and turn it to 0 at the next iteration. > + end > + res = res[1] > + if not res.is_master then > + goto next_wait > + end > + if not master_uuid then > + master_uuid = replica_uuid > + goto next_wait > + end > + is_done = false > + last_err = lerror.vshard(lerror.code.MULTIPLE_MASTERS_FOUND, > + replicaset_uuid, master_uuid, replica_uuid) > + do return is_done, is_nop, last_err end > + ::next_wait:: > + end > + master = replicaset.replicas[master_uuid] > + if master then > + log.info('Found master for replicaset %s: %s', replicaset_uuid, > + master_uuid) > + replicaset.master = master > + replicaset.master_cond:broadcast() > + else > + is_done = false > + end > + return is_done, is_nop, last_err > +end > + > +local function locate_masters(replicasets) > + local is_all_done = true > + local is_all_nop = true > + local last_err > + for _, replicaset in pairs(replicasets) do > + local is_done, is_nop, err = replicaset_locate_master(replicaset) I think we should log result of master discovery. Especially if error occurred. > + is_all_done = is_all_done and is_done > + is_all_nop = is_all_nop and is_nop > + last_err = err or last_err > + fiber_yield() > end > + return is_all_done, is_all_nop, last_err > end > > -- > @@ -729,6 +893,8 @@ local function buildall(sharding_cfg) > bucket_count = 0, > lock = replicaset.lock, > balance_i = 1, > + is_auto_master = replicaset.master == 'auto', > + master_cond = fiber.cond(), > }, replicaset_mt) > local priority_list = {} > for replica_uuid, replica in pairs(replicaset.replicas) do > @@ -802,4 +968,5 @@ return { > wait_masters_connect = wait_masters_connect, > rebind_replicasets = rebind_replicasets, > outdate_replicasets = outdate_replicasets, > + locate_masters = locate_masters, > } > diff --git a/vshard/router/init.lua b/vshard/router/init.lua > index 5e2a96b..9407ccd 100644 > --- a/vshard/router/init.lua > +++ b/vshard/router/init.lua > @@ -68,6 +68,8 @@ local ROUTER_TEMPLATE = { > replicasets = nil, > -- Fiber to maintain replica connections. > failover_fiber = nil, > + -- Fiber to watch for master changes and find new masters. > + master_search_fiber = nil, > -- Fiber to discovery buckets in background. > discovery_fiber = nil, > -- How discovery works. On - work infinitely. Off - no > @@ -1030,6 +1032,93 @@ local function failover_f(router) > end > end > > +-------------------------------------------------------------------------------- > +-- Master search > +-------------------------------------------------------------------------------- > + > +local function master_search_step(router) > + local ok, is_done, is_nop, err = pcall(lreplicaset.locate_masters, > + router.replicasets) > + if not ok then > + err = is_done > + is_done = false > + is_nop = false > + end > + return is_done, is_nop, err > +end > + > +-- > +-- Master discovery background function. It is supposed to notice master changes > +-- and find new masters in the replicasets, which are configured for that. > +-- > +-- XXX: due to polling the search might notice master change not right when it Is this TODO related to https://github.com/tarantool/vshard/issues/284 ? > +-- happens. In future it makes sense to rewrite master search using > +-- subscriptions. The problem is that at the moment of writing the subscriptions > +-- are not working well in all Tarantool versions. > +-- > +local function master_search_f(router) > + local module_version = M.module_version > + local is_in_progress = false > + while module_version == M.module_version do > + local timeout > + local start_time = fiber_clock() > + local is_done, is_nop, err = master_search_step(router) > + if err then > + log.error('Error during master search: %s', lerror.make(err)) > + end > + if is_done then > + timeout = consts.MASTER_SEARCH_IDLE_INTERVAL > + elseif err then > + timeout = consts.MASTER_SEARCH_BACKOFF_INTERVAL > + else > + timeout = consts.MASTER_SEARCH_WORK_INTERVAL > + end > + if not is_in_progress then > + if not is_nop and is_done then > + log.info('Master search happened') > + elseif not is_done then > + log.info('Master search is started') > + is_in_progress = true > + end > + elseif is_done then > + log.info('Master search is finished') > + is_in_progress = false > + end > + local end_time = fiber_clock() > + local duration = end_time - start_time > + if not is_nop then > + log.verbose('Master search step took %s seconds. Next in %s '.. > + 'seconds', duration, timeout) > + end > + lfiber.sleep(timeout) > + end > +end > + > +local function master_search_set(router) > + local enable = false > + for _, rs in pairs(router.replicasets) do > + if rs.is_auto_master then > + enable = true > + break > + end > + end > + local search_fiber = router.master_search_fiber > + if enable and search_fiber == nil then > + log.info('Master auto search is enabled') > + router.master_search_fiber = util.reloadable_fiber_create( > + 'vshard.master_search.' .. router.name, M, 'master_search_f', > + router) On 1.10 fiber name is limited with 32 symbols. tarantool> #'vshard.master_search.' --- - 21 ... That mean that router name is limited with 11 chars. It could be a problem. As I see you don't use {truncate = true} when assign a name https://github.com/tarantool/vshard/blob/a394e3f4008a7436f011c81028dcaadc270eebf6/vshard/util.lua#L93 Also it's possible to wakeup discovery and rebalanser fibers. Probably we should give a way to wakeup master_search_fiber. > + elseif not enable and search_fiber ~= nil then > + -- Do not make users pay for what they do not use - when the search is > + -- disabled for all replicasets, there should not be any fiber. > + log.info('Master auto search is disabled') > + if search_fiber:status() ~= 'dead' then > + search_fiber:cancel() > + end > + router.master_search_fiber = nil > + end > +end > + > -------------------------------------------------------------------------------- > -- Configuration > -------------------------------------------------------------------------------- > @@ -1100,6 +1189,7 @@ local function router_cfg(router, cfg, is_reload) > 'vshard.failover.' .. router.name, M, 'failover_f', router) > end > discovery_set(router, vshard_cfg.discovery_mode) > + master_search_set(router) > end > > -------------------------------------------------------------------------------- > @@ -1535,6 +1625,10 @@ end > -------------------------------------------------------------------------------- > -- Module definition > -------------------------------------------------------------------------------- > +M.discovery_f = discovery_f > +M.failover_f = failover_f > +M.master_search_f = master_search_f > +M.router_mt = router_mt > -- > -- About functions, saved in M, and reloading see comment in > -- storage/init.lua. > @@ -1556,10 +1650,6 @@ else > M.module_version = M.module_version + 1 > end > > -M.discovery_f = discovery_f > -M.failover_f = failover_f > -M.router_mt = router_mt > - > module.cfg = legacy_cfg > module.new = router_new > module.internal = M