From mboxrd@z Thu Jan  1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org,
	yaroslav.dynnikov@tarantool.org
Date: Fri, 2 Jul 2021 00:09:35 +0200
Message-Id: <6d8c2a728366edf5b0d208aeed9e027f870aa699.1625177222.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH vshard 5/6] router: introduce automatic
 master discovery
From: Vladislav Shpilevoy via Tarantool-patches
	<tarantool-patches@dev.tarantool.org>
Reply-To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>

Part of #75

@TarantoolBot document
Title: VShard router master discovery

Previously the router could not find master nodes in the configured
replicasets on its own. It relied only on how they were specified in the
config. This becomes a problem when the master changes and the change is
not delivered to the router's config for some reason. For instance, the
router does not use a centralized config provider, or it does, but the
provider fails to deliver a new config. This gets especially tricky with
built-in automatic master elections, which are not supported by vshard
yet but are planned: they won't depend on any config, so the master role
won't be pinned to one node anymore.

Now there is a new feature to overcome the master search problem:
configurable automatic master discovery on the router. The router goes
to the replicasets marked as having an auto master, finds the master in
each of them, and periodically checks whether the master is still a
master. When the master in a replicaset stops being a master, the router
walks all nodes of the replicaset and finds the new one.

To turn the feature on there is a new option in the router's config:
`master = 'auto'`. It is specified per replicaset and is not compatible
with specifying a master manually.

This is how a good config looks:
```
config = {
    sharding = {
        <replicaset uuid> = {
            master = 'auto',
            replicas = {...},
        },
        ...
    },
    ...
}
```

This is how a bad config looks:
```
config = {
    sharding = {
        <replicaset uuid> = {
            master = 'auto',
            replicas = {
                <replica uuid 1> = {
                    master = true,
                    ...
                },
                <replica uuid 2> = {
                    master = false,
                    ...
                },
            },
        },
        ...
    },
    ...
}
```

It will not work: either `master = 'auto'` is specified, or the master
is assigned manually. Not both at the same time.
---
 test/router/master_discovery.result   | 514 ++++++++++++++++++++++++++
 test/router/master_discovery.test.lua | 248 +++++++++++++
 vshard/consts.lua                     |   5 +
 vshard/error.lua                      |   5 +
 vshard/replicaset.lua                 | 203 +++++++++-
 vshard/router/init.lua                |  98 ++++-
 6 files changed, 1051 insertions(+), 22 deletions(-)
 create mode 100644 test/router/master_discovery.result
 create mode 100644 test/router/master_discovery.test.lua

diff --git a/test/router/master_discovery.result b/test/router/master_discovery.result
new file mode 100644
index 0000000..2fca1e4
--- /dev/null
+++ b/test/router/master_discovery.result
@@ -0,0 +1,514 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+ | ---
+ | ...
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_1, 'router')
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_2, 'router')
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+ | ---
+ | ...
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+ | ---
+ | ...
+util.push_rs_filters(test_run)
+ | ---
+ | ...
+_ = test_run:cmd("create server router_1 with script='router/router_1.lua'")
+ | ---
+ | ...
+_ = test_run:cmd("start server router_1")
+ | ---
+ | ...
+
+--
+-- gh-75: automatic master discovery on router.
+--
+
+_ = test_run:switch("router_1")
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+vshard.router.bootstrap()
+ | ---
+ | - true
+ | ...
+
+for _, rs in pairs(cfg.sharding) do \
+    for _, r in pairs(rs.replicas) do \
+        r.master = nil \
+    end \
+end \
+
+function enable_auto_masters() \
+    for _, rs in pairs(cfg.sharding) do \
+        rs.master = 'auto' \
+    end \
+    vshard.router.cfg(cfg) \
+end
+ | ---
+ | ...
+
+function disable_auto_masters() \
+    for _, rs in pairs(cfg.sharding) do \
+        rs.master = nil \
+    end \
+    vshard.router.cfg(cfg) \
+end
+ | ---
+ | ...
+
+-- But do not forget the buckets. Otherwise bucket discovery will establish
+-- the connections instead of external requests.
+function forget_masters() \
+    disable_auto_masters() \
+    enable_auto_masters() \
+end
+ | ---
+ | ...
+
+function check_all_masters_found() \
+    for _, rs in pairs(vshard.router.static.replicasets) do \
+        if not rs.master then \
+            vshard.router.static.master_search_fiber:wakeup() \
+            return false \
+        end \
+    end \
+    return true \
+end
+ | ---
+ | ...
+
+function check_master_for_replicaset(rs_id, master_name) \
+    local rs_uuid = util.replicasets[rs_id] \
+    local master_uuid = util.name_to_uuid[master_name] \
+    local master = vshard.router.static.replicasets[rs_uuid].master \
+    if not master or master.uuid ~= master_uuid then \
+        vshard.router.static.master_search_fiber:wakeup() \
+        return false \
+    end \
+    return true \
+end
+ | ---
+ | ...
+
+function check_all_buckets_found() \
+    if vshard.router.info().bucket.unknown == 0 then \
+        return true \
+    end \
+    vshard.router.discovery_wakeup() \
+    return false \
+end
+ | ---
+ | ...
+
+master_search_helper_f = nil
+ | ---
+ | ...
+function aggressive_master_search_f() \
+    while true do \
+        vshard.router.static.master_search_fiber:wakeup() \
+        fiber.sleep(0.001) \
+    end \
+end
+ | ---
+ | ...
+
+function start_aggressive_master_search() \
+    assert(master_search_helper_f == nil) \
+    master_search_helper_f = fiber.new(aggressive_master_search_f) \
+    master_search_helper_f:set_joinable(true) \
+end
+ | ---
+ | ...
+
+function stop_aggressive_master_search() \
+    assert(master_search_helper_f ~= nil) \
+    master_search_helper_f:cancel() \
+    master_search_helper_f:join() \
+    master_search_helper_f = nil \
+end
+ | ---
+ | ...
+
+--
+-- Simulate the first cfg when no masters are known.
+--
+forget_masters()
+ | ---
+ | ...
+assert(vshard.router.static.master_search_fiber ~= nil)
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(check_all_masters_found)
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(check_all_buckets_found)
+ | ---
+ | - true
+ | ...
+
+--
+-- Change master and see how router finds it again.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+big_timeout = 1000000
+ | ---
+ | ...
+opts_big_timeout = {timeout = big_timeout}
+ | ---
+ | ...
+test_run:wait_cond(function() \
+    return check_master_for_replicaset(1, 'storage_1_b') \
+end)
+ | ---
+ | - true
+ | ...
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 1)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- Revert the master back.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() \
+    return check_master_for_replicaset(1, 'storage_1_a') \
+end)
+ | ---
+ | - true
+ | ...
+
+--
+-- Call tries to wait for the master if it has enough time left.
+--
+start_aggressive_master_search()
+ | ---
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+-- XXX: this should not depend on master so much. RO requests should be able to
+-- go to replicas.
+vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | - null
+ | - null
+ | ...
+
+forget_masters()
+ | ---
+ | ...
+-- XXX: the same as above - should not really wait for master. Regardless of it
+-- being auto or not.
+vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
+ | ---
+ | - 1
+ | - null
+ | - null
+ | ...
+
+stop_aggressive_master_search()
+ | ---
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+assert(echo_count == 4)
+ | ---
+ | - true
+ | ...
+echo_count = 0
+ | ---
+ | ...
+
+--
+-- Old replicaset objects stop waiting for master when search is disabled.
+--
+
+-- Turn off masters on the first replicaset.
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = false
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+-- Try to make RW and RO requests but then turn off the auto search.
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+f1 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \
+end)
+ | ---
+ | ...
+-- XXX: should not really wait for master since this is an RO request. It could
+-- use a replica.
+f2 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
+end)
+ | ---
+ | ...
+fiber.sleep(0.01)
+ | ---
+ | ...
+disable_auto_masters()
+ | ---
+ | ...
+f1:join()
+ | ---
+ | - true
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+f2:join()
+ | ---
+ | - true
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+
+--
+-- Multiple masters logging.
+--
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_a].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+replicas = cfg.sharding[util.replicasets[1]].replicas
+ | ---
+ | ...
+replicas[util.name_to_uuid.storage_1_b].master = true
+ | ---
+ | ...
+vshard.storage.cfg(cfg, instance_uuid)
+ | ---
+ | ...
+
+test_run:switch('router_1')
+ | ---
+ | - true
+ | ...
+forget_masters()
+ | ---
+ | ...
+start_aggressive_master_search()
+ | ---
+ | ...
+test_run:wait_log('router_1', 'Found more than one master', nil, 10)
+ | ---
+ | - Found more than one master
+ | ...
+stop_aggressive_master_search()
+ | ---
+ | ...
+
+--
+-- Async request won't wait for master. Otherwise it would need to wait, which
+-- is not async behaviour. The timeout should be ignored.
+--
+do \
+    forget_masters() \
+    return vshard.router.callrw(1501, 'echo', {1}, { \
+        is_async = true, timeout = big_timeout \
+    }) \
+end
+ | ---
+ | - null
+ | - code: 6
+ |   type: ShardingError
+ |   name: MISSING_MASTER
+ |   replicaset_uuid: <replicaset_1>
+ |   message: Master is not configured for replicaset <replicaset_1>
+ | ...
+
+_ = test_run:switch("default")
+ | ---
+ | ...
+_ = test_run:cmd("stop server router_1")
+ | ---
+ | ...
+_ = test_run:cmd("cleanup server router_1")
+ | ---
+ | ...
+test_run:drop_cluster(REPLICASET_1)
+ | ---
+ | ...
+test_run:drop_cluster(REPLICASET_2)
+ | ---
+ | ...
+_ = test_run:cmd('clear filter')
+ | ---
+ | ...
diff --git a/test/router/master_discovery.test.lua b/test/router/master_discovery.test.lua
new file mode 100644
index 0000000..e087f58
--- /dev/null
+++ b/test/router/master_discovery.test.lua
@@ -0,0 +1,248 @@
+test_run = require('test_run').new()
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+test_run:create_cluster(REPLICASET_1, 'router')
+test_run:create_cluster(REPLICASET_2, 'router')
+util = require('util')
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+util.push_rs_filters(test_run)
+_ = test_run:cmd("create server router_1 with script='router/router_1.lua'")
+_ = test_run:cmd("start server router_1")
+
+--
+-- gh-75: automatic master discovery on router.
+--
+
+_ = test_run:switch("router_1")
+util = require('util')
+vshard.router.bootstrap()
+
+for _, rs in pairs(cfg.sharding) do \
+    for _, r in pairs(rs.replicas) do \
+        r.master = nil \
+    end \
+end \
+
+function enable_auto_masters() \
+    for _, rs in pairs(cfg.sharding) do \
+        rs.master = 'auto' \
+    end \
+    vshard.router.cfg(cfg) \
+end
+
+function disable_auto_masters() \
+    for _, rs in pairs(cfg.sharding) do \
+        rs.master = nil \
+    end \
+    vshard.router.cfg(cfg) \
+end
+
+-- But do not forget the buckets. Otherwise bucket discovery will establish
+-- the connections instead of external requests.
+function forget_masters() \
+    disable_auto_masters() \
+    enable_auto_masters() \
+end
+
+function check_all_masters_found() \
+    for _, rs in pairs(vshard.router.static.replicasets) do \
+        if not rs.master then \
+            vshard.router.static.master_search_fiber:wakeup() \
+            return false \
+        end \
+    end \
+    return true \
+end
+
+function check_master_for_replicaset(rs_id, master_name) \
+    local rs_uuid = util.replicasets[rs_id] \
+    local master_uuid = util.name_to_uuid[master_name] \
+    local master = vshard.router.static.replicasets[rs_uuid].master \
+    if not master or master.uuid ~= master_uuid then \
+        vshard.router.static.master_search_fiber:wakeup() \
+        return false \
+    end \
+    return true \
+end
+
+function check_all_buckets_found() \
+    if vshard.router.info().bucket.unknown == 0 then \
+        return true \
+    end \
+    vshard.router.discovery_wakeup() \
+    return false \
+end
+
+master_search_helper_f = nil
+function aggressive_master_search_f() \
+    while true do \
+        vshard.router.static.master_search_fiber:wakeup() \
+        fiber.sleep(0.001) \
+    end \
+end
+
+function start_aggressive_master_search() \
+    assert(master_search_helper_f == nil) \
+    master_search_helper_f = fiber.new(aggressive_master_search_f) \
+    master_search_helper_f:set_joinable(true) \
+end
+
+function stop_aggressive_master_search() \
+    assert(master_search_helper_f ~= nil) \
+    master_search_helper_f:cancel() \
+    master_search_helper_f:join() \
+    master_search_helper_f = nil \
+end
+
+--
+-- Simulate the first cfg when no masters are known.
+--
+forget_masters()
+assert(vshard.router.static.master_search_fiber ~= nil)
+test_run:wait_cond(check_all_masters_found)
+test_run:wait_cond(check_all_buckets_found)
+
+--
+-- Change master and see how router finds it again.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+big_timeout = 1000000
+opts_big_timeout = {timeout = big_timeout}
+test_run:wait_cond(function() \
+    return check_master_for_replicaset(1, 'storage_1_b') \
+end)
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+
+test_run:switch('storage_1_b')
+assert(echo_count == 1)
+echo_count = 0
+
+--
+-- Revert the master back.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+replicas[util.name_to_uuid.storage_1_b].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+test_run:wait_cond(function() \
+    return check_master_for_replicaset(1, 'storage_1_a') \
+end)
+
+--
+-- Call tries to wait for the master if it has enough time left.
+--
+start_aggressive_master_search()
+
+forget_masters()
+vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout)
+
+forget_masters()
+-- XXX: this should not depend on master so much. RO requests should be able to
+-- go to replicas.
+vshard.router.callro(1501, 'echo', {1}, opts_big_timeout)
+
+forget_masters()
+vshard.router.route(1501):callrw('echo', {1}, opts_big_timeout)
+
+forget_masters()
+-- XXX: the same as above - should not really wait for master. Regardless of it
+-- being auto or not.
+vshard.router.route(1501):callro('echo', {1}, opts_big_timeout)
+
+stop_aggressive_master_search()
+
+test_run:switch('storage_1_a')
+assert(echo_count == 4)
+echo_count = 0
+
+--
+-- Old replicaset objects stop waiting for master when search is disabled.
+--
+
+-- Turn off masters on the first replicaset.
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = false
+vshard.storage.cfg(cfg, instance_uuid)
+
+-- Try to make RW and RO requests but then turn off the auto search.
+test_run:switch('router_1')
+forget_masters()
+f1 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callrw(1501, 'echo', {1}, opts_big_timeout) \
+end)
+-- XXX: should not really wait for master since this is an RO request. It could
+-- use a replica.
+f2 = fiber.create(function() \
+    fiber.self():set_joinable(true) \
+    return vshard.router.callro(1501, 'echo', {1}, opts_big_timeout) \
+end)
+fiber.sleep(0.01)
+disable_auto_masters()
+f1:join()
+f2:join()
+
+--
+-- Multiple masters logging.
+--
+test_run:switch('storage_1_a')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_a].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('storage_1_b')
+replicas = cfg.sharding[util.replicasets[1]].replicas
+replicas[util.name_to_uuid.storage_1_b].master = true
+vshard.storage.cfg(cfg, instance_uuid)
+
+test_run:switch('router_1')
+forget_masters()
+start_aggressive_master_search()
+test_run:wait_log('router_1', 'Found more than one master', nil, 10)
+stop_aggressive_master_search()
+
+--
+-- Async request won't wait for master. Otherwise it would need to wait, which
+-- is not async behaviour. The timeout should be ignored.
+--
+do \
+    forget_masters() \
+    return vshard.router.callrw(1501, 'echo', {1}, { \
+        is_async = true, timeout = big_timeout \
+    }) \
+end
+
+_ = test_run:switch("default")
+_ = test_run:cmd("stop server router_1")
+_ = test_run:cmd("cleanup server router_1")
+test_run:drop_cluster(REPLICASET_1)
+test_run:drop_cluster(REPLICASET_2)
+_ = test_run:cmd('clear filter')
diff --git a/vshard/consts.lua b/vshard/consts.lua
index 47a893b..66a09ae 100644
--- a/vshard/consts.lua
+++ b/vshard/consts.lua
@@ -52,6 +52,11 @@ return {
     DISCOVERY_WORK_STEP = 0.01,
     DISCOVERY_TIMEOUT = 10,
 
+    MASTER_SEARCH_IDLE_INTERVAL = 5,
+    MASTER_SEARCH_WORK_INTERVAL = 0.5,
+    MASTER_SEARCH_BACKOFF_INTERVAL = 5,
+    MASTER_SEARCH_TIMEOUT = 5,
+
     TIMEOUT_INFINITY = 500 * 365 * 86400,
     DEADLINE_INFINITY = math.huge,
 }
diff --git a/vshard/error.lua b/vshard/error.lua
index bcbcd71..e2d1a31 100644
--- a/vshard/error.lua
+++ b/vshard/error.lua
@@ -153,6 +153,11 @@ local error_message_template = {
         name = 'BUCKET_RECV_DATA_ERROR',
         msg = 'Can not receive the bucket %s data in space "%s" at tuple %s: %s',
         args = {'bucket_id', 'space', 'tuple', 'reason'},
+    },
+    [31] = {
+        name = 'MULTIPLE_MASTERS_FOUND',
+        msg = 'Found more than one master in replicaset %s on nodes %s and %s',
+        args = {'replicaset_uuid', 'master1', 'master2'},
     }
 }
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index fa048c9..7747258 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -25,6 +25,10 @@
 --         }
 --     },
 --     master = <master server from the array above>,
+--     master_cond = <condition var signaled when the replicaset
+--                    finds or changes its master>,
+--     is_auto_master = <true when the master is discovered
+--                       automatically, not set in the config>,
 --     replica = <nearest available replica object>,
 --     balance_i = <index of a next replica in priority_list to
 --                  use for an ro request>,
@@ -55,6 +59,8 @@ local luuid = require('uuid')
 local ffi = require('ffi')
 local util = require('vshard.util')
 local fiber_clock = fiber.clock
+local fiber_yield = fiber.yield
+local fiber_cond_wait = util.fiber_cond_wait
 local gsc = util.generate_self_checker
 
 --
@@ -159,6 +165,26 @@ local function replicaset_connect_to_replica(replicaset, replica)
     return conn
 end
 
+local function replicaset_wait_master(replicaset, timeout)
+    local master = replicaset.master
+    -- Fast path - master is almost always known.
+    if master then
+        return master, timeout
+    end
+    -- Slow path.
+    local deadline = fiber_clock() + timeout
+    repeat
+        if not replicaset.is_auto_master or
+           not fiber_cond_wait(replicaset.master_cond, timeout) then
+            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
+                                      replicaset.uuid)
+        end
+        timeout = deadline - fiber_clock()
+        master = replicaset.master
+    until master
+    return master, timeout
+end
+
 --
 -- Create net.box connection to master.
 --
@@ -175,7 +201,13 @@ end
 
 --
 -- Wait until the master instance is connected.
 --
 local function replicaset_wait_connected(replicaset, timeout)
-    return netbox_wait_connected(replicaset_connect_master(replicaset), timeout)
+    local master
+    master, timeout = replicaset_wait_master(replicaset, timeout)
+    if not master then
+        return nil, timeout
+    end
+    local conn = replicaset_connect_to_replica(replicaset, master)
+    return netbox_wait_connected(conn, timeout)
 end
 
 --
@@ -345,18 +377,30 @@ local function replicaset_master_call(replicaset, func, args, opts)
     assert(opts == nil or type(opts) == 'table')
     assert(type(func) == 'string', 'function name')
     assert(args == nil or type(args) == 'table', 'function arguments')
-    local conn, err = replicaset_connect_master(replicaset)
-    if not conn then
-        return nil, err
-    end
-    if not opts then
-        opts = {timeout = replicaset.master.net_timeout}
-    elseif not opts.timeout then
-        opts = table.copy(opts)
-        opts.timeout = replicaset.master.net_timeout
+    local master = replicaset.master
+    if not master then
+        opts = opts and table.copy(opts) or {}
+        if opts.is_async then
+            return nil, lerror.vshard(lerror.code.MISSING_MASTER,
+                                      replicaset.uuid)
+        end
+        local timeout = opts.timeout or consts.MASTER_SEARCH_TIMEOUT
+        master, timeout = replicaset_wait_master(replicaset, timeout)
+        if not master then
+            return nil, timeout
+        end
+        opts.timeout = timeout
+    else
+        if not opts then
+            opts = {timeout = master.net_timeout}
+        elseif not opts.timeout then
+            opts = table.copy(opts)
+            opts.timeout = master.net_timeout
+        end
     end
+    replicaset_connect_to_replica(replicaset, master)
     local net_status, storage_status, retval, error_object =
-        replica_call(replicaset.master, func, args, opts)
+        replica_call(master, func, args, opts)
     -- Ignore net_status - master does not retry requests.
     return storage_status, retval, error_object
 end
@@ -425,11 +469,6 @@ local function replicaset_template_multicallro(prefer_replica, balance)
                 return r
             end
         end
-        local conn, err = replicaset_connect_master(replicaset)
-        if not conn then
-            return nil, err
-        end
-        return master
     end
 
     return function(replicaset, func, args, opts)
@@ -444,9 +483,13 @@ local function replicaset_template_multicallro(prefer_replica, balance)
         end
         local end_time = fiber_clock() + timeout
         while not net_status and timeout > 0 do
-            replica, err = pick_next_replica(replicaset)
+            replica = pick_next_replica(replicaset)
             if not replica then
-                return nil, err
+                replica, timeout = replicaset_wait_master(replicaset, timeout)
+                if not replica then
+                    return nil, timeout
+                end
+                replicaset_connect_to_replica(replicaset, replica)
             end
             opts.timeout = timeout
             net_status, storage_status, retval, err =
@@ -508,7 +551,128 @@ local function rebind_replicasets(replicasets, old_replicasets)
                 end
             end
         end
+        if old_replicaset then
+            -- Take a hint from the old replicaset about who the master is
+            -- now.
+            if replicaset.is_auto_master then
+                local master = old_replicaset.master
+                if master then
+                    replicaset.master = replicaset.replicas[master.uuid]
+                end
+            end
+            -- Stop waiting for a master in the old replicaset. Its running
+            -- requests won't find it anyway. Auto search works only for the
+            -- most recent replicaset objects.
+            if old_replicaset.is_auto_master then
+                old_replicaset.is_auto_master = false
+                old_replicaset.master_cond:broadcast()
+            end
+        end
+    end
+end
+
+--
+-- Check if the master is still the master, and find a new master if there is
+-- no known one.
+--
+local function replicaset_locate_master(replicaset)
+    local is_done = true
+    local is_nop = true
+    if not replicaset.is_auto_master then
+        return is_done, is_nop
+    end
+    local func = 'vshard.storage._call'
+    local args = {'info'}
+    local const_timeout = consts.MASTER_SEARCH_TIMEOUT
+    local sync_opts = {timeout = const_timeout}
+    local ok, res, err, f
+    local master = replicaset.master
+    if master then
+        ok, res, err = replica_call(master, func, args, sync_opts)
+        if not ok then
+            return is_done, is_nop, err
+        end
+        if res.is_master then
+            return is_done, is_nop
+        end
+        log.info('Master of replicaset %s, node %s, has resigned. Trying '..
+                 'to find a new one', replicaset.uuid, master.uuid)
+        replicaset.master = nil
+    end
+    is_nop = false
+
+    local last_err
+    local futures = {}
+    local timeout = const_timeout
+    local deadline = fiber_clock() + timeout
+    local async_opts = {is_async = true}
+    local replicaset_uuid = replicaset.uuid
+    for replica_uuid, replica in pairs(replicaset.replicas) do
+        local conn = replica.conn
+        timeout, err = netbox_wait_connected(conn, timeout)
+        if not timeout then
+            last_err = err
+            timeout = deadline - fiber_clock()
+        else
+            ok, f = pcall(conn.call, conn, func, args, async_opts)
+            if not ok then
+                last_err = lerror.make(f)
+            else
+                futures[replica_uuid] = f
+            end
+        end
+    end
+    local master_uuid
+    for replica_uuid, f in pairs(futures) do
+        if timeout < 0 then
+            -- Netbox uses a cond var inside, whose wait throws an error when
+            -- it gets a negative timeout. Avoid that.
+            timeout = 0
+        end
+        res, err = f:wait_result(timeout)
+        timeout = deadline - fiber_clock()
+        if not res then
+            f:discard()
+            last_err = err
+            goto next_wait
+        end
+        res = res[1]
+        if not res.is_master then
+            goto next_wait
+        end
+        if not master_uuid then
+            master_uuid = replica_uuid
+            goto next_wait
+        end
+        is_done = false
+        last_err = lerror.vshard(lerror.code.MULTIPLE_MASTERS_FOUND,
+                                 replicaset_uuid, master_uuid, replica_uuid)
+        do return is_done, is_nop, last_err end
+        ::next_wait::
+    end
+    master = replicaset.replicas[master_uuid]
+    if master then
+        log.info('Found master for replicaset %s: %s', replicaset_uuid,
+                 master_uuid)
+        replicaset.master = master
+        replicaset.master_cond:broadcast()
+    else
+        is_done = false
+    end
+    return is_done, is_nop, last_err
+end
+
+local function locate_masters(replicasets)
+    local is_all_done = true
+    local is_all_nop = true
+    local last_err
+    for _, replicaset in pairs(replicasets) do
+        local is_done, is_nop, err = replicaset_locate_master(replicaset)
+        is_all_done = is_all_done and is_done
+        is_all_nop = is_all_nop and is_nop
+        last_err = err or last_err
+        fiber_yield()
     end
+    return is_all_done, is_all_nop, last_err
 end
 
 --
@@ -729,6 +893,8 @@ local function buildall(sharding_cfg)
             bucket_count = 0,
             lock = replicaset.lock,
             balance_i = 1,
+            is_auto_master = replicaset.master == 'auto',
+            master_cond = fiber.cond(),
         }, replicaset_mt)
         local priority_list = {}
         for replica_uuid, replica in pairs(replicaset.replicas) do
@@ -802,4 +968,5 @@ return {
     wait_masters_connect = wait_masters_connect,
     rebind_replicasets = rebind_replicasets,
     outdate_replicasets = outdate_replicasets,
+    locate_masters = locate_masters,
 }
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index 5e2a96b..9407ccd 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -68,6 +68,8 @@ local ROUTER_TEMPLATE = {
         replicasets = nil,
         -- Fiber to maintain replica connections.
         failover_fiber = nil,
+        -- Fiber to watch for master changes and find new masters.
+        master_search_fiber = nil,
         -- Fiber to discovery buckets in background.
         discovery_fiber = nil,
         -- How discovery works. On - work infinitely. Off - no
@@ -1030,6 +1032,93 @@ local function failover_f(router)
     end
 end
 
+--------------------------------------------------------------------------------
+-- Master search
+--------------------------------------------------------------------------------
+
+local function master_search_step(router)
+    local ok, is_done, is_nop, err = pcall(lreplicaset.locate_masters,
+                                           router.replicasets)
+    if not ok then
+        err = is_done
+        is_done = false
+        is_nop = false
+    end
+    return is_done, is_nop, err
+end
+
+--
+-- Master discovery background function. It is supposed to notice master
+-- changes and find new masters in the replicasets which are configured for
+-- that.
+--
+-- XXX: due to polling, the search might not notice a master change right when
+-- it happens. In the future it makes sense to rewrite master search using
+-- subscriptions. The problem is that at the moment of writing, subscriptions
+-- do not work well in all Tarantool versions.
+--
+local function master_search_f(router)
+    local module_version = M.module_version
+    local is_in_progress = false
+    while module_version == M.module_version do
+        local timeout
+        local start_time = fiber_clock()
+        local is_done, is_nop, err = master_search_step(router)
+        if err then
+            log.error('Error during master search: %s', lerror.make(err))
+        end
+        if is_done then
+            timeout = consts.MASTER_SEARCH_IDLE_INTERVAL
+        elseif err then
+            timeout = consts.MASTER_SEARCH_BACKOFF_INTERVAL
+        else
+            timeout = consts.MASTER_SEARCH_WORK_INTERVAL
+        end
+        if not is_in_progress then
+            if not is_nop and is_done then
+                log.info('Master search happened')
+            elseif not is_done then
+                log.info('Master search is started')
+                is_in_progress = true
+            end
+        elseif is_done then
+            log.info('Master search is finished')
+            is_in_progress = false
+        end
+        local end_time = fiber_clock()
+        local duration = end_time - start_time
+        if not is_nop then
+            log.verbose('Master search step took %s seconds. Next in %s '..
+                        'seconds', duration, timeout)
+        end
+        lfiber.sleep(timeout)
+    end
+end
+
+local function master_search_set(router)
+    local enable = false
+    for _, rs in pairs(router.replicasets) do
+        if rs.is_auto_master then
+            enable = true
+            break
+        end
+    end
+    local search_fiber = router.master_search_fiber
+    if enable and search_fiber == nil then
+        log.info('Master auto search is enabled')
+        router.master_search_fiber = util.reloadable_fiber_create(
+            'vshard.master_search.' .. router.name, M, 'master_search_f',
+            router)
+    elseif not enable and search_fiber ~= nil then
+        -- Do not make users pay for what they do not use - when the search is
+        -- disabled for all replicasets, there should not be any fiber.
+        log.info('Master auto search is disabled')
+        if search_fiber:status() ~= 'dead' then
+            search_fiber:cancel()
+        end
+        router.master_search_fiber = nil
+    end
+end
+
 --------------------------------------------------------------------------------
 -- Configuration
 --------------------------------------------------------------------------------
@@ -1100,6 +1189,7 @@ local function router_cfg(router, cfg, is_reload)
             'vshard.failover.' .. router.name, M, 'failover_f', router)
     end
     discovery_set(router, vshard_cfg.discovery_mode)
+    master_search_set(router)
 end
 
 --------------------------------------------------------------------------------
@@ -1535,6 +1625,10 @@
 --------------------------------------------------------------------------------
 -- Module definition
 --------------------------------------------------------------------------------
+M.discovery_f = discovery_f
+M.failover_f = failover_f
+M.master_search_f = master_search_f
+M.router_mt = router_mt
 --
 -- About functions, saved in M, and reloading see comment in
 -- storage/init.lua.
@@ -1556,10 +1650,6 @@ else
     M.module_version = M.module_version + 1
 end
 
-M.discovery_f = discovery_f
-M.failover_f = failover_f
-M.router_mt = router_mt
-
 module.cfg = legacy_cfg
 module.new = router_new
 module.internal = M
-- 
2.24.3 (Apple Git-128)
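
For quick orientation, here is a minimal sketch (not part of the patch) of a
router config that uses the new mode. The UUIDs, URIs, and the `echo` function
are illustrative placeholders borrowed from the test suite's naming; only
`master = 'auto'`, `vshard.router.cfg()`, and `vshard.router.callrw()` are the
real knobs and API touched by this series.

```
-- Illustrative only: one replicaset with two replicas, where the router
-- is told to locate and track the master itself via master = 'auto'.
local vshard = require('vshard')

local cfg = {
    bucket_count = 3000,
    sharding = {
        -- Placeholder replicaset UUID.
        ['cbf06940-0790-498b-948d-042b62cf3d29'] = {
            -- The new option: no replica carries master = true here.
            master = 'auto',
            replicas = {
                -- Placeholder replica UUIDs and URIs.
                ['8a274925-a26d-47fc-9e1b-af88ce939412'] = {
                    uri = 'storage:storage@127.0.0.1:3301',
                    name = 'storage_1_a',
                },
                ['3de2e3e1-9ebe-4d0d-abb1-26d301b84633'] = {
                    uri = 'storage:storage@127.0.0.1:3302',
                    name = 'storage_1_b',
                },
            },
        },
    },
}

vshard.router.cfg(cfg)

-- An RW call waits (within its timeout) until the search fiber has found
-- the master; when the master moves inside the replicaset, the fiber
-- updates it and no router reconfiguration is needed.
local res, err = vshard.router.callrw(1, 'echo', {1}, {timeout = 5})
```

Note there is no `master = true` flag on any replica: mixing manual
assignment with `master = 'auto'` is exactly the "bad config" rejected in
the commit message above.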