From mboxrd@z Thu Jan  1 00:00:00 1970
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org, yaroslav.dynnikov@tarantool.org
Message-ID: <2deb1054-3bc0-2f97-aeb2-db9577fdd84a@tarantool.org>
Date: Wed, 10 Feb 2021 12:00:31 +0300
Subject: Re: [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery
From: Oleg Babin via Tarantool-patches
Reply-To: Oleg Babin

Thanks for your patch. LGTM.

On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Recovery is a fiber on a master node which tries to resolve
> SENDING/RECEIVING buckets into GARBAGE or ACTIVE, in case they are
> stuck. Usually it happens due to a conflict on the receiving side,
> or if a restart happens during bucket send.
>
> Recovery was proactive. It used to wakeup with a constant period
> to find and resolve the needed buckets.
>
> But this won't work with the future feature called 'map-reduce'.
> Map-reduce as a preparation stage will need to ensure that all
> buckets on a storage are readable and writable. With the current
> recovery algorithm if a bucket is broken, it won't be recovered
> for the next 5 seconds by default. During this time all new
> map-reduce requests can't execute.
>
> This is not acceptable. As well as too frequent wakeup of recovery
> fiber because it would waste TX thread time.
>
> The patch makes recovery fiber wakeup not by a timeout but by
> events happening with _bucket space. Recovery fiber sleeps on a
> condition variable which is signaled when _bucket is changed.
>
> This is very similar to the reactive GC feature in a previous
> commit.
>
> It is worth mentioning that the backoff happens not only when a
> bucket couldn't be recovered (its transfer is still in progress,
> for example), but also when a network error happened and recovery
> couldn't check state of the bucket on the other storage.
>
> It would be a useless busy loop to retry network errors
> immediately after their appearance. Recovery uses a backoff
> interval for them as well.
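For readers following the thread: the generation counter plus condition
variable scheme described above can be sketched outside Tarantool too. Below
is a rough Python analogue using threading.Condition in place of a fiber
cond; all names here are illustrative, not vshard or Tarantool API:

```python
import threading

class BucketSpace:
    """Toy stand-in for _bucket: every change bumps a generation
    counter and signals waiters, mirroring the patch's design."""
    def __init__(self):
        self.generation = 0
        self.cond = threading.Condition()

    def change(self):
        # Analogue of an on_replace trigger on _bucket.
        with self.cond:
            self.generation += 1
            self.cond.notify_all()

    def wait_for_change(self, seen_generation, timeout=None):
        # Sleep until the generation moves past what the caller saw,
        # or until the timeout (the backoff case) expires.
        with self.cond:
            self.cond.wait_for(lambda: self.generation != seen_generation,
                               timeout=timeout)
            return self.generation

space = BucketSpace()
gen = space.generation
threading.Timer(0.05, space.change).start()  # simulate a _bucket update
new_gen = space.wait_for_change(gen, timeout=1.0)
assert new_gen == gen + 1
```

The recovery fiber is the single waiter here; passing a finite timeout to
the wait also models the backoff case, where the fiber wakes up even if no
_bucket change was signaled.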
>
> Needed for #147
> ---
>  test/router/router.result             | 22 ++++++++---
>  test/router/router.test.lua           | 13 ++++++-
>  test/storage/recovery.result          |  8 ++++
>  test/storage/recovery.test.lua        |  5 +++
>  test/storage/recovery_errinj.result   | 16 +++++++-
>  test/storage/recovery_errinj.test.lua |  9 ++++-
>  vshard/consts.lua                     |  2 +-
>  vshard/storage/init.lua               | 54 +++++++++++++++++++++++----
>  8 files changed, 110 insertions(+), 19 deletions(-)
>
> diff --git a/test/router/router.result b/test/router/router.result
> index b2efd6d..3c1d073 100644
> --- a/test/router/router.result
> +++ b/test/router/router.result
> @@ -312,6 +312,11 @@ replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
>  _ = test_run:switch('storage_2_a')
>  ---
>  ...
> +-- Pause recovery. It is too aggressive, and the test needs to see buckets in
> +-- their intermediate states.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
>  ---
>  - [1, 'sending', '']
> @@ -319,6 +324,9 @@ box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]}
>  _ = test_run:switch('storage_1_a')
>  ---
>  ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
>  ---
>  - [1, 'receiving', '']
> @@ -342,19 +350,21 @@ util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
>    name: TRANSFER_IS_IN_PROGRESS
>    message: Bucket 1 is transferring to replicaset
>  ...
> -_ = test_run:switch('storage_2_a')
> +_ = test_run:switch('storage_1_a')
> +---
> +...
> +box.space._bucket:delete({1})
>  ---
> +- [1, 'receiving', '']
>  ...
> -box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
>  ---
> -- [1, 'active']
>  ...
> -_ = test_run:switch('storage_1_a')
> +_ = test_run:switch('storage_2_a')
>  ---
>  ...
> -box.space._bucket:delete({1})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
>  ---
> -- [1, 'receiving', '']
>  ...
>  _ = test_run:switch('router_1')
>  ---
> diff --git a/test/router/router.test.lua b/test/router/router.test.lua
> index 154310b..aa3eb3b 100644
> --- a/test/router/router.test.lua
> +++ b/test/router/router.test.lua
> @@ -114,19 +114,28 @@ replicaset, err = vshard.router.bucket_discovery(1); return err == nil or err
>  replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
>
>  _ = test_run:switch('storage_2_a')
> +-- Pause recovery. It is too aggressive, and the test needs to see buckets in
> +-- their intermediate states.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>  box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
> +
>  _ = test_run:switch('storage_1_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>  box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
> +
>  _ = test_run:switch('router_1')
>  -- Ok to read sending bucket.
>  vshard.router.call(1, 'read', 'echo', {123})
>  -- Not ok to write sending bucket.
>  util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
>
> -_ = test_run:switch('storage_2_a')
> -box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
>  _ = test_run:switch('storage_1_a')
>  box.space._bucket:delete({1})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +
> +_ = test_run:switch('storage_2_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +
>  _ = test_run:switch('router_1')
>
>  -- Check unavailability of master of a replicaset.
> diff --git a/test/storage/recovery.result b/test/storage/recovery.result
> index 8ccb0b9..fa92bca 100644
> --- a/test/storage/recovery.result
> +++ b/test/storage/recovery.result
> @@ -28,12 +28,20 @@ util.push_rs_filters(test_run)
>  _ = test_run:switch("storage_2_a")
>  ---
>  ...
> +-- Pause until restart. Otherwise recovery does its job too fast and does not
> +-- allow to simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  vshard.storage.rebalancer_disable()
>  ---
>  ...
>  _ = test_run:switch("storage_1_a")
>  ---
>  ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  -- Create buckets sending to rs2 and restart - recovery must
>  -- garbage some of them and activate others. Receiving buckets
>  -- must be garbaged on bootstrap.
> diff --git a/test/storage/recovery.test.lua b/test/storage/recovery.test.lua
> index a0651e8..93cec68 100644
> --- a/test/storage/recovery.test.lua
> +++ b/test/storage/recovery.test.lua
> @@ -10,8 +10,13 @@ util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
>  util.push_rs_filters(test_run)
>
>  _ = test_run:switch("storage_2_a")
> +-- Pause until restart. Otherwise recovery does its job too fast and does not
> +-- allow to simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>  vshard.storage.rebalancer_disable()
> +
>  _ = test_run:switch("storage_1_a")
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>
>  -- Create buckets sending to rs2 and restart - recovery must
>  -- garbage some of them and activate others. Receiving buckets
> diff --git a/test/storage/recovery_errinj.result b/test/storage/recovery_errinj.result
> index 3e9a9bf..8c178d5 100644
> --- a/test/storage/recovery_errinj.result
> +++ b/test/storage/recovery_errinj.result
> @@ -35,9 +35,17 @@ _ = test_run:switch('storage_2_a')
>  vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
>  ---
>  ...
> +-- Pause recovery. Otherwise it does its job too fast and does not allow to
> +-- simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  _ = test_run:switch('storage_1_a')
>  ---
>  ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
>  _bucket = box.space._bucket
>  ---
>  ...
> @@ -76,10 +84,16 @@ _bucket:get{1}
>  ---
>  - [1, 'active']
>  ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +---
> +...
>  _ = test_run:switch('storage_1_a')
>  ---
>  ...
> -while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +---
> +...
> +wait_bucket_is_collected(1)
>  ---
>  ...
>  _ = test_run:switch("default")
> diff --git a/test/storage/recovery_errinj.test.lua b/test/storage/recovery_errinj.test.lua
> index 8c1a9d2..c730560 100644
> --- a/test/storage/recovery_errinj.test.lua
> +++ b/test/storage/recovery_errinj.test.lua
> @@ -14,7 +14,12 @@ util.push_rs_filters(test_run)
>  --
>  _ = test_run:switch('storage_2_a')
>  vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
> +-- Pause recovery. Otherwise it does its job too fast and does not allow to
> +-- simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +
>  _ = test_run:switch('storage_1_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>  _bucket = box.space._bucket
>  _bucket:replace{1, vshard.consts.BUCKET.ACTIVE, util.replicasets[2]}
>  ret, err = vshard.storage.bucket_send(1, util.replicasets[2], {timeout = 0.1})
> @@ -27,9 +32,11 @@ vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = false
>  _bucket = box.space._bucket
>  while _bucket:get{1}.status ~= vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.01) end
>  _bucket:get{1}
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
>
>  _ = test_run:switch('storage_1_a')
> -while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +wait_bucket_is_collected(1)
>
>  _ = test_run:switch("default")
>  test_run:drop_cluster(REPLICASET_2)
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 3f1585a..cf3f422 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -39,7 +39,7 @@ return {
>      DEFAULT_SYNC_TIMEOUT = 1;
>      RECONNECT_TIMEOUT = 0.5;
>      GC_BACKOFF_INTERVAL = 5,
> -    RECOVERY_INTERVAL = 5;
> +    RECOVERY_BACKOFF_INTERVAL = 5,
>      COLLECT_LUA_GARBAGE_INTERVAL = 100;
>
>      DISCOVERY_IDLE_INTERVAL = 10,
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 31a6fc7..85f5024 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -634,13 +634,16 @@ end
>  -- Infinite function to resolve status of buckets, whose 'sending'
>  -- has failed due to tarantool or network problems. Restarts on
>  -- reload.
> --- @param module_version Module version, on which the current
> ---     function had been started. If the actual module version
> ---     appears to be changed, then stop recovery. It is
> ---     restarted in reloadable_fiber.
>  --
>  local function recovery_f()
>      local module_version = M.module_version
> +    -- Changes of _bucket increments bucket generation. Recovery has its own
> +    -- bucket generation which is <= actual. Recovery is finished, when its
> +    -- generation == bucket generation. In such a case the fiber does nothing
> +    -- until next _bucket change.
> +    local bucket_generation_recovered = -1
> +    local bucket_generation_current = M.bucket_generation
> +    local ok, sleep_time, is_all_recovered, total, recovered
>      -- Interrupt recovery if a module has been reloaded. Perhaps,
>      -- there was found a bug, and reload fixes it.
>      while module_version == M.module_version do
> @@ -648,22 +651,57 @@ local function recovery_f()
>              lfiber.yield()
>              goto continue
>          end
> -        local ok, total, recovered = pcall(recovery_step_by_type,
> -                                           consts.BUCKET.SENDING)
> +        is_all_recovered = true
> +        if bucket_generation_recovered == bucket_generation_current then
> +            goto sleep
> +        end
> +
> +        ok, total, recovered = pcall(recovery_step_by_type,
> +                                     consts.BUCKET.SENDING)
>          if not ok then
> +            is_all_recovered = false
>              log.error('Error during sending buckets recovery: %s', total)
> +        elseif total ~= recovered then
> +            is_all_recovered = false
>          end
> +
>          ok, total, recovered = pcall(recovery_step_by_type,
>                                       consts.BUCKET.RECEIVING)
>          if not ok then
> +            is_all_recovered = false
>              log.error('Error during receiving buckets recovery: %s', total)
>          elseif total == 0 then
>              bucket_receiving_quota_reset()
>          else
>              bucket_receiving_quota_add(recovered)
> +            if total ~= recovered then
> +                is_all_recovered = false
> +            end
> +        end
> +
> +    ::sleep::
> +        if not is_all_recovered then
> +            bucket_generation_recovered = -1
> +        else
> +            bucket_generation_recovered = bucket_generation_current
> +        end
> +        bucket_generation_current = M.bucket_generation
> +
> +        if not is_all_recovered then
> +            -- One option - some buckets are not broken. Their transmission is
> +            -- still in progress. Don't need to retry immediately. Another
> +            -- option - network errors when tried to repair the buckets. Also no
> +            -- need to retry often. It won't help.
> +            sleep_time = consts.RECOVERY_BACKOFF_INTERVAL
> +        elseif bucket_generation_recovered ~= bucket_generation_current then
> +            sleep_time = 0
> +        else
> +            sleep_time = consts.TIMEOUT_INFINITY
> +        end
> +        if module_version == M.module_version then
> +            M.bucket_generation_cond:wait(sleep_time)
>          end
> -        lfiber.sleep(consts.RECOVERY_INTERVAL)
> -        ::continue::
> +    ::continue::
>      end
>  end
>
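One more note for the archive: the sleep-time selection at the bottom of
recovery_f() is the heart of the backoff logic, so it may help to restate it
as a pure function. A Python sketch under names of my own choosing (only
RECOVERY_BACKOFF_INTERVAL and TIMEOUT_INFINITY mirror the patch's constants,
and the 5-second value comes from vshard/consts.lua):

```python
import math

RECOVERY_BACKOFF_INTERVAL = 5.0   # mirrors consts.RECOVERY_BACKOFF_INTERVAL
TIMEOUT_INFINITY = math.inf       # stand-in for consts.TIMEOUT_INFINITY

def next_sleep_time(is_all_recovered, recovered_gen, current_gen):
    """Decide how long the recovery fiber waits on the cond.

    - Some buckets are still unresolved (or a network error happened):
      back off, retrying immediately would be a busy loop.
    - New _bucket changes arrived while recovering: rerun right away.
    - Everything recovered and nothing changed: sleep until signaled.
    """
    if not is_all_recovered:
        return RECOVERY_BACKOFF_INTERVAL
    if recovered_gen != current_gen:
        return 0
    return TIMEOUT_INFINITY

assert next_sleep_time(False, 3, 3) == RECOVERY_BACKOFF_INTERVAL
assert next_sleep_time(True, 2, 3) == 0
assert next_sleep_time(True, 3, 3) == TIMEOUT_INFINITY
```

In all three cases the cond wait can still return early when _bucket changes,
which is what makes the fiber reactive rather than purely periodic.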