From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org,
	yaroslav.dynnikov@tarantool.org
Date: Thu, 13 May 2021 13:07:06 +0200
Message-Id: <04d7d05f09a5ee7ed52b27c480e81232c406e415.1620903962.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 2/2] vshard: fix buckets_count() on
 replicas
List-Id: Tarantool development patches

vshard.storage.buckets_count() uses a cached count value while there are
no changes in the _bucket space. The changes are detected using an
on_replace trigger on _bucket, but it was not installed on the replicas.
As a result, on a replica vshard.storage.buckets_count() never changed
after being called once.

The patch makes the replicas install the trigger on _bucket as well, so
they bump the bucket generation and drop the caches properly.

The tricky part is that _bucket might not exist on a replica yet if it
was vshard-configured earlier than the master. For that case the trigger
installation on _bucket is delayed until the vshard schema is replicated
from the master.

An alternative was to introduce a special version of buckets_count() on
the replicas which would not use the cache at all. But the trigger is
going to be useful in the scope of #173, where it should drop bucket
refs on the replica.

Closes #276
Needed for #173
---
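For review convenience, the caching scheme being fixed can be condensed
into a minimal standalone sketch. This is not the vshard code itself:
buckets_count_sketch(), `generation`, and `cache` are illustrative
stand-ins for the real internals in vshard/storage/init.lua
(bucket_count_cache, bucket_generation, bucket_generation_increment).

    -- A count cache invalidated through a generation counter, which an
    -- on_replace trigger on _bucket bumps on every change.
    local generation = 0
    local cache = {generation = -1, count = nil}

    local function bucket_generation_bump()
        generation = generation + 1
    end

    local function buckets_count_sketch()
        if cache.generation == generation then
            -- Fast path: _bucket did not change since the last count.
            return cache.count
        end
        -- Slow path: recount and remember the generation of the count.
        cache.count = box.space._bucket:count()
        cache.generation = generation
        return cache.count
    end

    -- The essence of the fix: before this patch the trigger was
    -- installed on the master only, so on a replica `generation` never
    -- moved and the first cached count was returned forever.
    box.space._bucket:on_replace(bucket_generation_bump)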
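The delayed installation can be sketched the same way, reusing
bucket_generation_bump() from the sketch above. It assumes a replica
which was configured before _bucket was replicated; the one-shot watcher
mirrors schema_install_on_replace() from the diff, and 'vshard_version'
is the _schema key written during the vshard bootstrap.

    -- Install the _bucket trigger right away when the space already
    -- exists, otherwise watch _schema until the bootstrap is replicated.
    local watcher

    local function install_bucket_trigger()
        box.space._bucket:on_replace(bucket_generation_bump)
    end

    local function on_schema_replace(_, new)
        -- Wait for the whole schema, not just _bucket:
        -- 'vshard_version' appears when the bootstrap is complete.
        if new == nil or new[1] ~= 'vshard_version' then
            return
        end
        install_bucket_trigger()
        -- Deregister self. pcall() because dropping a trigger from
        -- inside itself might be forbidden.
        pcall(box.space._schema.on_replace, box.space._schema, nil,
              watcher)
    end
    watcher = on_schema_replace

    if box.space._bucket ~= nil then
        install_bucket_trigger()
    else
        box.space._schema:on_replace(watcher)
    end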
 test/misc/reconfigure.result            | 21 ++++---
 test/misc/reconfigure.test.lua          | 13 +++--
 test/router/boot_replica_first.result   |  9 ++-
 test/router/boot_replica_first.test.lua |  7 ++-
 test/storage/storage.result             | 44 ++++++++++++++
 test/storage/storage.test.lua           | 19 ++++++
 vshard/storage/init.lua                 | 78 ++++++++++++++++++++++---
 7 files changed, 164 insertions(+), 27 deletions(-)

diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
index 3b34841..a03ab5a 100644
--- a/test/misc/reconfigure.result
+++ b/test/misc/reconfigure.result
@@ -271,9 +271,9 @@ box.cfg.read_only
 ---
 - false
 ...
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 ---
-- 1
+- true
 ...
 _ = test_run:switch('storage_2_a')
 ---
@@ -305,10 +305,13 @@ box.cfg.read_only
 ---
 - true
 ...
--- Should be zero on the slave node. Even though earlier the node was a master.
-#box.space._bucket:on_replace()
+--
+-- gh-276: replica should have triggers. This is important for proper update of
+-- caches and in future for discarding refs in scope of gh-173.
+--
+assert(#box.space._bucket:on_replace() == 1)
 ---
-- 0
+- true
 ...
 _ = test_run:switch('storage_2_b')
 ---
@@ -340,9 +343,9 @@ box.cfg.read_only
 ---
 - false
 ...
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 ---
-- 1
+- true
 ...
 _ = test_run:switch('storage_3_a')
 ---
@@ -373,9 +376,9 @@ box.cfg.read_only
 ---
 - false
 ...
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 ---
-- 1
+- true
 ...
 _ = test_run:switch('router_1')
 ---
diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
index 348628c..61ea3c0 100644
--- a/test/misc/reconfigure.test.lua
+++ b/test/misc/reconfigure.test.lua
@@ -109,7 +109,7 @@ table.sort(uris)
 uris
 box.cfg.replication
 box.cfg.read_only
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 
 _ = test_run:switch('storage_2_a')
 info = vshard.storage.info()
@@ -119,8 +119,11 @@ table.sort(uris)
 uris
 box.cfg.replication
 box.cfg.read_only
--- Should be zero on the slave node. Even though earlier the node was a master.
-#box.space._bucket:on_replace()
+--
+-- gh-276: replica should have triggers. This is important for proper update of
+-- caches and in future for discarding refs in scope of gh-173.
+--
+assert(#box.space._bucket:on_replace() == 1)
 
 _ = test_run:switch('storage_2_b')
 info = vshard.storage.info()
@@ -130,7 +133,7 @@ table.sort(uris)
 uris
 box.cfg.replication
 box.cfg.read_only
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 
 _ = test_run:switch('storage_3_a')
 info = vshard.storage.info()
@@ -140,7 +143,7 @@ table.sort(uris)
 uris
 box.cfg.replication
 box.cfg.read_only
-#box.space._bucket:on_replace()
+assert(#box.space._bucket:on_replace() == 1)
 
 _ = test_run:switch('router_1')
 info = vshard.router.info()
diff --git a/test/router/boot_replica_first.result b/test/router/boot_replica_first.result
index 1705230..3c5a08c 100644
--- a/test/router/boot_replica_first.result
+++ b/test/router/boot_replica_first.result
@@ -76,10 +76,13 @@ vshard.storage.call(1, 'read', 'echo', {100})
  | message: 'Cannot perform action with bucket 1, reason: Not found'
  | name: WRONG_BUCKET
  | ...
--- Should not have triggers.
-#box.space._bucket:on_replace()
+--
+-- gh-276: should have triggers. This is important for proper update of caches
+-- and in future for discarding refs in scope of gh-173.
+--
+assert(#box.space._bucket:on_replace() == 1)
  | ---
- | - 0
+ | - true
  | ...
 
 test_run:switch('router')
diff --git a/test/router/boot_replica_first.test.lua b/test/router/boot_replica_first.test.lua
index 7b1b3fd..f973f2b 100644
--- a/test/router/boot_replica_first.test.lua
+++ b/test/router/boot_replica_first.test.lua
@@ -29,8 +29,11 @@ test_run:switch('box_1_b')
 test_run:wait_lsn('box_1_b', 'box_1_a')
 -- Fails, but gracefully. Because the bucket is not found here.
 vshard.storage.call(1, 'read', 'echo', {100})
--- Should not have triggers.
-#box.space._bucket:on_replace()
+--
+-- gh-276: should have triggers. This is important for proper update of caches
+-- and in future for discarding refs in scope of gh-173.
+--
+assert(#box.space._bucket:on_replace() == 1)
 
 test_run:switch('router')
 vshard.router.bootstrap()
diff --git a/test/storage/storage.result b/test/storage/storage.result
index d18b7f8..570d9c6 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -708,6 +708,24 @@ assert(vshard.storage.buckets_count() == 0)
 ---
 - true
 ...
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+---
+...
+_ = test_run:switch('storage_1_b')
+---
+...
+--
+-- gh-276: bucket count cache should be properly updated on the replica nodes.
+-- For that the replicas must also install on_replace trigger on _bucket space
+-- to watch for changes.
+--
+assert(vshard.storage.buckets_count() == 0)
+---
+- true
+...
+_ = test_run:switch('storage_1_a')
+---
+...
 vshard.storage.bucket_force_create(1, 5)
 ---
 - true
@@ -716,6 +734,19 @@ assert(vshard.storage.buckets_count() == 5)
 ---
 - true
 ...
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+---
+...
+_ = test_run:switch('storage_1_b')
+---
+...
+assert(vshard.storage.buckets_count() == 5)
+---
+- true
+...
+_ = test_run:switch('storage_1_a')
+---
+...
 vshard.storage.bucket_force_create(6, 5)
 ---
 - true
@@ -724,9 +755,22 @@ assert(vshard.storage.buckets_count() == 10)
 ---
 - true
 ...
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+---
+...
+_ = test_run:switch('storage_1_b')
+---
+...
+assert(vshard.storage.buckets_count() == 10)
+---
+- true
+...
 --
 -- Bucket_generation_wait() registry function.
 --
+_ = test_run:switch('storage_1_a')
+---
+...
 lstorage = require('vshard.registry').storage
 ---
 ...
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index 97558f6..494e2e8 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -201,14 +201,33 @@ for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
 _ = test_run:switch('storage_1_a')
 assert(vshard.storage.buckets_count() == 0)
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+_ = test_run:switch('storage_1_b')
+--
+-- gh-276: bucket count cache should be properly updated on the replica nodes.
+-- For that the replicas must also install on_replace trigger on _bucket space
+-- to watch for changes.
+--
+assert(vshard.storage.buckets_count() == 0)
+
+_ = test_run:switch('storage_1_a')
 vshard.storage.bucket_force_create(1, 5)
 assert(vshard.storage.buckets_count() == 5)
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+_ = test_run:switch('storage_1_b')
+assert(vshard.storage.buckets_count() == 5)
+
+_ = test_run:switch('storage_1_a')
 vshard.storage.bucket_force_create(6, 5)
 assert(vshard.storage.buckets_count() == 10)
+test_run:wait_lsn('storage_1_b', 'storage_1_a')
+_ = test_run:switch('storage_1_b')
+assert(vshard.storage.buckets_count() == 10)
 
 --
 -- Bucket_generation_wait() registry function.
 --
+_ = test_run:switch('storage_1_a')
 lstorage = require('vshard.registry').storage
 ok, err = lstorage.bucket_generation_wait(-1)
 assert(not ok and err.message)
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index c4400f7..b1a20d6 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -115,6 +115,14 @@ if not M then
         -- replace the old function is to keep its reference.
         --
         bucket_on_replace = nil,
+        --
+        -- Reference to the function used as on_replace trigger on
+        -- _schema space. Saved explicitly for the same reason as
+        -- _bucket on_replace.
+        -- It is used by replicas to wait for schema bootstrap
+        -- because they might be configured earlier than the
+        -- master.
+        schema_on_replace = nil,
         -- Fast alternative to box.space._bucket:count(). But may be nil. Reset
         -- on each generation change.
         bucket_count_cache = nil,
@@ -398,6 +406,62 @@ local function schema_version_make(ver)
     return setmetatable(ver, schema_version_mt)
 end
 
+local function schema_install_triggers()
+    local _bucket = box.space._bucket
+    if M.bucket_on_replace then
+        local ok, err = pcall(_bucket.on_replace, _bucket, nil,
+                              M.bucket_on_replace)
+        if not ok then
+            log.warn('Could not drop old trigger from '..
+                     '_bucket: %s', err)
+        end
+    end
+    _bucket:on_replace(bucket_generation_increment)
+    M.bucket_on_replace = bucket_generation_increment
+end
+
+local function schema_install_on_replace(_, new)
+    -- Wait not just for _bucket to appear, but for the entire
+    -- schema. This might be important if the schema will ever
+    -- consist of more than just _bucket.
+    if new[1] ~= 'vshard_version' then
+        return
+    end
+    schema_install_triggers()
+
+    local _schema = box.space._schema
+    local ok, err = pcall(_schema.on_replace, _schema, nil, M.schema_on_replace)
+    if not ok then
+        log.warn('Could not drop trigger from _schema inside of the '..
+                 'trigger: %s', err)
+    end
+    M.schema_on_replace = nil
+    -- Drop the caches which might have been created while the
+    -- schema was being replicated.
+    bucket_generation_increment()
+end
+
+--
+-- Install the triggers later when there is an actual schema to install them on.
+-- On replicas it might happen that they are vshard-configured earlier than the
+-- master and therefore don't have the schema right away.
+--
+local function schema_install_triggers_delayed()
+    log.info('Could not find _bucket space to install triggers - delayed '..
+             'until the schema is replicated')
+    assert(not box.space._bucket)
+    local _schema = box.space._schema
+    if M.schema_on_replace then
+        local ok, err = pcall(_schema.on_replace, _schema, nil,
+                              M.schema_on_replace)
+        if not ok then
+            log.warn('Could not drop trigger from _schema: %s', err)
+        end
+    end
+    _schema:on_replace(schema_install_on_replace)
+    M.schema_on_replace = schema_install_on_replace
+end
+
 -- VShard versioning works in 4 numbers: major, minor, patch, and
 -- a last helper number incremented on every schema change, if
 -- first 3 numbers stay not changed. That happens when users take
@@ -2612,17 +2676,15 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
     local uri = luri.parse(this_replica.uri)
     schema_upgrade(is_master, uri.login, uri.password)
 
-    if M.bucket_on_replace then
-        box.space._bucket:on_replace(nil, M.bucket_on_replace)
-        M.bucket_on_replace = nil
-    end
-    lref.cfg()
-    if is_master then
-        box.space._bucket:on_replace(bucket_generation_increment)
-        M.bucket_on_replace = bucket_generation_increment
+    if is_master or box.space._bucket then
+        schema_install_triggers()
+    else
+        schema_install_triggers_delayed()
     end
+    lref.cfg()
     lsched.cfg(vshard_cfg)
+
     lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
     lreplicaset.outdate_replicasets(M.replicasets)
     M.replicasets = new_replicasets
-- 
2.24.3 (Apple Git-128)