Hi! I've got several questions about this patch, find them below. On Thu, 13 May 2021 at 14:07, Vladislav Shpilevoy wrote: > vshard.storage.buckets_count() uses a cached count value when no > changes in _bucket space. The changes are detected using an > on_replace trigger on _bucket space. But it wasn't installed on > the replicas. > > As a result, on replica vshard.storage.buckets_count() didn't > change at all after being called once. > > The patch makes replicas install the trigger on _bucket and update > the bucket generation + drop the caches properly. > > The tricky part is that _bucket might not exist on a replica if it > was vshard-configured earlier than the master. But for that > occasion the trigger installation on _bucket is delayed until > vshard schema is replicated from the master. > > An alternative was to introduce a special version of > buckets_count() on replicas which does not use the cache at all. > But the trigger is going to be useful in scope of #173 where it > should drop bucket refs on the replica. > > Closes #276 > Needed for #173 > --- > test/misc/reconfigure.result | 21 ++++--- > test/misc/reconfigure.test.lua | 13 +++-- > test/router/boot_replica_first.result | 9 ++- > test/router/boot_replica_first.test.lua | 7 ++- > test/storage/storage.result | 44 ++++++++++++++ > test/storage/storage.test.lua | 19 ++++++ > vshard/storage/init.lua | 78 ++++++++++++++++++++++--- > 7 files changed, 164 insertions(+), 27 deletions(-) > > diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result > index 3b34841..a03ab5a 100644 > --- a/test/misc/reconfigure.result > +++ b/test/misc/reconfigure.result > @@ -271,9 +271,9 @@ box.cfg.read_only > --- > - false > ... > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > --- > -- 1 > +- true > ... > Why is this change necessary? Seems now you miss an actual value in case the test fails. This question applies to many cases below. No matter what testing framework is used, I think that assert_equals(value, 1) -- assertion failed: expected 1, got 2 is better (more helpful) than eager assert(value == 1) -- assertion failed! > _ = test_run:switch('storage_2_a') > --- > @@ -305,10 +305,13 @@ box.cfg.read_only > --- > - true > ... > --- Should be zero on the slave node. Even though earlier the node was a > master. > -#box.space._bucket:on_replace() > +-- > +-- gh-276: replica should have triggers. This is important for proper > update of > +-- caches and in future for discarding refs in scope of gh-173. > +-- > +assert(#box.space._bucket:on_replace() == 1) > --- > -- 0 > +- true > ... _ = test_run:switch('storage_2_b') > --- > @@ -340,9 +343,9 @@ box.cfg.read_only > --- > - false > ... > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > --- > -- 1 > +- true > ... > _ = test_run:switch('storage_3_a') > --- > @@ -373,9 +376,9 @@ box.cfg.read_only > --- > - false > ... > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > --- > -- 1 > +- true > ... > _ = test_run:switch('router_1') > --- > diff --git a/test/misc/reconfigure.test.lua > b/test/misc/reconfigure.test.lua > index 348628c..61ea3c0 100644 > --- a/test/misc/reconfigure.test.lua > +++ b/test/misc/reconfigure.test.lua > @@ -109,7 +109,7 @@ table.sort(uris) > uris > box.cfg.replication > box.cfg.read_only > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > > _ = test_run:switch('storage_2_a') > info = vshard.storage.info() > @@ -119,8 +119,11 @@ table.sort(uris) > uris > box.cfg.replication > box.cfg.read_only > --- Should be zero on the slave node. Even though earlier the node was a > master. > -#box.space._bucket:on_replace() > +-- > +-- gh-276: replica should have triggers. This is important for proper > update of > +-- caches and in future for discarding refs in scope of gh-173. > +-- > +assert(#box.space._bucket:on_replace() == 1) > > _ = test_run:switch('storage_2_b') > info = vshard.storage.info() > @@ -130,7 +133,7 @@ table.sort(uris) > uris > box.cfg.replication > box.cfg.read_only > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > > _ = test_run:switch('storage_3_a') > info = vshard.storage.info() > @@ -140,7 +143,7 @@ table.sort(uris) > uris > box.cfg.replication > box.cfg.read_only > -#box.space._bucket:on_replace() > +assert(#box.space._bucket:on_replace() == 1) > > _ = test_run:switch('router_1') > info = vshard.router.info() > diff --git a/test/router/boot_replica_first.result > b/test/router/boot_replica_first.result > index 1705230..3c5a08c 100644 > --- a/test/router/boot_replica_first.result > +++ b/test/router/boot_replica_first.result > @@ -76,10 +76,13 @@ vshard.storage.call(1, 'read', 'echo', {100}) > | message: 'Cannot perform action with bucket 1, reason: Not found' > | name: WRONG_BUCKET > | ... > --- Should not have triggers. > -#box.space._bucket:on_replace() > +-- > +-- gh-276: should have triggers. This is important for proper update of > caches > +-- and in future for discarding refs in scope of gh-173. > +-- > +assert(#box.space._bucket:on_replace() == 1) > | --- > - | - 0 > + | - true > | ... > > test_run:switch('router') > diff --git a/test/router/boot_replica_first.test.lua > b/test/router/boot_replica_first.test.lua > index 7b1b3fd..f973f2b 100644 > --- a/test/router/boot_replica_first.test.lua > +++ b/test/router/boot_replica_first.test.lua > @@ -29,8 +29,11 @@ test_run:switch('box_1_b') > test_run:wait_lsn('box_1_b', 'box_1_a') > -- Fails, but gracefully. Because the bucket is not found here. > vshard.storage.call(1, 'read', 'echo', {100}) > --- Should not have triggers. > -#box.space._bucket:on_replace() > +-- > +-- gh-276: should have triggers. This is important for proper update of > caches > +-- and in future for discarding refs in scope of gh-173. > +-- > +assert(#box.space._bucket:on_replace() == 1) > > test_run:switch('router') > vshard.router.bootstrap() > diff --git a/test/storage/storage.result b/test/storage/storage.result > index d18b7f8..570d9c6 100644 > --- a/test/storage/storage.result > +++ b/test/storage/storage.result > @@ -708,6 +708,24 @@ assert(vshard.storage.buckets_count() == 0) > --- > - true > ... > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +--- > +... > +_ = test_run:switch('storage_1_b') > +--- > +... > +-- > +-- gh-276: bucket count cache should be properly updated on the replica > nodes. > +-- For that the replicas must also install on_replace trigger on _bucket > space > +-- to watch for changes. > +-- > +assert(vshard.storage.buckets_count() == 0) > +--- > +- true > +... > +_ = test_run:switch('storage_1_a') > +--- > +... > vshard.storage.bucket_force_create(1, 5) > --- > - true > @@ -716,6 +734,19 @@ assert(vshard.storage.buckets_count() == 5) > --- > - true > ... > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +--- > +... > +_ = test_run:switch('storage_1_b') > +--- > +... > +assert(vshard.storage.buckets_count() == 5) > +--- > +- true > +... > +_ = test_run:switch('storage_1_a') > +--- > +... > vshard.storage.bucket_force_create(6, 5) > --- > - true > @@ -724,9 +755,22 @@ assert(vshard.storage.buckets_count() == 10) > --- > - true > ... > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +--- > +... > +_ = test_run:switch('storage_1_b') > +--- > +... > +assert(vshard.storage.buckets_count() == 10) > +--- > +- true > +... > -- > -- Bucket_generation_wait() registry function. > -- > +_ = test_run:switch('storage_1_a') > +--- > +... > lstorage = require('vshard.registry').storage > --- > ... > diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua > index 97558f6..494e2e8 100644 > --- a/test/storage/storage.test.lua > +++ b/test/storage/storage.test.lua > @@ -201,14 +201,33 @@ for bid, _ in pairs(buckets) do > vshard.storage.bucket_force_drop(bid) end > > _ = test_run:switch('storage_1_a') > assert(vshard.storage.buckets_count() == 0) > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +_ = test_run:switch('storage_1_b') > +-- > +-- gh-276: bucket count cache should be properly updated on the replica > nodes. > +-- For that the replicas must also install on_replace trigger on _bucket > space > +-- to watch for changes. > +-- > +assert(vshard.storage.buckets_count() == 0) > + > +_ = test_run:switch('storage_1_a') > vshard.storage.bucket_force_create(1, 5) > assert(vshard.storage.buckets_count() == 5) > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +_ = test_run:switch('storage_1_b') > +assert(vshard.storage.buckets_count() == 5) > + > +_ = test_run:switch('storage_1_a') > vshard.storage.bucket_force_create(6, 5) > assert(vshard.storage.buckets_count() == 10) > +test_run:wait_lsn('storage_1_b', 'storage_1_a') > +_ = test_run:switch('storage_1_b') > +assert(vshard.storage.buckets_count() == 10) > > -- > -- Bucket_generation_wait() registry function. > -- > +_ = test_run:switch('storage_1_a') > lstorage = require('vshard.registry').storage > ok, err = lstorage.bucket_generation_wait(-1) > assert(not ok and err.message) > diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua > index c4400f7..b1a20d6 100644 > --- a/vshard/storage/init.lua > +++ b/vshard/storage/init.lua > @@ -115,6 +115,14 @@ if not M then > -- replace the old function is to keep its reference. > -- > bucket_on_replace = nil, > + -- > + -- Reference to the function used as on_replace trigger on > + -- _schema space. Saved explicitly by the same reason as > + -- _bucket on_replace. > + -- It is used by replicas to wait for schema bootstrap > + -- because they might be configured earlier than the > + -- master. > + schema_on_replace = nil, > -- Fast alternative to box.space._bucket:count(). But may be nil. > Reset > -- on each generation change. > bucket_count_cache = nil, > @@ -398,6 +406,62 @@ local function schema_version_make(ver) > return setmetatable(ver, schema_version_mt) > end > > +local function schema_install_triggers() > + local _bucket = box.space._bucket > + if M.bucket_on_replace then > + local ok, err = pcall(_bucket.on_replace, _bucket, nil, > + M.bucket_on_replace) > + if not ok then > + log.warn('Could not drop old trigger from '.. > + '_bucket: %s', err) > + end > + end > + _bucket:on_replace(bucket_generation_increment) > + M.bucket_on_replace = bucket_generation_increment > +end > + > +local function schema_install_on_replace(_, new) > + -- Wait not just for _bucket to appear, but for the entire > + -- schema. This might be important if the schema will ever > + -- consist of more than just _bucket. > + if new[1] ~= 'vshard_version' then > + return > + end > + schema_install_triggers() > + > + local _schema = box.space._schema > + local ok, err = pcall(_schema.on_replace, _schema, nil, > M.schema_on_replace) > + if not ok then > + log.warn('Could not drop trigger from _schema inside of the '.. > + 'trigger: %s', err) > + end > + M.schema_on_replace = nil > + -- Drop the caches which might have been created while the > + -- schema was being replicated. > + bucket_generation_increment() > +end > + > +-- > +-- Install the triggers later when there is an actual schema to install > them on. > +-- On replicas it might happen that they are vshard-configured earlier > than the > +-- master and therefore don't have the schema right away. > +-- > +local function schema_install_triggers_delayed() > + log.info('Could not find _bucket space to install triggers - delayed > '.. > + 'until the schema is replicated') > + assert(not box.space._bucket) > + local _schema = box.space._schema > + if M.schema_on_replace then > + local ok, err = pcall(_schema.on_replace, _schema, nil, > + M.schema_on_replace) > + if not ok then > + log.warn('Could not drop trigger from _schema: %s', err) > + end > + end > + _schema:on_replace(schema_install_on_replace) > + M.schema_on_replace = schema_install_on_replace > +end > + > -- VShard versioning works in 4 numbers: major, minor, patch, and > -- a last helper number incremented on every schema change, if > -- first 3 numbers stay not changed. That happens when users take > @@ -2612,17 +2676,15 @@ local function storage_cfg(cfg, this_replica_uuid, > is_reload) > local uri = luri.parse(this_replica.uri) > schema_upgrade(is_master, uri.login, uri.password) > > - if M.bucket_on_replace then > - box.space._bucket:on_replace(nil, M.bucket_on_replace) > - M.bucket_on_replace = nil > - end > - lref.cfg() > - if is_master then > - box.space._bucket:on_replace(bucket_generation_increment) > - M.bucket_on_replace = bucket_generation_increment > + if is_master or box.space._bucket then > + schema_install_triggers() > +1 to Oleg, isn't _bucket check enough? > + else > + schema_install_triggers_delayed() > end > > + lref.cfg() > lsched.cfg(vshard_cfg) > + > lreplicaset.rebind_replicasets(new_replicasets, M.replicasets) > lreplicaset.outdate_replicasets(M.replicasets) > M.replicasets = new_replicasets > -- > 2.24.3 (Apple Git-128) > > P.S. I've tested it on Cartridge. Basic tests (luatest + WebUI inspection) seem to be OK. Best regards Yaroslav Dynnikov