* [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call @ 2020-03-21 18:59 Vladislav Shpilevoy 2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy 2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy 0 siblings, 2 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw) To: tarantool-patches, olegrok, yaroslav.dynnikov The patchset introduces the vshard.storage._call() API - a single access point for all internal functions. It allows hiding them from public usage behind one function, and adding/removing internal functions on read-only replicas using restart or hot code reload. The new call is going to be used first by the new discovery, which currently occupies the tx thread for too long when it fetches millions of buckets. The existing buckets_discovery() function can't be changed because it is public, so a new one will be added inside _call(). To make it possible to introduce _call() and remove the old functions, the patchset describes and implements an upgrade strategy, which was not really thought through and developed until now. 
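The single-access-point design described above can be sketched as a plain dispatch table. This is a hypothetical illustration, not the patch code; the service names and error text are assumptions:

```lua
-- Sketch of the dispatch-table idea behind vshard.storage._call():
-- internal services live in a Lua table, so adding or removing one
-- needs only a hot code reload, no DDL - which also works on
-- read-only replicas. Service names here are illustrative.
local service_call_api = {
    bucket_recv = function(...) return true end,
    rebalancer_request_state = function(...) return {} end,
}

local function service_call(name, ...)
    local service = service_call_api[name]
    if service == nil then
        -- Raising a proper error lets an older router detect that
        -- this storage does not provide the requested service yet.
        error(string.format('Unknown service: %s', tostring(name)))
    end
    return service(...)
end
```

A router would then invoke everything through the single registered function, e.g. `service_call('bucket_recv', ...)`, instead of a per-function entry in _func.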
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-227-service-call Issue: https://github.com/tarantool/tarantool/issues/227 Vladislav Shpilevoy (2): storage: introduce upgrade strategy storage: introduce vshard.storage._call() test/upgrade/box.lua | 9 ++ test/upgrade/storage_1_a.lua | 11 ++ test/upgrade/storage_1_b.lua | 1 + test/upgrade/storage_2_a.lua | 1 + test/upgrade/storage_2_b.lua | 1 + test/upgrade/suite.ini | 7 + test/upgrade/upgrade.result | 271 ++++++++++++++++++++++++++++++++++ test/upgrade/upgrade.test.lua | 97 ++++++++++++ vshard/storage/init.lua | 147 +++++++++++++++++- 9 files changed, 541 insertions(+), 4 deletions(-) create mode 100644 test/upgrade/box.lua create mode 100644 test/upgrade/storage_1_a.lua create mode 120000 test/upgrade/storage_1_b.lua create mode 120000 test/upgrade/storage_2_a.lua create mode 120000 test/upgrade/storage_2_b.lua create mode 100644 test/upgrade/suite.ini create mode 100644 test/upgrade/upgrade.result create mode 100644 test/upgrade/upgrade.test.lua -- 2.21.1 (Apple Git-122.3) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy @ 2020-03-21 18:59 ` Vladislav Shpilevoy 2020-03-22 5:05 ` Oleg Babin 2020-03-24 15:21 ` Yaroslav Dynnikov 2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy 1 sibling, 2 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw) To: tarantool-patches, olegrok, yaroslav.dynnikov Since vshard's creation it has never needed any schema changes. The version of the vshard schema was always '1', stored in _schema as 'oncevshard:storage:1', because it was created by box.once('vshard:storage:1'). Now it is time to make some changes to the schema. The first is the need to simplify introduction and removal of internal functions. There are 3 of them at the moment: bucket_recv()/ rebalancer_apply_routes()/rebalancer_request_state(). Users see them in the vshard.storage.* namespace, and in the _func space. They are going to be replaced by one vshard.storage._service_call(). That way it will be easy to update them without schema changes, to add new functions, and to remove them. In _service_call() it will be possible to validate versions and return proper errors in case a requested function does not exist. Secondly, _service_call() is going to be used by the new discovery on routers. It is not possible to modify the existing vshard.storage.buckets_discovery() because it is a public function. But a new discovery can be added to _service_call(). Thirdly, there is a bug in rebalancing related to possibly long TCP delays, which requires a change of the _bucket space format in the future. Fourthly, _service_call() would allow introducing new functions on a read-only replica, using code reload only. This may be needed for the bug about bucket_ref() not preventing bucket move. The patch introduces versioning using 4 numbers: x.x.x.x. 
The first 3 numbers are major, minor, and patch, the same as in the tags. The last value is a number increased when the first 3 numbers can't be changed but the schema is modified. That happens when users take the master branch between 2 tags, and then the schema is changed again before the new tag is published. An upgrade between two adjacent tags is supposed to always be safe and automatic on reload or reconfiguration. However, currently there are too few versions for any upgrade to be unsafe. Needed for #227 Needed for #210 --- test/upgrade/box.lua | 9 ++ test/upgrade/storage_1_a.lua | 11 ++ test/upgrade/storage_1_b.lua | 1 + test/upgrade/storage_2_a.lua | 1 + test/upgrade/storage_2_b.lua | 1 + test/upgrade/suite.ini | 7 + test/upgrade/upgrade.result | 259 ++++++++++++++++++++++++++++++++++ test/upgrade/upgrade.test.lua | 94 ++++++++++++ vshard/storage/init.lua | 121 +++++++++++++++- 9 files changed, 500 insertions(+), 4 deletions(-) create mode 100644 test/upgrade/box.lua create mode 100644 test/upgrade/storage_1_a.lua create mode 120000 test/upgrade/storage_1_b.lua create mode 120000 test/upgrade/storage_2_a.lua create mode 120000 test/upgrade/storage_2_b.lua create mode 100644 test/upgrade/suite.ini create mode 100644 test/upgrade/upgrade.result create mode 100644 test/upgrade/upgrade.test.lua diff --git a/test/upgrade/box.lua b/test/upgrade/box.lua new file mode 100644 index 0000000..ad0543a --- /dev/null +++ b/test/upgrade/box.lua @@ -0,0 +1,9 @@ +#!/usr/bin/env tarantool + +require('strict').on() + +box.cfg{ + listen = os.getenv("LISTEN"), +} + +require('console').listen(os.getenv('ADMIN')) diff --git a/test/upgrade/storage_1_a.lua b/test/upgrade/storage_1_a.lua new file mode 100644 index 0000000..a521dbc --- /dev/null +++ b/test/upgrade/storage_1_a.lua @@ -0,0 +1,11 @@ +#!/usr/bin/env tarantool +util = require('util') +NAME = require('fio').basename(arg[0], '.lua') +local source_path = arg[1] +if source_path then + -- Run one storage on a different 
vshard + -- version. + package.path = string.format('%s/?.lua;%s/?/init.lua;%s', source_path, + source_path, package.path) +end +require('storage_template') diff --git a/test/upgrade/storage_1_b.lua b/test/upgrade/storage_1_b.lua new file mode 120000 index 0000000..02572da --- /dev/null +++ b/test/upgrade/storage_1_b.lua @@ -0,0 +1 @@ +storage_1_a.lua \ No newline at end of file diff --git a/test/upgrade/storage_2_a.lua b/test/upgrade/storage_2_a.lua new file mode 120000 index 0000000..02572da --- /dev/null +++ b/test/upgrade/storage_2_a.lua @@ -0,0 +1 @@ +storage_1_a.lua \ No newline at end of file diff --git a/test/upgrade/storage_2_b.lua b/test/upgrade/storage_2_b.lua new file mode 120000 index 0000000..02572da --- /dev/null +++ b/test/upgrade/storage_2_b.lua @@ -0,0 +1 @@ +storage_1_a.lua \ No newline at end of file diff --git a/test/upgrade/suite.ini b/test/upgrade/suite.ini new file mode 100644 index 0000000..78efefe --- /dev/null +++ b/test/upgrade/suite.ini @@ -0,0 +1,7 @@ +[default] +core = tarantool +description = Upgrade tests +script = box.lua +is_parallel = False +lua_libs = ../lua_libs/storage_template.lua ../lua_libs/util.lua + ../lua_libs/git_util.lua ../../example/localcfg.lua diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result new file mode 100644 index 0000000..0fb68c7 --- /dev/null +++ b/test/upgrade/upgrade.result @@ -0,0 +1,259 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... +git_util = require('git_util') + | --- + | ... +util = require('util') + | --- + | ... + +-- Commit "Improve compatibility with 1.9". +version_0_1_15_0 = '79a4dbfc4229e922cbfe4be259193a7b18dc089d' + | --- + | ... +vshard_copy_path = util.git_checkout('vshard_git_tree_copy_0_1_15_0', \ + version_0_1_15_0) + | --- + | ... + +REPLICASET_1 = { 'storage_1_a', 'storage_1_b' } + | --- + | ... +REPLICASET_2 = { 'storage_2_a', 'storage_2_b' } + | --- + | ... 
+test_run:create_cluster(REPLICASET_1, 'upgrade', {args = vshard_copy_path}) + | --- + | ... +test_run:create_cluster(REPLICASET_2, 'upgrade', {args = vshard_copy_path}) + | --- + | ... +util = require('util') + | --- + | ... +util.wait_master(test_run, REPLICASET_1, 'storage_1_a') + | --- + | ... +util.wait_master(test_run, REPLICASET_2, 'storage_2_a') + | --- + | ... +util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')') + | --- + | ... + +test_run:switch('storage_1_a') + | --- + | - true + | ... +box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select() + | --- + | - ['oncevshard:storage:1'] + | ... +vshard.storage.internal.schema_current_version + | --- + | - null + | ... +vshard.storage.internal.schema_latest_version + | --- + | - null + | ... +bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + | --- + | ... +vshard.storage.bucket_force_create(1, bucket_count) + | --- + | - true + | ... +box.begin() \ +for i = 1, bucket_count do box.space.test:replace{i, i} end \ +box.commit() + | --- + | ... +box.space.test:count() + | --- + | - 1500 + | ... + +test_run:switch('storage_2_a') + | --- + | - true + | ... +box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select() + | --- + | - ['oncevshard:storage:1'] + | ... +vshard.storage.internal.schema_current_version + | --- + | - null + | ... +vshard.storage.internal.schema_latest_version + | --- + | - null + | ... +bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + | --- + | ... +first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1 + | --- + | ... +vshard.storage.bucket_force_create(first_bucket, bucket_count) + | --- + | - true + | ... +box.begin() \ +for i = first_bucket, first_bucket + bucket_count - 1 do \ + box.space.test:replace{i, i} \ +end \ +box.commit() + | --- + | ... +box.space.test:count() + | --- + | - 1500 + | ... + +test_run:switch('default') + | --- + | - true + | ... 
+test_run:cmd('stop server storage_1_a') + | --- + | - true + | ... +test_run:cmd('start server storage_1_a') + | --- + | - true + | ... +test_run:cmd('stop server storage_1_b') + | --- + | - true + | ... +test_run:cmd('start server storage_1_b') + | --- + | - true + | ... + +test_run:switch('storage_1_a') + | --- + | - true + | ... +box.space._schema:get({'vshard_version'}) + | --- + | - ['vshard_version', 0, 1, 16, 0] + | ... +vshard.storage.internal.schema_current_version() + | --- + | - - 0 + | - 1 + | - 16 + | - 0 + | ... +vshard.storage.internal.schema_latest_version + | --- + | - - 0 + | - 1 + | - 16 + | - 0 + | ... + +test_run:switch('storage_1_b') + | --- + | - true + | ... +box.space._schema:get({'vshard_version'}) + | --- + | - ['vshard_version', 0, 1, 16, 0] + | ... +vshard.storage.internal.schema_current_version() + | --- + | - - 0 + | - 1 + | - 16 + | - 0 + | ... +vshard.storage.internal.schema_latest_version + | --- + | - - 0 + | - 1 + | - 16 + | - 0 + | ... + +test_run:switch('default') + | --- + | - true + | ... +-- Main purpose of the test - ensure that data can be safely moved +-- from an old instance to a newer one. Weight difference makes +-- rebalancer move the buckets from old storage_2 to new upgraded +-- storage_1. +util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, [[ \ + cfg.sharding[ util.replicasets[2] ].weight = 1 \ + cfg.sharding[ util.replicasets[1] ].weight = 2 \ + cfg.rebalancer_max_sending = 5 \ + vshard.storage.cfg(cfg, util.name_to_uuid[NAME]) \ +]]) + | --- + | ... + +test_run:switch('storage_2_a') + | --- + | - true + | ... +wait_rebalancer_state('The cluster is balanced ok', test_run) + | --- + | ... +active_count = 0 + | --- + | ... +index = box.space._bucket.index.status + | --- + | ... +for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do \ + active_count = active_count + 1 \ + assert(box.space.test:get({t.id}) ~= nil) \ +end + | --- + | ... +active_count + | --- + | - 1000 + | ... 
+ +test_run:switch('storage_1_a') + | --- + | - true + | ... +active_count = 0 + | --- + | ... +index = box.space._bucket.index.status + | --- + | ... +for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do \ + active_count = active_count + 1 \ + assert(box.space.test:get({t.id}) ~= nil) \ +end + | --- + | ... +active_count + | --- + | - 2000 + | ... + +test_run:switch('default') + | --- + | - true + | ... +test_run:drop_cluster(REPLICASET_2) + | --- + | ... +test_run:drop_cluster(REPLICASET_1) + | --- + | ... +test_run:cmd('clear filter') + | --- + | - true + | ... diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua new file mode 100644 index 0000000..3a4d113 --- /dev/null +++ b/test/upgrade/upgrade.test.lua @@ -0,0 +1,94 @@ +test_run = require('test_run').new() +git_util = require('git_util') +util = require('util') + +-- Commit "Improve compatibility with 1.9". +version_0_1_15_0 = '79a4dbfc4229e922cbfe4be259193a7b18dc089d' +vshard_copy_path = util.git_checkout('vshard_git_tree_copy_0_1_15_0', \ + version_0_1_15_0) + +REPLICASET_1 = { 'storage_1_a', 'storage_1_b' } +REPLICASET_2 = { 'storage_2_a', 'storage_2_b' } +test_run:create_cluster(REPLICASET_1, 'upgrade', {args = vshard_copy_path}) +test_run:create_cluster(REPLICASET_2, 'upgrade', {args = vshard_copy_path}) +util = require('util') +util.wait_master(test_run, REPLICASET_1, 'storage_1_a') +util.wait_master(test_run, REPLICASET_2, 'storage_2_a') +util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')') + +test_run:switch('storage_1_a') +box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select() +vshard.storage.internal.schema_current_version +vshard.storage.internal.schema_latest_version +bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2 +vshard.storage.bucket_force_create(1, bucket_count) +box.begin() \ +for i = 1, bucket_count do box.space.test:replace{i, i} end \ +box.commit() +box.space.test:count() + 
+test_run:switch('storage_2_a') +box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select() +vshard.storage.internal.schema_current_version +vshard.storage.internal.schema_latest_version +bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2 +first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1 +vshard.storage.bucket_force_create(first_bucket, bucket_count) +box.begin() \ +for i = first_bucket, first_bucket + bucket_count - 1 do \ + box.space.test:replace{i, i} \ +end \ +box.commit() +box.space.test:count() + +test_run:switch('default') +test_run:cmd('stop server storage_1_a') +test_run:cmd('start server storage_1_a') +test_run:cmd('stop server storage_1_b') +test_run:cmd('start server storage_1_b') + +test_run:switch('storage_1_a') +box.space._schema:get({'vshard_version'}) +vshard.storage.internal.schema_current_version() +vshard.storage.internal.schema_latest_version + +test_run:switch('storage_1_b') +box.space._schema:get({'vshard_version'}) +vshard.storage.internal.schema_current_version() +vshard.storage.internal.schema_latest_version + +test_run:switch('default') +-- Main purpose of the test - ensure that data can be safely moved +-- from an old instance to a newer one. Weight difference makes +-- rebalancer move the buckets from old storage_2 to new upgraded +-- storage_1. 
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, [[ \ + cfg.sharding[ util.replicasets[2] ].weight = 1 \ + cfg.sharding[ util.replicasets[1] ].weight = 2 \ + cfg.rebalancer_max_sending = 5 \ + vshard.storage.cfg(cfg, util.name_to_uuid[NAME]) \ +]]) + +test_run:switch('storage_2_a') +wait_rebalancer_state('The cluster is balanced ok', test_run) +active_count = 0 +index = box.space._bucket.index.status +for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do \ + active_count = active_count + 1 \ + assert(box.space.test:get({t.id}) ~= nil) \ +end +active_count + +test_run:switch('storage_1_a') +active_count = 0 +index = box.space._bucket.index.status +for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do \ + active_count = active_count + 1 \ + assert(box.space.test:get({t.id}) ~= nil) \ +end +active_count + +test_run:switch('default') +test_run:drop_cluster(REPLICASET_2) +test_run:drop_cluster(REPLICASET_1) +test_run:cmd('clear filter') diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua index a86fcaf..a87a50b 100644 --- a/vshard/storage/init.lua +++ b/vshard/storage/init.lua @@ -257,7 +257,16 @@ end -------------------------------------------------------------------------------- -- Schema -------------------------------------------------------------------------------- -local function storage_schema_v1(username, password) + +-- VShard versioning works in 4 numbers: major, minor, patch, and +-- a last helper number incremented on every schema change, if +-- first 3 numbers stay not changed. That happens when users take +-- the latest master version not having a tag yet. They couldn't +-- upgrade if not the 4th number changed inside one tag. + +-- The schema first time appeared with 0.1.16. So this function +-- describes schema before that - 0.1.15. 
+local function schema_init_0_1_15_0(username, password) log.info("Initializing schema") box.schema.user.create(username, { password = password, @@ -297,7 +306,105 @@ local function storage_schema_v1(username, password) box.schema.user.grant(username, 'execute', 'function', name) end - box.snapshot() + box.space._schema:replace({'vshard_version', 0, 1, 15, 0}) +end + +local function schema_compare_versions(v1, v2) + for i = 1, 4 do + if v1[i] ~= v2[i] then + return v1[i] - v2[i] + end + end + return 0 +end + +local function schema_upgrade_to_0_1_16_0(username) + -- Since 0.1.16.0 the old versioning by + -- 'oncevshard:storage:<number>' is dropped because it is not + -- really extendible nor understandable. + log.info('Insert vshard_version into _schema') + box.space._schema:replace({'vshard_version', 0, 1, 16, 0}) + box.space._schema:delete({'oncevshard:storage:1'}) +end + +local function schema_current_version() + local version = box.space._schema:get({'vshard_version'}) + if version == nil then + version = {0, 1, 15, 0} + else + version = version:totable() + version = {version[2], version[3], version[4], version[5]} + end + return version +end + +local schema_latest_version = {0, 1, 16, 0} + +local function schema_upgrade_replica() + local version = schema_current_version() + -- Replica can't do upgrade - it is read-only. And it + -- shouldn't anyway - that would conflict with master doing + -- the same. So the upgrade is either non-critical, and the + -- replica can work with the new code but old schema. Or it + -- it is critical, and need to wait the schema upgrade from + -- the master. + -- Or it may happen, that the upgrade just is not possible. + -- For example, when an auto-upgrade tries to change a too old + -- schema to the newest, skipping some intermediate versions. + -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed + -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 -> + -- 1.3.1.1 and so on step by step. 
+ if schema_compare_versions(version, schema_latest_version) == 0 then + log.info('Replica\'s vshard schema version is up to date - %s', + table.concat(version, '.')) + else + log.info('Replica\' vshard schema version is outdated - current '.. + '%s vs latest %s, but the replica still can work', + table.concat(version, '.'), + table.concat(schema_latest_version, '.')) + end + -- In future for hard changes the replica may be suspended + -- until its schema is synced with master. Or it may + -- reject to upgrade in case of incompatible changes. Now + -- there are too few versions so as such problems could + -- appear. +end + +local function schema_upgrade_master(username, password) + local _schema = box.space._schema + local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil + local version = schema_current_version() + local is_bootstrap = not box.space._bucket + + if is_bootstrap then + schema_init_0_1_15_0(username, password) + elseif is_old_versioning then + log.info('The instance does not have vshard_version record. '.. 
+ 'It is 0.1.15.0.') + end + local handlers = { + {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0}, + } + assert(schema_compare_versions(handlers[#handlers].version, + schema_latest_version) == 0) + for _, handler in ipairs(handlers) do + local next_version = handler.version + if schema_compare_versions(next_version, version) > 0 then + local next_version_str = table.concat(next_version, '.') + log.info("Upgrade vshard to {%s}", next_version_str) + handler.func(username, password) + log.info("Successful vshard upgrade to {%s}", next_version_str) + _schema:replace({'vshard_version', unpack(next_version)}) + end + end +end + +local function schema_upgrade(is_master, username, password) + if is_master then + return schema_upgrade_master(username, password) + else + return schema_upgrade_replica() + end end local function this_is_master() @@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload) error(err) end log.info("Box has been configured") - local uri = luri.parse(this_replica.uri) - box.once("vshard:storage:1", storage_schema_v1, uri.login, uri.password) + end + + local uri = luri.parse(this_replica.uri) + schema_upgrade(is_master, uri.login, uri.password) + + if not is_reload then box.space._bucket:on_replace(bucket_generation_increment) else local old = box.space._bucket:on_replace()[1] @@ -2469,6 +2580,8 @@ M.rlist = { add_tail = rlist_add_tail, remove = rlist_remove, } +M.schema_latest_version = schema_latest_version +M.schema_current_version = schema_current_version return { sync = sync, -- 2.21.1 (Apple Git-122.3) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy @ 2020-03-22 5:05 ` Oleg Babin 2020-03-22 19:12 ` Vladislav Shpilevoy 2020-03-24 15:21 ` Yaroslav Dynnikov 1 sibling, 1 reply; 14+ messages in thread From: Oleg Babin @ 2020-03-22 5:05 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov Hi! Thanks for your patch! I have several quite minor comments. On 21/03/2020 21:59, Vladislav Shpilevoy wrote: > + > +local function schema_upgrade_to_0_1_16_0(username) > + -- Since 0.1.16.0 the old versioning by > + -- 'oncevshard:storage:<number>' is dropped because it is not > + -- really extendible nor understandable. > + log.info('Insert vshard_version into _schema') > + box.space._schema:replace({'vshard_version', 0, 1, 16, 0}) > + box.space._schema:delete({'oncevshard:storage:1'}) > +end I think it's better to wrap all upgrades into a transaction, or maybe the whole upgrade procedure below. > +local function schema_current_version() > + local version = box.space._schema:get({'vshard_version'}) > + if version == nil then > + version = {0, 1, 15, 0} > + else > + version = version:totable() > + version = {version[2], version[3], version[4], version[5]} You could use {version:unpack(2)} here. However feel free to ignore this comment. > + end > + return version > +end > + > +local schema_latest_version = {0, 1, 16, 0} > + > +local function schema_upgrade_replica() > + local version = schema_current_version() > + -- Replica can't do upgrade - it is read-only. And it > + -- shouldn't anyway - that would conflict with master doing > + -- the same. So the upgrade is either non-critical, and the > + -- replica can work with the new code but old schema. Or it > + -- it is critical, and need to wait the schema upgrade from > + -- the master. > + -- Or it may happen, that the upgrade just is not possible. 
> + -- For example, when an auto-upgrade tries to change a too old > + -- schema to the newest, skipping some intermediate versions. > + -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed > + -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 -> > + -- 1.3.1.1 and so on step by step. > + if schema_compare_versions(version, schema_latest_version) == 0 then > + log.info('Replica\'s vshard schema version is up to date - %s', > + table.concat(version, '.')) You don't have such messages on a master (that current schema is the latest). I think the log level here should be "verbose". > + else > + log.info('Replica\' vshard schema version is outdated - current '.. > + '%s vs latest %s, but the replica still can work', > + table.concat(version, '.'), > + table.concat(schema_latest_version, '.')) > + end > + -- In future for hard changes the replica may be suspended > + -- until its schema is synced with master. Or it may > + -- reject to upgrade in case of incompatible changes. Now > + -- there are too few versions so as such problems could > + -- appear. > +end > + > +local function schema_upgrade_master(username, password) > + local _schema = box.space._schema > + local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil > + local version = schema_current_version() > + local is_bootstrap = not box.space._bucket > + > + if is_bootstrap then > + schema_init_0_1_15_0(username, password) > + elseif is_old_versioning then > + log.info('The instance does not have vshard_version record. '.. 
> + 'It is 0.1.15.0.') > + end > + local handlers = { > + {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0}, > + } > + assert(schema_compare_versions(handlers[#handlers].version, > + schema_latest_version) == 0) > + for _, handler in ipairs(handlers) do > + local next_version = handler.version > + if schema_compare_versions(next_version, version) > 0 then > + local next_version_str = table.concat(next_version, '.') > + log.info("Upgrade vshard to {%s}", next_version_str) > + handler.func(username, password) I think it would be good if you wrapped it into pcall and added error handling. I see that all the changes are quite trivial, but there is no guarantee that they will be in the future. Ideally, you would roll back all already-applied changes. This gives the user an opportunity to downgrade the module to the previous version and report a bug without any inconsistencies after the upgrade. ^ permalink raw reply [flat|nested] 14+ messages in thread
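The transactional rollback asked for in this review could be shaped roughly like this - a sketch under the assumption that a single version's handler fits in one transaction (the follow-up in the thread notes that is not guaranteed for future, larger migrations); the function name is hypothetical:

```lua
-- Sketch: apply one version's upgrade handler atomically and
-- report failure instead of leaving a half-applied schema.
-- 'handler' is assumed to be a {version = {...}, func = function}
-- record, as in the patch's handler list.
local function apply_upgrade_step(handler, username, password)
    box.begin()
    local ok, err = pcall(handler.func, username, password)
    if not ok then
        -- Undo every change the handler managed to make.
        box.rollback()
        return nil, err
    end
    box.commit()
    return true
end
```

A failed step then leaves the schema at the previous version, so the user can downgrade the module and report the bug, as suggested.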
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-22 5:05 ` Oleg Babin @ 2020-03-22 19:12 ` Vladislav Shpilevoy 2020-03-23 6:35 ` Oleg Babin 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-03-22 19:12 UTC (permalink / raw) To: Oleg Babin, tarantool-patches, yaroslav.dynnikov Hi! Thanks for the review! > On 21/03/2020 21:59, Vladislav Shpilevoy wrote: > >> + >> +local function schema_upgrade_to_0_1_16_0(username) >> + -- Since 0.1.16.0 the old versioning by >> + -- 'oncevshard:storage:<number>' is dropped because it is not >> + -- really extendible nor understandable. >> + log.info('Insert vshard_version into _schema') >> + box.space._schema:replace({'vshard_version', 0, 1, 16, 0}) >> + box.space._schema:delete({'oncevshard:storage:1'}) >> +end > > I think it's better to wrap all upgrades into transaction or may be the whole upgrade procedure below. Ok, see my comment in the end of the email. >> +local function schema_current_version() >> + local version = box.space._schema:get({'vshard_version'}) >> + if version == nil then >> + version = {0, 1, 15, 0} >> + else >> + version = version:totable() >> + version = {version[2], version[3], version[4], version[5]} > > You could use {version:unpack(2)} here. However feel free to ignore this comment. Cool, didn't know unpack/totable has parameters. ==================== local function schema_current_version() local version = box.space._schema:get({'vshard_version'}) if version == nil then - version = {0, 1, 15, 0} + return {0, 1, 15, 0} else - version = version:totable() - version = {version[2], version[3], version[4], version[5]} + return version:totable(2) end - return version end ==================== >> + end >> + return version >> +end >> + >> +local schema_latest_version = {0, 1, 16, 0} >> + >> +local function schema_upgrade_replica() >> + local version = schema_current_version() >> + -- Replica can't do upgrade - it is read-only. 
And it >> + -- shouldn't anyway - that would conflict with master doing >> + -- the same. So the upgrade is either non-critical, and the >> + -- replica can work with the new code but old schema. Or it >> + -- it is critical, and need to wait the schema upgrade from >> + -- the master. >> + -- Or it may happen, that the upgrade just is not possible. >> + -- For example, when an auto-upgrade tries to change a too old >> + -- schema to the newest, skipping some intermediate versions. >> + -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed >> + -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 -> >> + -- 1.3.1.1 and so on step by step. >> + if schema_compare_versions(version, schema_latest_version) == 0 then >> + log.info('Replica\'s vshard schema version is up to date - %s', >> + table.concat(version, '.')) > > You don't have such messages on a master (that current schema is the latest). I think the log level here should be "verbose". I better drop this message. Not to spam into the log. >> + else >> + log.info('Replica\' vshard schema version is outdated - current '.. >> + '%s vs latest %s, but the replica still can work', >> + table.concat(version, '.'), >> + table.concat(schema_latest_version, '.')) >> + end >> + -- In future for hard changes the replica may be suspended >> + -- until its schema is synced with master. Or it may >> + -- reject to upgrade in case of incompatible changes. Now >> + -- there are too few versions so as such problems could >> + -- appear. >> +end >> + >> +local function schema_upgrade_master(username, password) >> + local _schema = box.space._schema >> + local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil >> + local version = schema_current_version() >> + local is_bootstrap = not box.space._bucket >> + >> + if is_bootstrap then >> + schema_init_0_1_15_0(username, password) >> + elseif is_old_versioning then >> + log.info('The instance does not have vshard_version record. '.. 
>> + 'It is 0.1.15.0.') >> + end >> + local handlers = { >> + {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0}, >> + } >> + assert(schema_compare_versions(handlers[#handlers].version, >> + schema_latest_version) == 0) >> + for _, handler in ipairs(handlers) do >> + local next_version = handler.version >> + if schema_compare_versions(next_version, version) > 0 then >> + local next_version_str = table.concat(next_version, '.') >> + log.info("Upgrade vshard to {%s}", next_version_str) >> + handler.func(username, password) > > I think it would be good if you wrapped it into pcall and added error handling. I see that all the changes are quite trivial, but there is no guarantee that they will be in the future. Ideally, you would roll back all already-applied changes. This gives the user an opportunity to downgrade the module to the previous version and report a bug without any inconsistencies after the upgrade. I tried, see the diff below. An absolutely atomic non-yielding upgrade is not really possible. Some future upgrade may require a change of _bucket involving thousands of tuples, or both a DDL + DML change. But for 0.1.16.0 atomicity is possible, so I added box.begin/commit/rollback. Also, as you can see, I left begin/commit inside the upgrade handler, so each version upgrade may decide in its handler when to start a transaction, how many tuples to commit at once, and how to roll back in case the upgrade consists of several transactions. Besides, the upgrade is atomic only in the scope of one version. If you upgrade from ver1 to ver2, and then fail to upgrade from ver2 to ver3, only ver3 is rolled back. Otherwise it would be too complicated to add a 'downgrade' handler for each version and support it forever. With the current approach you can reload/restart with ver2's code, fix the problems, and try ver3 again. However, this is hard to demonstrate, since formally we now have only ver1 and ver2. 
Besides: 1) I moved the handler array to a global variable, so as not to create the array on each storage.cfg(); 2) I added os.exit() for the case when the upgrade succeeded but the version bump in _schema failed. At that point a downgrade can't be done anymore, and on the other hand it is not acceptable to keep working with the newer schema under an older recorded version. I didn't entrust the version bump to the handlers, since it would be forgotten someday for sure. (The bump happens in the 0.1.16.0 handler, but only because there it is a part of the schema change itself.) ==================== diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua index a20bb21..c0e768a 100644 --- a/vshard/storage/init.lua +++ b/vshard/storage/init.lua @@ -323,8 +323,10 @@ local function schema_upgrade_to_0_1_16_0(username) -- 'oncevshard:storage:<number>' is dropped because it is not -- really extendible nor understandable. log.info('Insert vshard_version into _schema') + box.begin() box.space._schema:replace({'vshard_version', 0, 1, 16, 0}) box.space._schema:delete({'oncevshard:storage:1'}) + box.commit() end local function schema_current_version() @@ -365,6 +367,16 @@ local function schema_upgrade_replica() -- appear. end +-- Every handler should be atomic. It is either applied whole, or +-- not applied at all. Atomic upgrade helps to downgrade in case +-- something goes wrong. At least by doing restart with the latest +-- successfully applied version. However, atomicity does not +-- prohibit yields, in case the upgrade, for example, affects huge +-- number of tuples (_bucket records, maybe). +local schema_upgrade_handlers = { + {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0}, +} + local function schema_upgrade_master(username, password) local _schema = box.space._schema local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil @@ -377,19 +389,28 @@ local function schema_upgrade_master(username, password) log.info('The instance does not have vshard_version record. '..
'It is 0.1.15.0.') end - local handlers = { - {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0}, - } - assert(schema_compare_versions(handlers[#handlers].version, + local handler_latest_version = + schema_upgrade_handlers[#schema_upgrade_handlers].version + assert(schema_compare_versions(handler_latest_version, schema_latest_version) == 0) - for _, handler in ipairs(handlers) do + for _, handler in ipairs(schema_upgrade_handlers) do local next_version = handler.version if schema_compare_versions(next_version, version) > 0 then local next_version_str = table.concat(next_version, '.') log.info("Upgrade vshard to {%s}", next_version_str) - handler.func(username, password) + local ok, err = pcall(handler.func, username, password) + if not ok then + log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) + error(err) + end log.info("Successful vshard upgrade to {%s}", next_version_str) - _schema:replace({'vshard_version', unpack(next_version)}) + ok, err = pcall(_schema.replace, _schema, + {'vshard_version', unpack(next_version)}) + if not ok then + log.info("Upgraded to {%s} but couldn't update _schema ".. + "'vshard_version' - fatal error: %s", err) + os.exit(-1) + end end end end ==================== ^ permalink raw reply [flat|nested] 14+ messages in thread
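Since begin/commit is left inside each handler, a future heavyweight upgrade could commit in batches rather than in one big transaction. The following is a purely hypothetical sketch of such a handler - the function name, the BATCH size, and the migrate_tuple callback are illustrative and not part of the patch:

```lua
-- Hypothetical sketch of a future heavyweight upgrade handler. It
-- collects the _bucket primary keys first, then migrates them in
-- batches of BATCH tuples per transaction. How to roll back a
-- partially applied multi-transaction upgrade remains the handler's
-- own concern, as discussed above.
local BATCH = 1000

local function schema_upgrade_buckets(migrate_tuple)
    local _bucket = box.space._bucket
    local ids = {}
    for _, tuple in _bucket:pairs() do
        table.insert(ids, tuple.id)
    end
    local count = 0
    box.begin()
    for _, id in ipairs(ids) do
        migrate_tuple(_bucket, id)
        count = count + 1
        if count % BATCH == 0 then
            -- Commit the batch and open a new transaction for the
            -- next one. A yield may happen here, which is exactly why
            -- such an upgrade can't be fully atomic.
            box.commit()
            box.begin()
        end
    end
    box.commit()
end
```

This illustrates why the transaction boundaries belong to the handler: only the handler knows whether its change fits into one transaction or must be split.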
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-22 19:12 ` Vladislav Shpilevoy @ 2020-03-23 6:35 ` Oleg Babin 2020-03-23 22:32 ` Vladislav Shpilevoy 0 siblings, 1 reply; 14+ messages in thread From: Oleg Babin @ 2020-03-23 6:35 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov Hi! Thanks for the changes! One nit and LGTM. On 22/03/2020 22:12, Vladislav Shpilevoy wrote: > But for 0.1.16.0 the atomicity is possible, I added > box.begin/commit/rollback. > > - for _, handler in ipairs(handlers) do > + for _, handler in ipairs(schema_upgrade_handlers) do > local next_version = handler.version > if schema_compare_versions(next_version, version) > 0 then > local next_version_str = table.concat(next_version, '.') > log.info("Upgrade vshard to {%s}", next_version_str) > - handler.func(username, password) > + local ok, err = pcall(handler.func, username, password) > + if not ok then > + log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) It seems you've missed box.rollback here, and I think it should be something like "if box.is_in_txn() then box.rollback() end". Also, I'm not sure, but would it be possible for the user to drop the last xlogs and go back to the previous schema version (and vshard version as well) as a workaround in case of a "fatal error"? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-23 6:35 ` Oleg Babin @ 2020-03-23 22:32 ` Vladislav Shpilevoy 2020-03-24 4:32 ` Oleg Babin 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-03-23 22:32 UTC (permalink / raw) To: Oleg Babin, tarantool-patches, yaroslav.dynnikov Hi! Thanks for the review! On 23/03/2020 07:35, Oleg Babin wrote: > On 22/03/2020 22:12, Vladislav Shpilevoy wrote: >> But for 0.1.16.0 the atomicity is possible, I added >> box.begin/commit/rollback. >> >> - for _, handler in ipairs(handlers) do >> + for _, handler in ipairs(schema_upgrade_handlers) do >> local next_version = handler.version >> if schema_compare_versions(next_version, version) > 0 then >> local next_version_str = table.concat(next_version, '.') >> log.info("Upgrade vshard to {%s}", next_version_str) >> - handler.func(username, password) >> + local ok, err = pcall(handler.func, username, password) >> + if not ok then >> + log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) > > Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end" Yes, you are right. However, box.is_in_txn() is not needed here: box.rollback() is a no-op when there is no active transaction.
Even if we tried to drop the latest xlog files, it would not really be a solution, because: 1) These xlogs can contain not only the upgrade changes. An xlog file could already exist at the moment when someone did a reload with a newer version; 2) Another fatal error may occur during the xlog deletion. Any unlink() or close() can fail; 3) An xlog may have already been sent via replication; 4) It is not possible to roll back the vshard code to the previous version automatically. Someone needs to do it manually using code reload or restart. VShard does not keep its previous code version, so it can't check itself out to an older version (except in tests, where I know that git is available, installed, and it is safe to checkout). In case you upgraded via restart, you should restart again with the older version. In case you upgraded via reload, you should just stop the reload and put the old vshard.storage object back into _G. All of that should be safe as long as the upgrade is atomic. You could argue that we can do a snapshot before the upgrade to solve the first problem, but a snapshot is a disk killer if it is called automatically on several very fat instances on the same machine. I am here again trying to make things easier for the cloud team, who never restart instances unless it is unavoidable. ^ permalink raw reply [flat|nested] 14+ messages in thread
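The reason box.rollback() can be called unconditionally in the error path can be shown with a small standalone sketch (an illustration of the idiom, not patch code; run_upgrade_step is a made-up name):

```lua
-- Error-path sketch: box.rollback() outside of a transaction is a
-- no-op, so the caller does not need to guard it with
-- box.is_in_txn().
local function run_upgrade_step(step)
    local ok, err = pcall(step)
    if not ok then
        -- Safe both when `step` failed mid-transaction and when it
        -- failed before box.begin() or after box.commit().
        box.rollback()
        return nil, err
    end
    return true
end
```

The unconditional call keeps the error path short and is exactly why the `if box.is_in_txn()` check suggested above is unnecessary.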
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-23 22:32 ` Vladislav Shpilevoy @ 2020-03-24 4:32 ` Oleg Babin 0 siblings, 0 replies; 14+ messages in thread From: Oleg Babin @ 2020-03-24 4:32 UTC (permalink / raw) To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov Thanks for answers and explanation. Both patches LGTM. On 24/03/2020 01:32, Vladislav Shpilevoy wrote: > Hi! Thanks for the review! > > On 23/03/2020 07:35, Oleg Babin wrote: >> On 22/03/2020 22:12, Vladislav Shpilevoy wrote: >>> But for 0.1.16.0 the atomicity is possible, I added >>> box.begin/commit/rollback. >>> >>> - for _, handler in ipairs(handlers) do >>> + for _, handler in ipairs(schema_upgrade_handlers) do >>> local next_version = handler.version >>> if schema_compare_versions(next_version, version) > 0 then >>> local next_version_str = table.concat(next_version, '.') >>> log.info("Upgrade vshard to {%s}", next_version_str) >>> - handler.func(username, password) >>> + local ok, err = pcall(handler.func, username, password) >>> + if not ok then >>> + log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) >> >> Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end" > > Yes, you are right. > > box.is_in_txn() is not needed here. box.rollback() is no-op in case > there is no a transaction. 
> > ==================== > diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua > index c0e768a..630d1ff 100644 > --- a/vshard/storage/init.lua > +++ b/vshard/storage/init.lua > @@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password) > log.info("Upgrade vshard to {%s}", next_version_str) > local ok, err = pcall(handler.func, username, password) > if not ok then > + box.rollback() > log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) > error(err) > end > ==================== > >> I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"? > > Even if we will try to drop latest xlog files, it is not really > a solution, because > > 1) These xlogs can contain not only upgrade changes. An xlog file > could already exist at the moment when someone did reload with > a newer version; > > 2) During xlog deletion there may appear another fatal error. Any > unlink() or close() can fail; > > 3) Xlog may be already sent by replication; > > 4) It is not possible to rollback code of vshard to the previous > version automatically. Someone needs to do it manually using code > reload or restart. VShard does not have previous code version, so > it can't checkout itself to an older version (except in tests, > where I now that git is available, installed, and it is safe to > checkout). In case you upgrade via restart, you should restart again > with older version. In case you upgrade via reload, you should just > stop reload, and put the old vshard.storage object back into _G. All > should be safe in case upgrade is atomic. > > You can argue that we can do a snapshot before upgrade to solve the first > problem, but snapshot is a disk killer if it is called automatically > on several very fat instances on the same machine. 
I am here again > trying to make things easier for the cloud team, who never restart instances > unless it is unavoidable. > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy 2020-03-22 5:05 ` Oleg Babin @ 2020-03-24 15:21 ` Yaroslav Dynnikov 2020-03-24 23:44 ` Vladislav Shpilevoy 1 sibling, 1 reply; 14+ messages in thread From: Yaroslav Dynnikov @ 2020-03-24 15:21 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: yaroslav.dynnikov, tarantool-patches [-- Attachment #1: Type: text/plain, Size: 1974 bytes --] Hi, Vlad. I've run my cartridge upgrade tests on your branch (6e50e26c), and it's ok on tarantool 2.2, but fails for 1.10: ``` replica | ApplyConfigError: Space _schema does not support multi-statement transactions replica | stack traceback: replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:419: in function 'schema_upgrade' replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:2336: in function 'cfg' ``` Here are results from Gitlab CI: https://gitlab.com/tarantool/cartridge/pipelines/129256300 And here is one more remark about the patch itself. On Sat, 21 Mar 2020 at 21:59, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > local function this_is_master() > @@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid, > is_reload) > error(err) > end > log.info("Box has been configured") > - local uri = luri.parse(this_replica.uri) > - box.once("vshard:storage:1", storage_schema_v1, uri.login, > uri.password) > + end > + > + local uri = luri.parse(this_replica.uri) > + schema_upgrade(is_master, uri.login, uri.password) > + > + if not is_reload then > It seems like this `if/else` statement isn't necessary. The `else` branch is enough for both cases. 
Even when it's not a hot reload, it would result in `local old = nil; box.space._bucket:on_replace(new_trigger, nil)`, which is essentially the same. > box.space._bucket:on_replace(bucket_generation_increment) > else > local old = box.space._bucket:on_replace()[1] > @@ -2469,6 +2580,8 @@ M.rlist = { > add_tail = rlist_add_tail, > remove = rlist_remove, > } > +M.schema_latest_version = schema_latest_version > +M.schema_current_version = schema_current_version > > return { > sync = sync, > -- > 2.21.1 (Apple Git-122.3) > > -- Best regards, Yaroslav Dynnikov. [-- Attachment #2: Type: text/html, Size: 3976 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
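The simplification suggested above relies on on_replace() accepting nil as the old trigger. A minimal sketch of the unified branch (mirroring the suggestion rather than the final patch):

```lua
-- One branch covers both the first configuration and hot reload:
-- on the first cfg there is no trigger yet, so `old` is nil, and
-- on_replace(new_trigger, nil) behaves exactly like
-- on_replace(new_trigger). On reload, the old trigger is replaced
-- in place instead of being added a second time.
local old = box.space._bucket:on_replace()[1]
box.space._bucket:on_replace(bucket_generation_increment, old)
```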
* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy 2020-03-24 15:21 ` Yaroslav Dynnikov @ 2020-03-24 23:44 ` Vladislav Shpilevoy 0 siblings, 0 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2020-03-24 23:44 UTC (permalink / raw) To: Yaroslav Dynnikov; +Cc: tarantool-patches Hi! I reworked the patch. The diff is big, so I will send a second version. Below I also provide the diff. On 24/03/2020 16:21, Yaroslav Dynnikov wrote: > Hi, Vlad. > > I've run my cartridge upgrade tests on your branch (6e50e26c), and it's ok on tarantool 2.2, but fails for 1.10: > ``` > replica | ApplyConfigError: Space _schema does not support multi-statement transactions > replica | stack traceback: > replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:419: in function 'schema_upgrade' > replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:2336: in function 'cfg' > ``` > > Here are results from Gitlab CI: https://gitlab.com/tarantool/cartridge/pipelines/129256300 > > And here is one more remark about the patch itself. > > On Sat, 21 Mar 2020 at 21:59, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote: > > > local function this_is_master() > @@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload) > error(err) > end > log.info("Box has been configured") > - local uri = luri.parse(this_replica.uri) > - box.once("vshard:storage:1", storage_schema_v1, uri.login, uri.password) > + end > + > + local uri = luri.parse(this_replica.uri) > + schema_upgrade(is_master, uri.login, uri.password) > + > + if not is_reload then > > > It seems like this `if/else` statement isn't necessary. The `else` branch is enough for both cases. > Even when it's not a hot reload, it would result in `local old = nil; box.space._bucket:on_replace(new_trigger, nil)`, which is essentially the same. Agree, added to the patch.
==================== diff --git a/test/unit/box2.lua b/test/unit/box2.lua new file mode 120000 index 0000000..1347390 --- /dev/null +++ b/test/unit/box2.lua @@ -0,0 +1 @@ +box.lua \ No newline at end of file diff --git a/test/unit/upgrade.result b/test/unit/upgrade.result new file mode 100644 index 0000000..5530746 --- /dev/null +++ b/test/unit/upgrade.result @@ -0,0 +1,218 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... + +-- There is no way to revert a bootstrapped storage manually. The +-- test is done in a new instance which is killed with all its +-- data afterwards. + +_ = test_run:cmd("create server newdefault with script='unit/box2.lua'") + | --- + | ... +_ = test_run:cmd("start server newdefault") + | --- + | ... +_ = test_run:switch("newdefault") + | --- + | ... + +util = require('util') + | --- + | ... + +vshard = require('vshard') + | --- + | ... + +upgrade = vshard.storage.internal.schema_upgrade_master + | --- + | ... +handlers = vshard.storage.internal.schema_upgrade_handlers + | --- + | ... +version_make = vshard.storage.internal.schema_version_make + | --- + | ... +curr_version = vshard.storage.internal.schema_current_version + | --- + | ... +schema_bootstrap = vshard.storage.internal.schema_bootstrap + | --- + | ... + +user = 'storage' + | --- + | ... +password = 'storage' + | --- + | ... + +schema_bootstrap(user, password) + | --- + | ... + +version_make({0, 1, 16, 0}) + | --- + | - '{0.1.16.0}' + | ... + +versions = {} + | --- + | ... + +_ = test_run:cmd("setopt delimiter ';'") + | --- + | ... +for v1 = 1, 3 do + for v2 = 1, 3 do + for v3 = 1, 3 do + for v4 = 1, 3 do + table.insert(versions, version_make({v1, v2, v3, v4})) + end + end + end +end; + | --- + | ... + +err = nil; + | --- + | ... +count = 0; + | --- + | ... 
+for i, v1 in pairs(versions) do + for j, v2 in pairs(versions) do + local v1n = string.format("%s.%s.%s.%s", v1[1], v1[2], v1[3], v1[4]) + local v2n = string.format("%s.%s.%s.%s", v2[1], v2[2], v2[3], v2[4]) + count = count + 1 + if ((v1 == v2) ~= (v1n == v2n)) or ((v1 ~= v2) ~= (v1n ~= v2n)) or + ((v1 <= v2) ~= (v1n <= v2n)) or ((v1 < v2) ~= (v1n < v2n)) or + ((v1 >= v2) ~= (v1n >= v2n)) or ((v1 > v2) ~= (v1n > v2n)) then + err = {i, j} + break + end + end + if err then + break + end +end; + | --- + | ... +err; + | --- + | - null + | ... +count; + | --- + | - 6561 + | ... + +function stat_collect() + return { + func_count = box.space._func:count(), + space_count = box.space._space:count(), + user_count = box.space._user:count(), + schema_count = box.space._schema:count(), + version = curr_version(), + } +end; + | --- + | ... + +function stat_diff(s1, s2) + local res = {} + if s2.func_count ~= s1.func_count then + res.func_count = s2.func_count - s1.func_count + end + if s2.space_count ~= s1.space_count then + res.space_count = s2.space_count - s1.space_count + end + if s2.user_count ~= s1.user_count then + res.user_count = s2.user_count - s1.user_count + end + if s2.schema_count ~= s1.schema_count then + res.schema_count = s2.schema_count - s1.schema_count + end + if s1.version ~= s2.version then + res.old_version = s1.version + res.new_version = s2.version + end + return res +end; + | --- + | ... + +stat = stat_collect(); + | --- + | ... + +function stat_update() + local new_stat = stat_collect() + local diff = stat_diff(stat, new_stat) + stat = new_stat + return diff +end; + | --- + | ... + +upgrade_trace = {}; + | --- + | ... +errinj = vshard.storage.internal.errinj; + | --- + | ... 
+ +for _, handler in pairs(handlers) do + table.insert(upgrade_trace, string.format('Upgrade to %s', handler.version)) + + table.insert(upgrade_trace, 'Errinj in begin') + errinj.ERRINJ_UPGRADE = 'begin' + table.insert(upgrade_trace, + {util.check_error(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) + + table.insert(upgrade_trace, 'Errinj in end') + errinj.ERRINJ_UPGRADE = 'end' + table.insert(upgrade_trace, + {util.check_error(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) + + table.insert(upgrade_trace, 'No errinj') + errinj.ERRINJ_UPGRADE = nil + table.insert(upgrade_trace, + {pcall(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) +end; + | --- + | ... + +_ = test_run:cmd("setopt delimiter ''"); + | --- + | ... + +upgrade_trace + | --- + | - - Upgrade to {0.1.16.0} + | - Errinj in begin + | - - Errinj in begin + | - diff: [] + | - Errinj in end + | - - Errinj in end + | - diff: [] + | - No errinj + | - - true + | - diff: + | new_version: '{0.1.16.0}' + | old_version: '{0.1.15.0}' + | ... + +_ = test_run:switch("default") + | --- + | ... +_ = test_run:cmd("stop server newdefault") + | --- + | ... +_ = test_run:cmd("cleanup server newdefault") + | --- + | ... +_ = test_run:cmd("delete server newdefault") + | --- + | ... diff --git a/test/unit/upgrade.test.lua b/test/unit/upgrade.test.lua new file mode 100644 index 0000000..9bbfd84 --- /dev/null +++ b/test/unit/upgrade.test.lua @@ -0,0 +1,133 @@ +test_run = require('test_run').new() + +-- There is no way to revert a bootstrapped storage manually. The +-- test is done in a new instance which is killed with all its +-- data afterwards. 
+ +_ = test_run:cmd("create server newdefault with script='unit/box2.lua'") +_ = test_run:cmd("start server newdefault") +_ = test_run:switch("newdefault") + +util = require('util') + +vshard = require('vshard') + +upgrade = vshard.storage.internal.schema_upgrade_master +handlers = vshard.storage.internal.schema_upgrade_handlers +version_make = vshard.storage.internal.schema_version_make +curr_version = vshard.storage.internal.schema_current_version +schema_bootstrap = vshard.storage.internal.schema_bootstrap + +user = 'storage' +password = 'storage' + +schema_bootstrap(user, password) + +version_make({0, 1, 16, 0}) + +versions = {} + +_ = test_run:cmd("setopt delimiter ';'") +for v1 = 1, 3 do + for v2 = 1, 3 do + for v3 = 1, 3 do + for v4 = 1, 3 do + table.insert(versions, version_make({v1, v2, v3, v4})) + end + end + end +end; + +err = nil; +count = 0; +for i, v1 in pairs(versions) do + for j, v2 in pairs(versions) do + local v1n = string.format("%s.%s.%s.%s", v1[1], v1[2], v1[3], v1[4]) + local v2n = string.format("%s.%s.%s.%s", v2[1], v2[2], v2[3], v2[4]) + count = count + 1 + if ((v1 == v2) ~= (v1n == v2n)) or ((v1 ~= v2) ~= (v1n ~= v2n)) or + ((v1 <= v2) ~= (v1n <= v2n)) or ((v1 < v2) ~= (v1n < v2n)) or + ((v1 >= v2) ~= (v1n >= v2n)) or ((v1 > v2) ~= (v1n > v2n)) then + err = {i, j} + break + end + end + if err then + break + end +end; +err; +count; + +function stat_collect() + return { + func_count = box.space._func:count(), + space_count = box.space._space:count(), + user_count = box.space._user:count(), + schema_count = box.space._schema:count(), + version = curr_version(), + } +end; + +function stat_diff(s1, s2) + local res = {} + if s2.func_count ~= s1.func_count then + res.func_count = s2.func_count - s1.func_count + end + if s2.space_count ~= s1.space_count then + res.space_count = s2.space_count - s1.space_count + end + if s2.user_count ~= s1.user_count then + res.user_count = s2.user_count - s1.user_count + end + if s2.schema_count ~= s1.schema_count 
then + res.schema_count = s2.schema_count - s1.schema_count + end + if s1.version ~= s2.version then + res.old_version = s1.version + res.new_version = s2.version + end + return res +end; + +stat = stat_collect(); + +function stat_update() + local new_stat = stat_collect() + local diff = stat_diff(stat, new_stat) + stat = new_stat + return diff +end; + +upgrade_trace = {}; +errinj = vshard.storage.internal.errinj; + +for _, handler in pairs(handlers) do + table.insert(upgrade_trace, string.format('Upgrade to %s', handler.version)) + + table.insert(upgrade_trace, 'Errinj in begin') + errinj.ERRINJ_UPGRADE = 'begin' + table.insert(upgrade_trace, + {util.check_error(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) + + table.insert(upgrade_trace, 'Errinj in end') + errinj.ERRINJ_UPGRADE = 'end' + table.insert(upgrade_trace, + {util.check_error(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) + + table.insert(upgrade_trace, 'No errinj') + errinj.ERRINJ_UPGRADE = nil + table.insert(upgrade_trace, + {pcall(upgrade, handler.version, user, password)}) + table.insert(upgrade_trace, {diff = stat_update()}) +end; + +_ = test_run:cmd("setopt delimiter ''"); + +upgrade_trace + +_ = test_run:switch("default") +_ = test_run:cmd("stop server newdefault") +_ = test_run:cmd("cleanup server newdefault") +_ = test_run:cmd("delete server newdefault") diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result index e21d353..4157b08 100644 --- a/test/upgrade/upgrade.result +++ b/test/upgrade/upgrade.result @@ -145,14 +145,11 @@ box.space._schema:get({'vshard_version'}) | ... vshard.storage.internal.schema_current_version() | --- - | - [0, 1, 16, 0] + | - '{0.1.16.0}' | ... vshard.storage.internal.schema_latest_version | --- - | - - 0 - | - 1 - | - 16 - | - 0 + | - '{0.1.16.0}' | ... 
test_run:switch('storage_1_b') @@ -165,14 +162,11 @@ box.space._schema:get({'vshard_version'}) | ... vshard.storage.internal.schema_current_version() | --- - | - [0, 1, 16, 0] + | - '{0.1.16.0}' | ... vshard.storage.internal.schema_latest_version | --- - | - - 0 - | - 1 - | - 16 - | - 0 + | - '{0.1.16.0}' | ... test_run:switch('default') diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua index 630d1ff..ebc6af1 100644 --- a/vshard/storage/init.lua +++ b/vshard/storage/init.lua @@ -74,6 +74,7 @@ if not M then ERRINJ_LAST_RECEIVE_DELAY = false, ERRINJ_RECEIVE_PARTIALLY = false, ERRINJ_NO_RECOVERY = false, + ERRINJ_UPGRADE = false, }, -- This counter is used to restart background fibers with -- new reloaded code. @@ -258,6 +259,33 @@ end -- Schema -------------------------------------------------------------------------------- +local schema_version_mt = { + __tostring = function(self) + return string.format('{%s}', table.concat(self, '.')) + end, + __serialize = function(self) + return tostring(self) + end, + __eq = function(l, r) + return l[1] == r[1] and l[2] == r[2] and l[3] == r[3] and l[4] == r[4] + end, + __lt = function(l, r) + for i = 1, 4 do + local diff = l[i] - r[i] + if diff < 0 then + return true + elseif diff > 0 then + return false + end + end + return false; + end, +} + +local function schema_version_make(ver) + return setmetatable(ver, schema_version_mt) +end + -- VShard versioning works in 4 numbers: major, minor, patch, and -- a last helper number incremented on every schema change, if -- first 3 numbers stay not changed. That happens when users take @@ -267,7 +295,7 @@ end -- The schema first time appeared with 0.1.16. So this function -- describes schema before that - 0.1.15. 
local function schema_init_0_1_15_0(username, password) - log.info("Initializing schema") + log.info("Initializing schema %s", schema_version_make({0, 1, 15, 0})) box.schema.user.create(username, { password = password, if_not_exists = true, @@ -309,36 +337,31 @@ local function schema_init_0_1_15_0(username, password) box.space._schema:replace({'vshard_version', 0, 1, 15, 0}) end -local function schema_compare_versions(v1, v2) - for i = 1, 4 do - if v1[i] ~= v2[i] then - return v1[i] - v2[i] - end - end - return 0 -end - local function schema_upgrade_to_0_1_16_0(username) -- Since 0.1.16.0 the old versioning by -- 'oncevshard:storage:<number>' is dropped because it is not -- really extendible nor understandable. - log.info('Insert vshard_version into _schema') - box.begin() + log.info("Insert 'vshard_version' into _schema") box.space._schema:replace({'vshard_version', 0, 1, 16, 0}) box.space._schema:delete({'oncevshard:storage:1'}) - box.commit() +end + +local function schema_downgrade_from_0_1_16_0() + log.info("Remove 'vshard_version' from _schema") + box.space._schema:replace({'oncevshard:storage:1'}) + box.space._schema:delete({'vshard_version'}) end local function schema_current_version() local version = box.space._schema:get({'vshard_version'}) if version == nil then - return {0, 1, 15, 0} + return schema_version_make({0, 1, 15, 0}) else - return version:totable(2) + return schema_version_make(version:totable(2)) end end -local schema_latest_version = {0, 1, 16, 0} +local schema_latest_version = schema_version_make({0, 1, 16, 0}) local function schema_upgrade_replica() local version = schema_current_version() @@ -354,11 +377,10 @@ local function schema_upgrade_replica() -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 -> -- 1.3.1.1 and so on step by step. 
-    if schema_compare_versions(version, schema_latest_version) ~= 0 then
+    if version ~= schema_latest_version then
         log.info('Replica\' vshard schema version is not latest - current '..
-                 '%s vs latest %s, but the replica still can work',
-                 table.concat(version, '.'),
-                 table.concat(schema_latest_version, '.'))
+                 '%s vs latest %s, but the replica still can work', version,
+                 schema_latest_version)
     end
     -- In future for hard changes the replica may be suspended
     -- until its schema is synced with master. Or it may
@@ -374,10 +396,14 @@ end
 -- prohibit yields, in case the upgrade, for example, affects huge
 -- number of tuples (_bucket records, maybe).
 local schema_upgrade_handlers = {
-    {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
+    {
+        version = schema_version_make({0, 1, 16, 0}),
+        upgrade = schema_upgrade_to_0_1_16_0,
+        downgrade = schema_downgrade_from_0_1_16_0
+    },
 }

-local function schema_upgrade_master(username, password)
+local function schema_upgrade_master(target_version, username, password)
     local _schema = box.space._schema
     local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
     local version = schema_current_version()
@@ -386,39 +412,60 @@ local function schema_upgrade_master(username, password)
     if is_bootstrap then
         schema_init_0_1_15_0(username, password)
     elseif is_old_versioning then
-        log.info('The instance does not have vshard_version record. '..
-                 'It is 0.1.15.0.')
-    end
-    local handler_latest_version =
-        schema_upgrade_handlers[#schema_upgrade_handlers].version
-    assert(schema_compare_versions(handler_latest_version,
-                                   schema_latest_version) == 0)
+        log.info("The instance does not have 'vshard_version' record. "..
+                 "It is 0.1.15.0.")
+    end
+    assert(schema_upgrade_handlers[#schema_upgrade_handlers].version ==
+           schema_latest_version)
+    local prev_version = version
+    local ok, err1, err2
+    local errinj = M.errinj.ERRINJ_UPGRADE
     for _, handler in ipairs(schema_upgrade_handlers) do
         local next_version = handler.version
-        if schema_compare_versions(next_version, version) > 0 then
-            local next_version_str = table.concat(next_version, '.')
-            log.info("Upgrade vshard to {%s}", next_version_str)
-            local ok, err = pcall(handler.func, username, password)
+        if next_version > target_version then
+            break
+        end
+        if next_version > version then
+            log.info("Upgrade vshard schema to %s", next_version)
+            if errinj == 'begin' then
+                ok, err1 = false, 'Errinj in begin'
+            else
+                ok, err1 = pcall(handler.upgrade, username)
+                if ok and errinj == 'end' then
+                    ok, err1 = false, 'Errinj in end'
+                end
+            end
             if not ok then
+                -- Rollback in case the handler started a
+                -- transaction before the exception.
                 box.rollback()
-                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
-                error(err)
+                log.info("Couldn't upgrade schema to %s: '%s'. Revert to %s",
+                         next_version, err1, prev_version)
+                ok, err2 = pcall(handler.downgrade)
+                if not ok then
+                    log.info("Couldn't downgrade schema to %s - fatal error: "..
+                             "'%s'", prev_version, err2)
+                    os.exit(-1)
+                end
+                error(err1)
            end
-            log.info("Successful vshard upgrade to {%s}", next_version_str)
-            ok, err = pcall(_schema.replace, _schema,
-                            {'vshard_version', unpack(next_version)})
+            ok, err1 = pcall(_schema.replace, _schema,
+                             {'vshard_version', unpack(next_version)})
             if not ok then
-                log.info("Upgraded to {%s} but couldn't update _schema "..
-                         "'vshard_version' - fatal error: %s", err)
+                log.info("Upgraded schema to %s but couldn't update _schema "..
+                         "'vshard_version' - fatal error: '%s'", next_version,
+                         err1)
                 os.exit(-1)
             end
+            log.info("Successful vshard schema upgrade to %s", next_version)
         end
+        prev_version = next_version
     end
 end

 local function schema_upgrade(is_master, username, password)
     if is_master then
-        return schema_upgrade_master(username, password)
+        return schema_upgrade_master(schema_latest_version, username, password)
     else
         return schema_upgrade_replica()
     end
@@ -2298,12 +2345,8 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
     local uri = luri.parse(this_replica.uri)
     schema_upgrade(is_master, uri.login, uri.password)

-    if not is_reload then
-        box.space._bucket:on_replace(bucket_generation_increment)
-    else
-        local old = box.space._bucket:on_replace()[1]
-        box.space._bucket:on_replace(bucket_generation_increment, old)
-    end
+    local old_trigger = box.space._bucket:on_replace()[1]
+    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)

     lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
     lreplicaset.outdate_replicasets(M.replicasets)
@@ -2599,6 +2642,10 @@ M.rlist = {
 }
 M.schema_latest_version = schema_latest_version
 M.schema_current_version = schema_current_version
+M.schema_upgrade_master = schema_upgrade_master
+M.schema_upgrade_handlers = schema_upgrade_handlers
+M.schema_version_make = schema_version_make
+M.schema_bootstrap = schema_init_0_1_15_0

 return {
     sync = sync,
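The handler table in the patch pairs each schema version with an upgrade and a downgrade function, and the loop applies every handler newer than the current version, reverting the failed step before re-raising. Stripped of the box/_schema specifics, the pattern can be sketched in plain Lua (all names below are hypothetical, not part of vshard):

```lua
-- Simplified sketch of the upgrade/downgrade handler pattern.
-- Versions are plain numbers here; the real code uses a
-- comparable version object built by schema_version_make().
local handlers = {
    {version = 2, upgrade = function(s) s.v2 = true end,
                  downgrade = function(s) s.v2 = nil end},
    {version = 3, upgrade = function(s) s.v3 = true end,
                  downgrade = function(s) s.v3 = nil end},
}

local function upgrade_to(state, target_version)
    for _, h in ipairs(handlers) do
        if h.version > target_version then
            break
        end
        if h.version > state.version then
            local ok, err = pcall(h.upgrade, state)
            if not ok then
                -- Revert the failed step before reporting the error.
                pcall(h.downgrade, state)
                error(err)
            end
            -- Persist the new version only after the step succeeded.
            state.version = h.version
        end
    end
end

local state = {version = 1}
upgrade_to(state, 3)
assert(state.version == 3 and state.v2 and state.v3)
```

The real loop additionally treats a failure to persist the version record as fatal (os.exit), since at that point the schema and the recorded version would disagree.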
* [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
@ 2020-03-21 18:59 ` Vladislav Shpilevoy
  2020-03-22 5:08   ` Oleg Babin
  1 sibling, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov

It is a new internal function which is supposed to absorb all the
other internal functions, to make them easier to change and to
simplify adding new functions and dropping old ones. It also hides
them from the vshard.storage.* namespace, so users can't try to use
them without knowing they are private.

Additionally, _call() allows changing the internal function set even
on read-only instances, because it has nothing to do with the schema.
The internal function set can be updated by reload.

Closes #227
Part of #210
---
_call can't be properly integrated at this moment, because that
would break rebalancing from old nodes. The new function is only
added. Its usage and the drop of old functions will happen in
0.1.17, when I finish and merge the top commit from this branch:
https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions

 test/upgrade/upgrade.result   | 12 ++++++++++++
 test/upgrade/upgrade.test.lua |  3 +++
 vshard/storage/init.lua       | 26 ++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 0fb68c7..6c9bf2c 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -92,6 +92,10 @@ vshard.storage.internal.schema_latest_version
  | ---
  | - null
  | ...
+vshard.storage._call == nil
+ | ---
+ | - true
+ | ...
 bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
  | ---
  | ...
@@ -157,6 +161,10 @@ vshard.storage.internal.schema_latest_version
  |   - 16
  |   - 0
  | ...
+vshard.storage._call ~= nil
+ | ---
+ | - true
+ | ...

 test_run:switch('storage_1_b')
  | ---
@@ -180,6 +188,10 @@ vshard.storage.internal.schema_latest_version
  |   - 16
  |   - 0
  | ...
+vshard.storage._call ~= nil
+ | ---
+ | - true
+ | ...

 test_run:switch('default')
  | ---
diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua
index 3a4d113..422920e 100644
--- a/test/upgrade/upgrade.test.lua
+++ b/test/upgrade/upgrade.test.lua
@@ -31,6 +31,7 @@ test_run:switch('storage_2_a')
 box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
 vshard.storage.internal.schema_current_version
 vshard.storage.internal.schema_latest_version
+vshard.storage._call == nil
 bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
 first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
 vshard.storage.bucket_force_create(first_bucket, bucket_count)
@@ -51,11 +52,13 @@ test_run:switch('storage_1_a')
 box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
+vshard.storage._call ~= nil

 test_run:switch('storage_1_b')
 box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
+vshard.storage._call ~= nil

 test_run:switch('default')
 -- Main purpose of the test - ensure that data can be safely moved
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index a87a50b..0af075b 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -325,6 +325,20 @@ local function schema_upgrade_to_0_1_16_0(username)
     log.info('Insert vshard_version into _schema')
     box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
     box.space._schema:delete({'oncevshard:storage:1'})
+
+    -- vshard.storage._call() is supposed to replace some internal
+    -- functions exposed in _func; to allow introduction of new
+    -- functions on replicas; to allow change of internal
+    -- functions without touching the schema.
+    local func = 'vshard.storage._call'
+    log.info('Create function %s()', func)
+    box.schema.func.create(func, {setuid = true})
+    box.schema.user.grant(username, 'execute', 'function', func)
+    -- Don't drop old functions in the same version. Removal can
+    -- happen only after 0.1.16. Or there should appear support of
+    -- rebalancing from too old versions. Drop of these functions
+    -- now would immediately make it impossible to rebalance from
+    -- old instances.
 end

 local function schema_current_version()
@@ -2154,6 +2168,17 @@ local function storage_call(bucket_id, mode, name, args)
     return ok, ret1, ret2, ret3
 end

+local service_call_api = {
+    bucket_recv = bucket_recv,
+    rebalancer_apply_routes = rebalancer_apply_routes,
+    rebalancer_request_state = rebalancer_request_state,
+}
+
+local function service_call(...)
+    local service_name = select(1, ...)
+    return service_call_api[service_name](select(2, ...))
+end
+
 --------------------------------------------------------------------------------
 -- Configuration
 --------------------------------------------------------------------------------
@@ -2609,6 +2634,7 @@ return {
     rebalancing_is_in_progress = rebalancing_is_in_progress,
     recovery_wakeup = recovery_wakeup,
     call = storage_call,
+    _call = service_call,
     cfg = function(cfg, uuid) return storage_cfg(cfg, uuid, false) end,
     info = storage_info,
     buckets_info = storage_buckets_info,
--
2.21.1 (Apple Git-122.3)
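The service_call() in the patch is a plain dispatch table: one exposed entry point looks a handler up by name in a private table and forwards the remaining varargs to it. A minimal standalone sketch of the same pattern (the handlers here are made up for illustration, not vshard's real internals):

```lua
-- Private table of handlers, never exposed directly.
local service_call_api = {
    echo = function(...) return ... end,
    add = function(a, b) return a + b end,
}

-- Single public entry point, mirroring the patch: the service
-- name is peeled off the varargs with select(), and everything
-- after it is forwarded to the handler.
local function service_call(...)
    local service_name = select(1, ...)
    return service_call_api[service_name](select(2, ...))
end

assert(service_call('add', 1, 2) == 3)
assert(select('#', service_call('echo', 'a', 'b')) == 2)
```

The design pays off exactly as the cover letter describes: handlers can be added to or removed from the private table by a plain code reload, without touching the _func space or requiring a writable schema.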
* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
@ 2020-03-22 5:08   ` Oleg Babin
  2020-03-22 19:13     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-22 5:08 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch! See comments below.

On 21/03/2020 21:59, Vladislav Shpilevoy wrote:
> _call can't be properly integrated at this moment, because that
> would break rebalancing from old nodes. The new function is only
> added. Its usage and drop of old functions will happen in 0.1.17
> when I will finish and merge top commit from this branch:
> https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions

Could you add some "smoke" tests for the "_call" function? The test in
this patch only checks that it exists, but doesn't check that it works.
If it's hard and non-trivial, ignore this comment.

> +local service_call_api = {
> +    bucket_recv = bucket_recv,
> +    rebalancer_apply_routes = rebalancer_apply_routes,
> +    rebalancer_request_state = rebalancer_request_state,
> +}
> +
> +local function service_call(...)
> +    local service_name = select(1, ...)
> +    return service_call_api[service_name](select(2, ...))
> +end

What about checking that "service_call_api[service_name]" exists?
* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-22 5:08 ` Oleg Babin
@ 2020-03-22 19:13   ` Vladislav Shpilevoy
  2020-03-23 6:42     ` Oleg Babin
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-22 19:13 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

On 22/03/2020 06:08, Oleg Babin wrote:
> Thanks for your patch! See comments below.
>
> On 21/03/2020 21:59, Vladislav Shpilevoy wrote:
>> _call can't be properly integrated at this moment, because that
>> would break rebalancing from old nodes. The new function is only
>> added. Its usage and drop of old functions will happen in 0.1.17
>> when I will finish and merge top commit from this branch:
>> https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions
>
> Could you add some "smoke" tests for the "_call" function? The test in
> this patch only checks that it exists, but doesn't check that it
> works. If it's hard and non-trivial, ignore this comment.

Ok, see my attempt below. It is hard to test each function, because

1) bucket_recv() requires to be called twice, and with non-trivial
   arguments;
2) rebalancer_apply_routes() is async - it starts a fiber and always
   returns true - so testing its _call invocation would require some
   manual rebalancing.

But since the _call functions are used in other places anyway, and are
already tested there, the only thing left to test is whether all
functions are added to the _call API, and whether the arguments are
forwarded correctly. So I added a new function for this: 'test_api'.
It returns which functions are present in the _call API, and how the
arguments are forwarded.

(Which also shows why _call is so useful - anything can be added
there, hidden from the user and from the _func space, even test
helpers.)
====================
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 104d31a..9e6d810 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -162,6 +162,16 @@ vshard.storage._call ~= nil
  | ---
  | - true
  | ...
+vshard.storage._call('test_api', 1, 2, 3)
+ | ---
+ | - bucket_recv: true
+ |   rebalancer_apply_routes: true
+ |   test_api: true
+ |   rebalancer_request_state: true
+ | - 1
+ | - 2
+ | - 3
+ | ...

 test_run:switch('storage_1_b')
  | ---
diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua
index 422920e..aea0da8 100644
--- a/test/upgrade/upgrade.test.lua
+++ b/test/upgrade/upgrade.test.lua
@@ -53,6 +53,7 @@ box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
 vshard.storage._call ~= nil
+vshard.storage._call('test_api', 1, 2, 3)

 test_run:switch('storage_1_b')
 box.space._schema:get({'vshard_version'})
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 77bc9dd..6aa5d6a 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2184,11 +2184,24 @@ local function storage_call(bucket_id, mode, name, args)
     return ok, ret1, ret2, ret3
 end

-local service_call_api = {
+local service_call_api
+
+local function service_call_test_api(...)
+    return service_call_api, ...
+end
+
+service_call_api = setmetatable({
     bucket_recv = bucket_recv,
     rebalancer_apply_routes = rebalancer_apply_routes,
     rebalancer_request_state = rebalancer_request_state,
-}
+    test_api = service_call_test_api,
+}, {__serialize = function(api)
+    local res = {}
+    for k, _ in pairs(api) do
+        res[k] = true
+    end
+    return res
+end})

 local function service_call(...)
     local service_name = select(1, ...)
====================

__serialize is redefined so as not to print function addresses in the
test.
>> +local service_call_api = {
>> +    bucket_recv = bucket_recv,
>> +    rebalancer_apply_routes = rebalancer_apply_routes,
>> +    rebalancer_request_state = rebalancer_request_state,
>> +}
>> +
>> +local function service_call(...)
>> +    local service_name = select(1, ...)
>> +    return service_call_api[service_name](select(2, ...))
>> +end
>
> What about checking that "service_call_api[service_name]" exists?

Since the API is entirely internal, I decided to avoid such sanity
checks against API misuse here.
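For readers unfamiliar with the __serialize trick from the diff above: Tarantool's YAML/console encoder consults a table's __serialize metamethod, so the API table can present itself as a set of key names instead of raw function addresses. A condensed sketch (meant to run inside Tarantool, which ships the yaml module; the handler names are illustrative):

```lua
local yaml = require('yaml')

local api = setmetatable({
    bucket_recv = function() end,
    test_api = function() end,
}, {__serialize = function(api)
    -- Replace function values with 'true' so only the key set
    -- is visible in test output, not addresses like
    -- 'function: 0x4019c868'.
    local res = {}
    for k, _ in pairs(api) do
        res[k] = true
    end
    return res
end})

-- yaml.encode(api) now shows only the key names, e.g.:
--   bucket_recv: true
--   test_api: true
print(yaml.encode(api))
```

This keeps the .result file stable across runs, since function addresses change on every start while the key set does not.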
* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-22 19:13 ` Vladislav Shpilevoy
@ 2020-03-23 6:42   ` Oleg Babin
  2020-03-23 22:32     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-23 6:42 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your answers. It seems I've missed one comment, but LGTM in
general.

On 22/03/2020 22:13, Vladislav Shpilevoy wrote:
>>> +
>>> +local function service_call(...)
>>> +    local service_name = select(1, ...)
>>> +    return service_call_api[service_name](select(2, ...))
>>> +end
>>

What about the following diff?
============
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 6aa5d6a..3653914 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2203,9 +2203,8 @@ service_call_api = setmetatable({
     return res
 end})

-local function service_call(...)
-    local service_name = select(1, ...)
-    return service_call_api[service_name](select(2, ...))
+local function service_call(service_name, ...)
+    return service_call_api[service_name](...)
 end
============
* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-23 6:42 ` Oleg Babin
@ 2020-03-23 22:32   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-23 22:32 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

On 23/03/2020 07:42, Oleg Babin wrote:
> On 22/03/2020 22:13, Vladislav Shpilevoy wrote:
>>>> +
>>>> +local function service_call(...)
>>>> +    local service_name = select(1, ...)
>>>> +    return service_call_api[service_name](select(2, ...))
>>>> +end
>>>
>
> What about the following diff?
> ============
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 6aa5d6a..3653914 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -2203,9 +2203,8 @@ service_call_api = setmetatable({
>      return res
>  end})
>
> -local function service_call(...)
> -    local service_name = select(1, ...)
> -    return service_call_api[service_name](select(2, ...))
> +local function service_call(service_name, ...)
> +    return service_call_api[service_name](...)
>  end
> ============

Indeed, that looks better. I missed that I could declare the first
argument separately. Applied.
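The accepted simplification relies on how Lua binds varargs: when a function declares a named first parameter, the first call argument binds to it and `...` keeps only the rest, so the explicit select() calls become unnecessary. The two forms are equivalent (hypothetical handler table for demonstration):

```lua
local api = {add = function(a, b) return a + b end}

-- Before: the name is pulled out of the varargs manually.
local function call_v1(...)
    local name = select(1, ...)
    return api[name](select(2, ...))
end

-- After: the first argument binds to the named parameter,
-- and the remaining ones stay in `...`.
local function call_v2(name, ...)
    return api[name](...)
end

assert(call_v1('add', 2, 3) == 5)
assert(call_v2('add', 2, 3) == 5)
```

Besides being shorter, the second form avoids the subtle select() indexing and reads as an ordinary function signature.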
end of thread, other threads: [~2020-03-24 23:44 UTC | newest]

Thread overview: 14+ messages
2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy
2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
2020-03-22 5:05   ` Oleg Babin
2020-03-22 19:12     ` Vladislav Shpilevoy
2020-03-23 6:35       ` Oleg Babin
2020-03-23 22:32         ` Vladislav Shpilevoy
2020-03-24 4:32           ` Oleg Babin
2020-03-24 15:21             ` Yaroslav Dynnikov
2020-03-24 23:44               ` Vladislav Shpilevoy
2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
2020-03-22 5:08   ` Oleg Babin
2020-03-22 19:13     ` Vladislav Shpilevoy
2020-03-23 6:42       ` Oleg Babin
2020-03-23 22:32         ` Vladislav Shpilevoy