Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call
@ 2020-03-21 18:59 Vladislav Shpilevoy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
  0 siblings, 2 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

The patchset introduces vshard.storage._call() API - one access point for all
internal functions. It allows to hide them from the public usage behind one
function, and to add/remove internal functions on read-only replicas using
restart or hot code reload. The new call is going to be firstly used by the new
discovery, which currently suffers from too long tx thread usage when fetches
millions of buckets. The existing buckets_discovery() function can't be changed
because it is public. So a new will be added inside _call().

To make it possible to introduce _call() and remove old functions the patchset
describes and implements upgrade strategy, which was not really well thought
through and developed until now.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-227-service-call
Issue: https://github.com/tarantool/tarantool/issues/227

Vladislav Shpilevoy (2):
  storage: introduce upgrade strategy
  storage: introduce vshard.storage._call()

 test/upgrade/box.lua          |   9 ++
 test/upgrade/storage_1_a.lua  |  11 ++
 test/upgrade/storage_1_b.lua  |   1 +
 test/upgrade/storage_2_a.lua  |   1 +
 test/upgrade/storage_2_b.lua  |   1 +
 test/upgrade/suite.ini        |   7 +
 test/upgrade/upgrade.result   | 271 ++++++++++++++++++++++++++++++++++
 test/upgrade/upgrade.test.lua |  97 ++++++++++++
 vshard/storage/init.lua       | 147 +++++++++++++++++-
 9 files changed, 541 insertions(+), 4 deletions(-)
 create mode 100644 test/upgrade/box.lua
 create mode 100644 test/upgrade/storage_1_a.lua
 create mode 120000 test/upgrade/storage_1_b.lua
 create mode 120000 test/upgrade/storage_2_a.lua
 create mode 120000 test/upgrade/storage_2_b.lua
 create mode 100644 test/upgrade/suite.ini
 create mode 100644 test/upgrade/upgrade.result
 create mode 100644 test/upgrade/upgrade.test.lua

-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy
@ 2020-03-21 18:59 ` Vladislav Shpilevoy
  2020-03-22  5:05   ` Oleg Babin
  2020-03-24 15:21   ` Yaroslav Dynnikov
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
  1 sibling, 2 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

Since vshard creation it never needed any schema changes. Version
of vshard schema was always '1', stored in _schema as
'oncevshard:storage:1', because it was created by
box.once('vshard:storage:1').

Now it is time to make some changes to the schema. First is
necessity to simplify introduction and removal of internal
functions. There are 3 of them at the moment: bucket_recv()/
rebalancer_apply_routes()/rebalancer_request_state(). Users see
them in vshard.storage.* namespace, and in _func space. They are
going to be replaced by one vshard.storage._service_call(). In
that way it will be easy to update them without schema changes, to
add new functions, remove them. In _service_call() it will be
possible to validate versions, return proper errors in case a
requested function does not exist.

Secondly, _service_call() is going to be used by new discovery on
routers. It is not possible to modify the existing
vshard.storage.buckets_discovery() because it is a pubic function.
But a new discovery can be added to _service_call().

Thirdly, there is a bug in rebalancing related to possible big TCP
delays, which requires a change of _bucket space format in future.

Fourthly, _service_call() would allow to introduce new functions
on read-only replica, using code reload only. This may be needed
for the bug about bucket_ref() not preventing bucket move.

The patch introduces versioning using 4 numbers: x.x.x.x. The
first 3 numbers are major, minor, patch, the same as in the tags.
The last value is a number increased when first 3 numbers can't be
changed, but the schema is modified. That happens when users take
master branch between 2 tags, and then the schema is changed again
before the new tag is published.

Upgrade between two adjacent tags is supposed to be always safe
and automatic when reload or reconfiguration happen. However
currently there are too few versions so as any upgrade could be
not safe.

Needed for #227
Needed for #210
---
 test/upgrade/box.lua          |   9 ++
 test/upgrade/storage_1_a.lua  |  11 ++
 test/upgrade/storage_1_b.lua  |   1 +
 test/upgrade/storage_2_a.lua  |   1 +
 test/upgrade/storage_2_b.lua  |   1 +
 test/upgrade/suite.ini        |   7 +
 test/upgrade/upgrade.result   | 259 ++++++++++++++++++++++++++++++++++
 test/upgrade/upgrade.test.lua |  94 ++++++++++++
 vshard/storage/init.lua       | 121 +++++++++++++++-
 9 files changed, 500 insertions(+), 4 deletions(-)
 create mode 100644 test/upgrade/box.lua
 create mode 100644 test/upgrade/storage_1_a.lua
 create mode 120000 test/upgrade/storage_1_b.lua
 create mode 120000 test/upgrade/storage_2_a.lua
 create mode 120000 test/upgrade/storage_2_b.lua
 create mode 100644 test/upgrade/suite.ini
 create mode 100644 test/upgrade/upgrade.result
 create mode 100644 test/upgrade/upgrade.test.lua

diff --git a/test/upgrade/box.lua b/test/upgrade/box.lua
new file mode 100644
index 0000000..ad0543a
--- /dev/null
+++ b/test/upgrade/box.lua
@@ -0,0 +1,9 @@
+#!/usr/bin/env tarantool
+
+require('strict').on()
+
+box.cfg{
+    listen = os.getenv("LISTEN"),
+}
+
+require('console').listen(os.getenv('ADMIN'))
diff --git a/test/upgrade/storage_1_a.lua b/test/upgrade/storage_1_a.lua
new file mode 100644
index 0000000..a521dbc
--- /dev/null
+++ b/test/upgrade/storage_1_a.lua
@@ -0,0 +1,11 @@
+#!/usr/bin/env tarantool
+util = require('util')
+NAME = require('fio').basename(arg[0], '.lua')
+local source_path = arg[1]
+if source_path then
+    -- Run one storage on a different vshard
+    -- version.
+    package.path = string.format('%s/?.lua;%s/?/init.lua;%s', source_path,
+                                 source_path, package.path)
+end
+require('storage_template')
diff --git a/test/upgrade/storage_1_b.lua b/test/upgrade/storage_1_b.lua
new file mode 120000
index 0000000..02572da
--- /dev/null
+++ b/test/upgrade/storage_1_b.lua
@@ -0,0 +1 @@
+storage_1_a.lua
\ No newline at end of file
diff --git a/test/upgrade/storage_2_a.lua b/test/upgrade/storage_2_a.lua
new file mode 120000
index 0000000..02572da
--- /dev/null
+++ b/test/upgrade/storage_2_a.lua
@@ -0,0 +1 @@
+storage_1_a.lua
\ No newline at end of file
diff --git a/test/upgrade/storage_2_b.lua b/test/upgrade/storage_2_b.lua
new file mode 120000
index 0000000..02572da
--- /dev/null
+++ b/test/upgrade/storage_2_b.lua
@@ -0,0 +1 @@
+storage_1_a.lua
\ No newline at end of file
diff --git a/test/upgrade/suite.ini b/test/upgrade/suite.ini
new file mode 100644
index 0000000..78efefe
--- /dev/null
+++ b/test/upgrade/suite.ini
@@ -0,0 +1,7 @@
+[default]
+core = tarantool
+description = Upgrade tests
+script = box.lua
+is_parallel = False
+lua_libs = ../lua_libs/storage_template.lua ../lua_libs/util.lua
+           ../lua_libs/git_util.lua ../../example/localcfg.lua
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
new file mode 100644
index 0000000..0fb68c7
--- /dev/null
+++ b/test/upgrade/upgrade.result
@@ -0,0 +1,259 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+git_util = require('git_util')
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+
+-- Commit "Improve compatibility with 1.9".
+version_0_1_15_0 = '79a4dbfc4229e922cbfe4be259193a7b18dc089d'
+ | ---
+ | ...
+vshard_copy_path = util.git_checkout('vshard_git_tree_copy_0_1_15_0',           \
+                                     version_0_1_15_0)
+ | ---
+ | ...
+
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+ | ---
+ | ...
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_1, 'upgrade', {args = vshard_copy_path})
+ | ---
+ | ...
+test_run:create_cluster(REPLICASET_2, 'upgrade', {args = vshard_copy_path})
+ | ---
+ | ...
+util = require('util')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+ | ---
+ | ...
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+ | ---
+ | ...
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+ | ---
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
+ | ---
+ | - ['oncevshard:storage:1']
+ | ...
+vshard.storage.internal.schema_current_version
+ | ---
+ | - null
+ | ...
+vshard.storage.internal.schema_latest_version
+ | ---
+ | - null
+ | ...
+bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
+ | ---
+ | ...
+vshard.storage.bucket_force_create(1, bucket_count)
+ | ---
+ | - true
+ | ...
+box.begin()                                                                     \
+for i = 1, bucket_count do box.space.test:replace{i, i} end                     \
+box.commit()
+ | ---
+ | ...
+box.space.test:count()
+ | ---
+ | - 1500
+ | ...
+
+test_run:switch('storage_2_a')
+ | ---
+ | - true
+ | ...
+box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
+ | ---
+ | - ['oncevshard:storage:1']
+ | ...
+vshard.storage.internal.schema_current_version
+ | ---
+ | - null
+ | ...
+vshard.storage.internal.schema_latest_version
+ | ---
+ | - null
+ | ...
+bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
+ | ---
+ | ...
+first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
+ | ---
+ | ...
+vshard.storage.bucket_force_create(first_bucket, bucket_count)
+ | ---
+ | - true
+ | ...
+box.begin()                                                                     \
+for i = first_bucket, first_bucket + bucket_count - 1 do                        \
+    box.space.test:replace{i, i}                                                \
+end                                                                             \
+box.commit()
+ | ---
+ | ...
+box.space.test:count()
+ | ---
+ | - 1500
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server storage_1_a')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server storage_1_b')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server storage_1_b')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+box.space._schema:get({'vshard_version'})
+ | ---
+ | - ['vshard_version', 0, 1, 16, 0]
+ | ...
+vshard.storage.internal.schema_current_version()
+ | ---
+ | - - 0
+ |   - 1
+ |   - 16
+ |   - 0
+ | ...
+vshard.storage.internal.schema_latest_version
+ | ---
+ | - - 0
+ |   - 1
+ |   - 16
+ |   - 0
+ | ...
+
+test_run:switch('storage_1_b')
+ | ---
+ | - true
+ | ...
+box.space._schema:get({'vshard_version'})
+ | ---
+ | - ['vshard_version', 0, 1, 16, 0]
+ | ...
+vshard.storage.internal.schema_current_version()
+ | ---
+ | - - 0
+ |   - 1
+ |   - 16
+ |   - 0
+ | ...
+vshard.storage.internal.schema_latest_version
+ | ---
+ | - - 0
+ |   - 1
+ |   - 16
+ |   - 0
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Main purpose of the test - ensure that data can be safely moved
+-- from an old instance to a newer one. Weight difference makes
+-- rebalancer move the buckets from old storage_2 to new upgraded
+-- storage_1.
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, [[                       \
+    cfg.sharding[ util.replicasets[2] ].weight = 1                              \
+    cfg.sharding[ util.replicasets[1] ].weight = 2                              \
+    cfg.rebalancer_max_sending = 5                                              \
+    vshard.storage.cfg(cfg, util.name_to_uuid[NAME])                            \
+]])
+ | ---
+ | ...
+
+test_run:switch('storage_2_a')
+ | ---
+ | - true
+ | ...
+wait_rebalancer_state('The cluster is balanced ok', test_run)
+ | ---
+ | ...
+active_count = 0
+ | ---
+ | ...
+index = box.space._bucket.index.status
+ | ---
+ | ...
+for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do                       \
+    active_count = active_count + 1                                             \
+    assert(box.space.test:get({t.id}) ~= nil)                                   \
+end
+ | ---
+ | ...
+active_count
+ | ---
+ | - 1000
+ | ...
+
+test_run:switch('storage_1_a')
+ | ---
+ | - true
+ | ...
+active_count = 0
+ | ---
+ | ...
+index = box.space._bucket.index.status
+ | ---
+ | ...
+for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do                       \
+    active_count = active_count + 1                                             \
+    assert(box.space.test:get({t.id}) ~= nil)                                   \
+end
+ | ---
+ | ...
+active_count
+ | ---
+ | - 2000
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster(REPLICASET_2)
+ | ---
+ | ...
+test_run:drop_cluster(REPLICASET_1)
+ | ---
+ | ...
+test_run:cmd('clear filter')
+ | ---
+ | - true
+ | ...
diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua
new file mode 100644
index 0000000..3a4d113
--- /dev/null
+++ b/test/upgrade/upgrade.test.lua
@@ -0,0 +1,94 @@
+test_run = require('test_run').new()
+git_util = require('git_util')
+util = require('util')
+
+-- Commit "Improve compatibility with 1.9".
+version_0_1_15_0 = '79a4dbfc4229e922cbfe4be259193a7b18dc089d'
+vshard_copy_path = util.git_checkout('vshard_git_tree_copy_0_1_15_0',           \
+                                     version_0_1_15_0)
+
+REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
+REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
+test_run:create_cluster(REPLICASET_1, 'upgrade', {args = vshard_copy_path})
+test_run:create_cluster(REPLICASET_2, 'upgrade', {args = vshard_copy_path})
+util = require('util')
+util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
+util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memtx\')')
+
+test_run:switch('storage_1_a')
+box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
+vshard.storage.internal.schema_current_version
+vshard.storage.internal.schema_latest_version
+bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
+vshard.storage.bucket_force_create(1, bucket_count)
+box.begin()                                                                     \
+for i = 1, bucket_count do box.space.test:replace{i, i} end                     \
+box.commit()
+box.space.test:count()
+
+test_run:switch('storage_2_a')
+box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
+vshard.storage.internal.schema_current_version
+vshard.storage.internal.schema_latest_version
+bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
+first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
+vshard.storage.bucket_force_create(first_bucket, bucket_count)
+box.begin()                                                                     \
+for i = first_bucket, first_bucket + bucket_count - 1 do                        \
+    box.space.test:replace{i, i}                                                \
+end                                                                             \
+box.commit()
+box.space.test:count()
+
+test_run:switch('default')
+test_run:cmd('stop server storage_1_a')
+test_run:cmd('start server storage_1_a')
+test_run:cmd('stop server storage_1_b')
+test_run:cmd('start server storage_1_b')
+
+test_run:switch('storage_1_a')
+box.space._schema:get({'vshard_version'})
+vshard.storage.internal.schema_current_version()
+vshard.storage.internal.schema_latest_version
+
+test_run:switch('storage_1_b')
+box.space._schema:get({'vshard_version'})
+vshard.storage.internal.schema_current_version()
+vshard.storage.internal.schema_latest_version
+
+test_run:switch('default')
+-- Main purpose of the test - ensure that data can be safely moved
+-- from an old instance to a newer one. Weight difference makes
+-- rebalancer move the buckets from old storage_2 to new upgraded
+-- storage_1.
+util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, [[                       \
+    cfg.sharding[ util.replicasets[2] ].weight = 1                              \
+    cfg.sharding[ util.replicasets[1] ].weight = 2                              \
+    cfg.rebalancer_max_sending = 5                                              \
+    vshard.storage.cfg(cfg, util.name_to_uuid[NAME])                            \
+]])
+
+test_run:switch('storage_2_a')
+wait_rebalancer_state('The cluster is balanced ok', test_run)
+active_count = 0
+index = box.space._bucket.index.status
+for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do                       \
+    active_count = active_count + 1                                             \
+    assert(box.space.test:get({t.id}) ~= nil)                                   \
+end
+active_count
+
+test_run:switch('storage_1_a')
+active_count = 0
+index = box.space._bucket.index.status
+for _, t in index:pairs({vshard.consts.BUCKET.ACTIVE}) do                       \
+    active_count = active_count + 1                                             \
+    assert(box.space.test:get({t.id}) ~= nil)                                   \
+end
+active_count
+
+test_run:switch('default')
+test_run:drop_cluster(REPLICASET_2)
+test_run:drop_cluster(REPLICASET_1)
+test_run:cmd('clear filter')
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index a86fcaf..a87a50b 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -257,7 +257,16 @@ end
 --------------------------------------------------------------------------------
 -- Schema
 --------------------------------------------------------------------------------
-local function storage_schema_v1(username, password)
+
+-- VShard versioning works in 4 numbers: major, minor, patch, and
+-- a last helper number incremented on every schema change, if
+-- first 3 numbers stay not changed. That happens when users take
+-- the latest master version not having a tag yet. They couldn't
+-- upgrade if not the 4th number changed inside one tag.
+
+-- The schema first time appeared with 0.1.16. So this function
+-- describes schema before that - 0.1.15.
+local function schema_init_0_1_15_0(username, password)
     log.info("Initializing schema")
     box.schema.user.create(username, {
         password = password,
@@ -297,7 +306,105 @@ local function storage_schema_v1(username, password)
         box.schema.user.grant(username, 'execute', 'function', name)
     end
 
-    box.snapshot()
+    box.space._schema:replace({'vshard_version', 0, 1, 15, 0})
+end
+
+local function schema_compare_versions(v1, v2)
+    for i = 1, 4 do
+        if v1[i] ~= v2[i] then
+            return v1[i] - v2[i]
+        end
+    end
+    return 0
+end
+
+local function schema_upgrade_to_0_1_16_0(username)
+    -- Since 0.1.16.0 the old versioning by
+    -- 'oncevshard:storage:<number>' is dropped because it is not
+    -- really extendible nor understandable.
+    log.info('Insert vshard_version into _schema')
+    box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
+    box.space._schema:delete({'oncevshard:storage:1'})
+end
+
+local function schema_current_version()
+    local version = box.space._schema:get({'vshard_version'})
+    if version == nil then
+        version = {0, 1, 15, 0}
+    else
+        version = version:totable()
+        version = {version[2], version[3], version[4], version[5]}
+    end
+    return version
+end
+
+local schema_latest_version = {0, 1, 16, 0}
+
+local function schema_upgrade_replica()
+    local version = schema_current_version()
+    -- Replica can't do upgrade - it is read-only. And it
+    -- shouldn't anyway - that would conflict with master doing
+    -- the same. So the upgrade is either non-critical, and the
+    -- replica can work with the new code but old schema. Or it
+    -- it is critical, and need to wait the schema upgrade from
+    -- the master.
+    -- Or it may happen, that the upgrade just is not possible.
+    -- For example, when an auto-upgrade tries to change a too old
+    -- schema to the newest, skipping some intermediate versions.
+    -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed
+    -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 ->
+    -- 1.3.1.1 and so on step by step.
+    if schema_compare_versions(version, schema_latest_version) == 0 then
+        log.info('Replica\'s vshard schema version is up to date - %s',
+                 table.concat(version, '.'))
+    else
+        log.info('Replica\' vshard schema version is outdated - current '..
+                 '%s vs latest %s, but the replica still can work',
+                 table.concat(version, '.'),
+                 table.concat(schema_latest_version, '.'))
+    end
+    -- In future for hard changes the replica may be suspended
+    -- until its schema is synced with master. Or it may
+    -- reject to upgrade in case of incompatible changes. Now
+    -- there are too few versions so as such problems could
+    -- appear.
+end
+
+local function schema_upgrade_master(username, password)
+    local _schema = box.space._schema
+    local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
+    local version = schema_current_version()
+    local is_bootstrap = not box.space._bucket
+
+    if is_bootstrap then
+        schema_init_0_1_15_0(username, password)
+    elseif is_old_versioning then
+        log.info('The instance does not have vshard_version record. '..
+                 'It is 0.1.15.0.')
+    end
+    local handlers = {
+        {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
+    }
+    assert(schema_compare_versions(handlers[#handlers].version,
+                                   schema_latest_version) == 0)
+    for _, handler in ipairs(handlers) do
+        local next_version = handler.version
+        if schema_compare_versions(next_version, version) > 0 then
+            local next_version_str = table.concat(next_version, '.')
+            log.info("Upgrade vshard to {%s}", next_version_str)
+            handler.func(username, password)
+            log.info("Successful vshard upgrade to {%s}", next_version_str)
+            _schema:replace({'vshard_version', unpack(next_version)})
+        end
+    end
+end
+
+local function schema_upgrade(is_master, username, password)
+    if is_master then
+        return schema_upgrade_master(username, password)
+    else
+        return schema_upgrade_replica()
+    end
 end
 
 local function this_is_master()
@@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
             error(err)
         end
         log.info("Box has been configured")
-        local uri = luri.parse(this_replica.uri)
-        box.once("vshard:storage:1", storage_schema_v1, uri.login, uri.password)
+    end
+
+    local uri = luri.parse(this_replica.uri)
+    schema_upgrade(is_master, uri.login, uri.password)
+
+    if not is_reload then
         box.space._bucket:on_replace(bucket_generation_increment)
     else
         local old = box.space._bucket:on_replace()[1]
@@ -2469,6 +2580,8 @@ M.rlist = {
     add_tail = rlist_add_tail,
     remove = rlist_remove,
 }
+M.schema_latest_version = schema_latest_version
+M.schema_current_version = schema_current_version
 
 return {
     sync = sync,
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
@ 2020-03-21 18:59 ` Vladislav Shpilevoy
  2020-03-22  5:08   ` Oleg Babin
  1 sibling, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-21 18:59 UTC (permalink / raw)
  To: tarantool-patches, olegrok, yaroslav.dynnikov

It is a new internal function which is supposed to absorb all the
other internal functions to make it easier to change them, to
simplify adding new and drop old functions.

This also hides them from vshard.storage.* namespace so users
can't try to use them not knowing they are private.

Additionally, _call() allows to change internal function set even
on read-only instances, because it has nothing to do with the
schema. Internal function set can be updated by reload.

Closes #227
Part of #210
---

_call can't be properly integrated at this moment, because that
would break rebalancing from old nodes. The new function is only
added. Its usage and drop of old functions will happen in 0.1.17
when I will finish and merge top commit from this branch:
https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions

 test/upgrade/upgrade.result   | 12 ++++++++++++
 test/upgrade/upgrade.test.lua |  3 +++
 vshard/storage/init.lua       | 26 ++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 0fb68c7..6c9bf2c 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -92,6 +92,10 @@ vshard.storage.internal.schema_latest_version
  | ---
  | - null
  | ...
+vshard.storage._call == nil
+ | ---
+ | - true
+ | ...
 bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
  | ---
  | ...
@@ -157,6 +161,10 @@ vshard.storage.internal.schema_latest_version
  |   - 16
  |   - 0
  | ...
+vshard.storage._call ~= nil
+ | ---
+ | - true
+ | ...
 
 test_run:switch('storage_1_b')
  | ---
@@ -180,6 +188,10 @@ vshard.storage.internal.schema_latest_version
  |   - 16
  |   - 0
  | ...
+vshard.storage._call ~= nil
+ | ---
+ | - true
+ | ...
 
 test_run:switch('default')
  | ---
diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua
index 3a4d113..422920e 100644
--- a/test/upgrade/upgrade.test.lua
+++ b/test/upgrade/upgrade.test.lua
@@ -31,6 +31,7 @@ test_run:switch('storage_2_a')
 box.space._schema:get({'oncevshard:storage:1'}) or box.space._schema:select()
 vshard.storage.internal.schema_current_version
 vshard.storage.internal.schema_latest_version
+vshard.storage._call == nil
 bucket_count = vshard.consts.DEFAULT_BUCKET_COUNT / 2
 first_bucket = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
 vshard.storage.bucket_force_create(first_bucket, bucket_count)
@@ -51,11 +52,13 @@ test_run:switch('storage_1_a')
 box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
+vshard.storage._call ~= nil
 
 test_run:switch('storage_1_b')
 box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
+vshard.storage._call ~= nil
 
 test_run:switch('default')
 -- Main purpose of the test - ensure that data can be safely moved
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index a87a50b..0af075b 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -325,6 +325,20 @@ local function schema_upgrade_to_0_1_16_0(username)
     log.info('Insert vshard_version into _schema')
     box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
     box.space._schema:delete({'oncevshard:storage:1'})
+
+    -- vshard.storage._call() is supposed to replace some internal
+    -- functions exposed in _func; to allow introduction of new
+    -- functions on replicas; to allow change of internal
+    -- functions without touching the schema.
+    local func = 'vshard.storage._call'
+    log.info('Create function %s()', func)
+    box.schema.func.create(func, {setuid = true})
+    box.schema.user.grant(username, 'execute', 'function', func)
+    -- Don't drop old functions in the same version. Removal can
+    -- happen only after 0.1.16. Or there should appear support of
+    -- rebalancing from too old versions. Drop of these functions
+    -- now would immediately make it impossible to rebalance from
+    -- old instances.
 end
 
 local function schema_current_version()
@@ -2154,6 +2168,17 @@ local function storage_call(bucket_id, mode, name, args)
     return ok, ret1, ret2, ret3
 end
 
+local service_call_api = {
+    bucket_recv = bucket_recv,
+    rebalancer_apply_routes = rebalancer_apply_routes,
+    rebalancer_request_state = rebalancer_request_state,
+}
+
+local function service_call(...)
+    local service_name = select(1, ...)
+    return service_call_api[service_name](select(2, ...))
+end
+
 --------------------------------------------------------------------------------
 -- Configuration
 --------------------------------------------------------------------------------
@@ -2609,6 +2634,7 @@ return {
     rebalancing_is_in_progress = rebalancing_is_in_progress,
     recovery_wakeup = recovery_wakeup,
     call = storage_call,
+    _call = service_call,
     cfg = function(cfg, uuid) return storage_cfg(cfg, uuid, false) end,
     info = storage_info,
     buckets_info = storage_buckets_info,
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
@ 2020-03-22  5:05   ` Oleg Babin
  2020-03-22 19:12     ` Vladislav Shpilevoy
  2020-03-24 15:21   ` Yaroslav Dynnikov
  1 sibling, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-22  5:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Hi! Thanks for your patch!

I have several quite minor comments.

On 21/03/2020 21:59, Vladislav Shpilevoy wrote:

> +
> +local function schema_upgrade_to_0_1_16_0(username)
> +    -- Since 0.1.16.0 the old versioning by
> +    -- 'oncevshard:storage:<number>' is dropped because it is not
> +    -- really extendible nor understandable.
> +    log.info('Insert vshard_version into _schema')
> +    box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
> +    box.space._schema:delete({'oncevshard:storage:1'})
> +end

I think it's better to wrap all upgrades into transaction or may be the 
whole upgrade procedure below.

> +local function schema_current_version()
> +    local version = box.space._schema:get({'vshard_version'})
> +    if version == nil then
> +        version = {0, 1, 15, 0}
> +    else
> +        version = version:totable()
> +        version = {version[2], version[3], version[4], version[5]}

You could use {version:unpack(2)} here. However feel free to ignore this 
comment.

> +    end
> +    return version
> +end
> +
> +local schema_latest_version = {0, 1, 16, 0}
> +
> +local function schema_upgrade_replica()
> +    local version = schema_current_version()
> +    -- Replica can't do upgrade - it is read-only. And it
> +    -- shouldn't anyway - that would conflict with master doing
> +    -- the same. So the upgrade is either non-critical, and the
> +    -- replica can work with the new code but old schema. Or it
> +    -- it is critical, and need to wait the schema upgrade from
> +    -- the master.
> +    -- Or it may happen, that the upgrade just is not possible.
> +    -- For example, when an auto-upgrade tries to change a too old
> +    -- schema to the newest, skipping some intermediate versions.
> +    -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed
> +    -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 ->
> +    -- 1.3.1.1 and so on step by step.
> +    if schema_compare_versions(version, schema_latest_version) == 0 then
> +        log.info('Replica\'s vshard schema version is up to date - %s',
> +                 table.concat(version, '.'))

You don't have such messages on a master (that current schema is the 
latest). I think the log level here should be "verbose".

> +    else
> +        log.info('Replica\' vshard schema version is outdated - current '..
> +                 '%s vs latest %s, but the replica still can work',
> +                 table.concat(version, '.'),
> +                 table.concat(schema_latest_version, '.'))
> +    end
> +    -- In future for hard changes the replica may be suspended
> +    -- until its schema is synced with master. Or it may
> +    -- reject to upgrade in case of incompatible changes. Now
> +    -- there are too few versions so as such problems could
> +    -- appear.
> +end
> +
> +local function schema_upgrade_master(username, password)
> +    local _schema = box.space._schema
> +    local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
> +    local version = schema_current_version()
> +    local is_bootstrap = not box.space._bucket
> +
> +    if is_bootstrap then
> +        schema_init_0_1_15_0(username, password)
> +    elseif is_old_versioning then
> +        log.info('The instance does not have vshard_version record. '..
> +                 'It is 0.1.15.0.')
> +    end
> +    local handlers = {
> +        {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
> +    }
> +    assert(schema_compare_versions(handlers[#handlers].version,
> +                                   schema_latest_version) == 0)
> +    for _, handler in ipairs(handlers) do
> +        local next_version = handler.version
> +        if schema_compare_versions(next_version, version) > 0 then
> +            local next_version_str = table.concat(next_version, '.')
> +            log.info("Upgrade vshard to {%s}", next_version_str)
> +            handler.func(username, password)

I think it will be good if you wrap it into pcall and add possible error 
handling. I see that all changes quite trivial but there is no guarantee 
that it will be in future. Ideally, if you rollback all already applied 
changes. This is an opportunity for user to downgrade module to previous 
version and report about a bug without any inconsistencies after upgrade.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
@ 2020-03-22  5:08   ` Oleg Babin
  2020-03-22 19:13     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-22  5:08 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your patch! See comments bellow.

On 21/03/2020 21:59, Vladislav Shpilevoy wrote:
> _call can't be properly integrated at this moment, because that
> would break rebalancing from old nodes. The new function is only
> added. Its usage and drop of old functions will happen in 0.1.17
> when I will finish and merge top commit from this branch:
> https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions

Could you add some "smoke" tests for "_call" function? The test in this 
patch only checks that it exists but don't check that it works. If it's 
hard and non-trivial ignore this comment.

>   
> +local service_call_api = {
> +    bucket_recv = bucket_recv,
> +    rebalancer_apply_routes = rebalancer_apply_routes,
> +    rebalancer_request_state = rebalancer_request_state,
> +}
> +
> +local function service_call(...)
> +    local service_name = select(1, ...)
> +    return service_call_api[service_name](select(2, ...))
> +end

What's about checking that "service_call_api[service_name]" exists?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-22  5:05   ` Oleg Babin
@ 2020-03-22 19:12     ` Vladislav Shpilevoy
  2020-03-23  6:35       ` Oleg Babin
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-22 19:12 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

Hi! Thanks for the review!

> On 21/03/2020 21:59, Vladislav Shpilevoy wrote:
> 
>> +
>> +local function schema_upgrade_to_0_1_16_0(username)
>> +    -- Since 0.1.16.0 the old versioning by
>> +    -- 'oncevshard:storage:<number>' is dropped because it is not
>> +    -- really extendible nor understandable.
>> +    log.info('Insert vshard_version into _schema')
>> +    box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
>> +    box.space._schema:delete({'oncevshard:storage:1'})
>> +end
> 
> I think it's better to wrap all upgrades into transaction or may be the whole upgrade procedure below.
Ok, see my comment in the end of the email.

>> +local function schema_current_version()
>> +    local version = box.space._schema:get({'vshard_version'})
>> +    if version == nil then
>> +        version = {0, 1, 15, 0}
>> +    else
>> +        version = version:totable()
>> +        version = {version[2], version[3], version[4], version[5]}
> 
> You could use {version:unpack(2)} here. However feel free to ignore this comment.

Cool, didn't know unpack/totable has parameters.

====================
 local function schema_current_version()
     local version = box.space._schema:get({'vshard_version'})
     if version == nil then
-        version = {0, 1, 15, 0}
+        return {0, 1, 15, 0}
     else
-        version = version:totable()
-        version = {version[2], version[3], version[4], version[5]}
+        return version:totable(2)
     end
-    return version
 end
====================

>> +    end
>> +    return version
>> +end
>> +
>> +local schema_latest_version = {0, 1, 16, 0}
>> +
>> +local function schema_upgrade_replica()
>> +    local version = schema_current_version()
>> +    -- Replica can't do upgrade - it is read-only. And it
>> +    -- shouldn't anyway - that would conflict with master doing
>> +    -- the same. So the upgrade is either non-critical, and the
>> +    -- replica can work with the new code but old schema. Or it
>> +    -- it is critical, and need to wait the schema upgrade from
>> +    -- the master.
>> +    -- Or it may happen, that the upgrade just is not possible.
>> +    -- For example, when an auto-upgrade tries to change a too old
>> +    -- schema to the newest, skipping some intermediate versions.
>> +    -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed
>> +    -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 ->
>> +    -- 1.3.1.1 and so on step by step.
>> +    if schema_compare_versions(version, schema_latest_version) == 0 then
>> +        log.info('Replica\'s vshard schema version is up to date - %s',
>> +                 table.concat(version, '.'))
> 
> You don't have such messages on a master (that current schema is the latest). I think the log level here should be "verbose".

I better drop this message. Not to spam into the log.

>> +    else
>> +        log.info('Replica\' vshard schema version is outdated - current '..
>> +                 '%s vs latest %s, but the replica still can work',
>> +                 table.concat(version, '.'),
>> +                 table.concat(schema_latest_version, '.'))
>> +    end
>> +    -- In future for hard changes the replica may be suspended
>> +    -- until its schema is synced with master. Or it may
>> +    -- reject to upgrade in case of incompatible changes. Now
>> +    -- there are too few versions so as such problems could
>> +    -- appear.
>> +end
>> +
>> +local function schema_upgrade_master(username, password)
>> +    local _schema = box.space._schema
>> +    local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
>> +    local version = schema_current_version()
>> +    local is_bootstrap = not box.space._bucket
>> +
>> +    if is_bootstrap then
>> +        schema_init_0_1_15_0(username, password)
>> +    elseif is_old_versioning then
>> +        log.info('The instance does not have vshard_version record. '..
>> +                 'It is 0.1.15.0.')
>> +    end
>> +    local handlers = {
>> +        {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
>> +    }
>> +    assert(schema_compare_versions(handlers[#handlers].version,
>> +                                   schema_latest_version) == 0)
>> +    for _, handler in ipairs(handlers) do
>> +        local next_version = handler.version
>> +        if schema_compare_versions(next_version, version) > 0 then
>> +            local next_version_str = table.concat(next_version, '.')
>> +            log.info("Upgrade vshard to {%s}", next_version_str)
>> +            handler.func(username, password)
> 
> I think it will be good if you wrap it into pcall and add possible error handling. I see that all changes quite trivial but there is no guarantee that it will be in future. Ideally, if you rollback all already applied changes. This is an opportunity for user to downgrade module to previous version and report about a bug without any inconsistencies after upgrade.

I tried. See diff below.

Absolutely atomic non-yield upgrade is not really possible. Some
future upgrade may require change of _bucket involving thousands
of tuples, or both DDL + DML change.

But for 0.1.16.0 the atomicity is possible, I added
box.begin/commit/rollback.

Also as you can see, I left begin/commit inside upgrade handler, so
each version upgrade may decide in its handler when start a
transaction, how many tuples to commit at once, how to rollback in
case the upgrade consists of several transactions.

Besides, upgrade is atomic only in scope of one version. If you upgrade
from ver1 to ver2, and then fail to upgrade from ver2 to ver3, only
ver3 is rolled back. Otherwise it would be too complicated to add
'downgrade' handler for each version, and support it forever. With current
way you can reload/restart with ver2's code, fix problems, and try ver3
again.

However this is hard to tell, since formally we have now only ver1 and ver2.

Besides,
1) I moved handler array to a global variable so as not to create the array
on each storage.cfg();
2) I added os.exit() in case upgrade succeeded, but version bump in _schema
failed. Because downgrade can't be done already, and on the other hand it is
not acceptable to work with the newer schema, but an older version. I didn't
trust version bump to handlers since it will be forgotten someday for sure.
(Bump happens in 0.1.16.0 handler, but only because it is a part of schema
change).

====================
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index a20bb21..c0e768a 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -323,8 +323,10 @@ local function schema_upgrade_to_0_1_16_0(username)
     -- 'oncevshard:storage:<number>' is dropped because it is not
     -- really extendible nor understandable.
     log.info('Insert vshard_version into _schema')
+    box.begin()
     box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
     box.space._schema:delete({'oncevshard:storage:1'})
+    box.commit()
 end
 
 local function schema_current_version()
@@ -365,6 +367,16 @@ local function schema_upgrade_replica()
     -- appear.
 end
 
+-- Every handler should be atomic. It is either applied whole, or
+-- not applied at all. Atomic upgrade helps to downgrade in case
+-- something goes wrong. At least by doing restart with the latest
+-- successfully applied version. However, atomicity does not
+-- prohibit yields, in case the upgrade, for example, affects huge
+-- number of tuples (_bucket records, maybe).
+local schema_upgrade_handlers = {
+    {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
+}
+
 local function schema_upgrade_master(username, password)
     local _schema = box.space._schema
     local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
@@ -377,19 +389,28 @@ local function schema_upgrade_master(username, password)
         log.info('The instance does not have vshard_version record. '..
                  'It is 0.1.15.0.')
     end
-    local handlers = {
-        {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
-    }
-    assert(schema_compare_versions(handlers[#handlers].version,
+    local handler_latest_version =
+        schema_upgrade_handlers[#schema_upgrade_handlers].version
+    assert(schema_compare_versions(handler_latest_version,
                                    schema_latest_version) == 0)
-    for _, handler in ipairs(handlers) do
+    for _, handler in ipairs(schema_upgrade_handlers) do
         local next_version = handler.version
         if schema_compare_versions(next_version, version) > 0 then
             local next_version_str = table.concat(next_version, '.')
             log.info("Upgrade vshard to {%s}", next_version_str)
-            handler.func(username, password)
+            local ok, err = pcall(handler.func, username, password)
+            if not ok then
+                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
+                error(err)
+            end
             log.info("Successful vshard upgrade to {%s}", next_version_str)
-            _schema:replace({'vshard_version', unpack(next_version)})
+            ok, err = pcall(_schema.replace, _schema,
+                            {'vshard_version', unpack(next_version)})
+            if not ok then
+                log.info("Upgraded to {%s} but couldn't update _schema "..
+                         "'vshard_version' - fatal error: %s", err)
+                os.exit(-1)
+            end
         end
     end
 end
====================

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-22  5:08   ` Oleg Babin
@ 2020-03-22 19:13     ` Vladislav Shpilevoy
  2020-03-23  6:42       ` Oleg Babin
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-22 19:13 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov



On 22/03/2020 06:08, Oleg Babin wrote:
> Thanks for your patch! See comments bellow.
> 
> On 21/03/2020 21:59, Vladislav Shpilevoy wrote:
>> _call can't be properly integrated at this moment, because that
>> would break rebalancing from old nodes. The new function is only
>> added. Its usage and drop of old functions will happen in 0.1.17
>> when I will finish and merge top commit from this branch:
>> https://github.com/tarantool/vshard/tree/gerold103/gh-227-drop-old-functions
> 
> Could you add some "smoke" tests for "_call" function? The test in this patch only checks that it exists but don't check that it works. If it's hard and non-trivial ignore this comment.

Ok, see my attempt below. It is hard to test each function, because

    1) bucket_recv() requires to be called twice, and with non-trivial
    arguments;

    2) rebalancer_apply_routes() is async, it starts a fiber and returns
    always true, so to test its _call invocation it is necessary to do
    some manual rebalancing.

But since _call functions are anyway used in other places, and are
already tested, the only thing left to test is whether all functions
are added to _call API, and if the arguments are forwarded correctly.

So I added a new function for this: 'test_api'. It returns which
functions are present in _call API, and how arguments are forwarded.
(And which proves why _call is so useful - can add anything to there,
hidden from a user, from _func space, even add some test things.)

====================
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index 104d31a..9e6d810 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -162,6 +162,16 @@ vshard.storage._call ~= nil
  | ---
  | - true
  | ...
+vshard.storage._call('test_api', 1, 2, 3)
+ | ---
+ | - bucket_recv: true
+ |   rebalancer_apply_routes: true
+ |   test_api: true
+ |   rebalancer_request_state: true
+ | - 1
+ | - 2
+ | - 3
+ | ...
 
 test_run:switch('storage_1_b')
  | ---
diff --git a/test/upgrade/upgrade.test.lua b/test/upgrade/upgrade.test.lua
index 422920e..aea0da8 100644
--- a/test/upgrade/upgrade.test.lua
+++ b/test/upgrade/upgrade.test.lua
@@ -53,6 +53,7 @@ box.space._schema:get({'vshard_version'})
 vshard.storage.internal.schema_current_version()
 vshard.storage.internal.schema_latest_version
 vshard.storage._call ~= nil
+vshard.storage._call('test_api', 1, 2, 3)
 
 test_run:switch('storage_1_b')
 box.space._schema:get({'vshard_version'})
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 77bc9dd..6aa5d6a 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2184,11 +2184,24 @@ local function storage_call(bucket_id, mode, name, args)
     return ok, ret1, ret2, ret3
 end
 
-local service_call_api = {
+local service_call_api
+
+local function service_call_test_api(...)
+    return service_call_api, ...
+end
+
+service_call_api = setmetatable({
     bucket_recv = bucket_recv,
     rebalancer_apply_routes = rebalancer_apply_routes,
     rebalancer_request_state = rebalancer_request_state,
-}
+    test_api = service_call_test_api,
+}, {__serialize = function(api)
+    local res = {}
+    for k, _ in pairs(api) do
+        res[k] = true
+    end
+    return res
+end})
 
 local function service_call(...)
     local service_name = select(1, ...)
====================

__serialize is redefined so as not to print function
addresses in the test.

>>   +local service_call_api = {
>> +    bucket_recv = bucket_recv,
>> +    rebalancer_apply_routes = rebalancer_apply_routes,
>> +    rebalancer_request_state = rebalancer_request_state,
>> +}
>> +
>> +local function service_call(...)
>> +    local service_name = select(1, ...)
>> +    return service_call_api[service_name](select(2, ...))
>> +end
> 
> What's about checking that "service_call_api[service_name]" exists?

Since the API is entirely internal, I decided to avoid here
such sanity checks against API misusage.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-22 19:12     ` Vladislav Shpilevoy
@ 2020-03-23  6:35       ` Oleg Babin
  2020-03-23 22:32         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-23  6:35 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Hi! Thanks for the changes! One nit and LGTM.

On 22/03/2020 22:12, Vladislav Shpilevoy wrote:
> But for 0.1.16.0 the atomicity is possible, I added
> box.begin/commit/rollback.
> 
> -    for _, handler in ipairs(handlers) do
> +    for _, handler in ipairs(schema_upgrade_handlers) do
>           local next_version = handler.version
>           if schema_compare_versions(next_version, version) > 0 then
>               local next_version_str = table.concat(next_version, '.')
>               log.info("Upgrade vshard to {%s}", next_version_str)
> -            handler.func(username, password)
> +            local ok, err = pcall(handler.func, username, password)
> +            if not ok then
> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)

Seems that you've missed box.rollback here. And I think that it should 
be something like "if box.is_in_txn() then box.rollback() end"

I'm not sure but is it possible to drop last xlogs for user and start to 
use previous version of schema (and vshard version as well) as 
workaround in case of "fatal error"?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-22 19:13     ` Vladislav Shpilevoy
@ 2020-03-23  6:42       ` Oleg Babin
  2020-03-23 22:32         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Babin @ 2020-03-23  6:42 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for your answers. Seems I've missed one comment. But LGTM in general.

On 22/03/2020 22:13, Vladislav Shpilevoy wrote:
>>> +
>>> +local function service_call(...)
>>> +    local service_name = select(1, ...)
>>> +    return service_call_api[service_name](select(2, ...))
>>> +end
>>

What's about a following diff?
============
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 6aa5d6a..3653914 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -2203,9 +2203,8 @@ service_call_api = setmetatable({
      return res
  end})

-local function service_call(...)
-    local service_name = select(1, ...)
-    return service_call_api[service_name](select(2, ...))
+local function service_call(service_name, ...)
+    return service_call_api[service_name](...)
  end
============

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-23  6:35       ` Oleg Babin
@ 2020-03-23 22:32         ` Vladislav Shpilevoy
  2020-03-24  4:32           ` Oleg Babin
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-23 22:32 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

Hi! Thanks for the review!

On 23/03/2020 07:35, Oleg Babin wrote:
> On 22/03/2020 22:12, Vladislav Shpilevoy wrote:
>> But for 0.1.16.0 the atomicity is possible, I added
>> box.begin/commit/rollback.
>>
>> -    for _, handler in ipairs(handlers) do
>> +    for _, handler in ipairs(schema_upgrade_handlers) do
>>           local next_version = handler.version
>>           if schema_compare_versions(next_version, version) > 0 then
>>               local next_version_str = table.concat(next_version, '.')
>>               log.info("Upgrade vshard to {%s}", next_version_str)
>> -            handler.func(username, password)
>> +            local ok, err = pcall(handler.func, username, password)
>> +            if not ok then
>> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
> 
> Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end"

Yes, you are right.

box.is_in_txn() is not needed here. box.rollback() is no-op in case
there is no a transaction.

====================
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index c0e768a..630d1ff 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password)
             log.info("Upgrade vshard to {%s}", next_version_str)
             local ok, err = pcall(handler.func, username, password)
             if not ok then
+                box.rollback()
                 log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
                 error(err)
             end
====================

> I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"?

Even if we will try to drop latest xlog files, it is not really
a solution, because

    1) These xlogs can contain not only upgrade changes. An xlog file
       could already exist at the moment when someone did reload with
       a newer version;

    2) During xlog deletion there may appear another fatal error. Any
       unlink() or close() can fail;

    3) Xlog may be already sent by replication;

    4) It is not possible to rollback code of vshard to the previous
       version automatically. Someone needs to do it manually using code
       reload or restart. VShard does not have previous code version, so 
       it can't checkout itself to an older version (except in tests,
       where I now that git is available, installed, and it is safe to
       checkout). In case you upgrade via restart, you should restart again
       with older version. In case you upgrade via reload, you should just
       stop reload, and put the old vshard.storage object back into _G. All
       should be safe in case upgrade is atomic.

You can argue that we can do a snapshot before upgrade to solve the first
problem, but snapshot is a disk killer if it is called automatically
on several very fat instances on the same machine. I am here again
trying to make things easier for the cloud team, who never restart instances
unless it is unavoidable.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call()
  2020-03-23  6:42       ` Oleg Babin
@ 2020-03-23 22:32         ` Vladislav Shpilevoy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-23 22:32 UTC (permalink / raw)
  To: Oleg Babin, tarantool-patches, yaroslav.dynnikov

On 23/03/2020 07:42, Oleg Babin wrote:
> On 22/03/2020 22:13, Vladislav Shpilevoy wrote:
>>>> +
>>>> +local function service_call(...)
>>>> +    local service_name = select(1, ...)
>>>> +    return service_call_api[service_name](select(2, ...))
>>>> +end
>>>
> 
> What's about a following diff?
> ============
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 6aa5d6a..3653914 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -2203,9 +2203,8 @@ service_call_api = setmetatable({
>      return res
>  end})
> 
> -local function service_call(...)
> -    local service_name = select(1, ...)
> -    return service_call_api[service_name](select(2, ...))
> +local function service_call(service_name, ...)
> +    return service_call_api[service_name](...)
>  end
> ============

Indeed, that looks better. I missed that I could declare
first argument separately. Applied.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-23 22:32         ` Vladislav Shpilevoy
@ 2020-03-24  4:32           ` Oleg Babin
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Babin @ 2020-03-24  4:32 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov

Thanks for answers and explanation. Both patches LGTM.

On 24/03/2020 01:32, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
> 
> On 23/03/2020 07:35, Oleg Babin wrote:
>> On 22/03/2020 22:12, Vladislav Shpilevoy wrote:
>>> But for 0.1.16.0 the atomicity is possible, I added
>>> box.begin/commit/rollback.
>>>
>>> -    for _, handler in ipairs(handlers) do
>>> +    for _, handler in ipairs(schema_upgrade_handlers) do
>>>            local next_version = handler.version
>>>            if schema_compare_versions(next_version, version) > 0 then
>>>                local next_version_str = table.concat(next_version, '.')
>>>                log.info("Upgrade vshard to {%s}", next_version_str)
>>> -            handler.func(username, password)
>>> +            local ok, err = pcall(handler.func, username, password)
>>> +            if not ok then
>>> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
>>
>> Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end"
> 
> Yes, you are right.
> 
> box.is_in_txn() is not needed here. box.rollback() is no-op in case
> there is no a transaction.
> 
> ====================
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index c0e768a..630d1ff 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password)
>               log.info("Upgrade vshard to {%s}", next_version_str)
>               local ok, err = pcall(handler.func, username, password)
>               if not ok then
> +                box.rollback()
>                   log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
>                   error(err)
>               end
> ====================
> 
>> I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"?
> 
> Even if we will try to drop latest xlog files, it is not really
> a solution, because
> 
>      1) These xlogs can contain not only upgrade changes. An xlog file
>         could already exist at the moment when someone did reload with
>         a newer version;
> 
>      2) During xlog deletion there may appear another fatal error. Any
>         unlink() or close() can fail;
> 
>      3) Xlog may be already sent by replication;
> 
>      4) It is not possible to rollback code of vshard to the previous
>         version automatically. Someone needs to do it manually using code
>         reload or restart. VShard does not have previous code version, so
>         it can't checkout itself to an older version (except in tests,
>         where I now that git is available, installed, and it is safe to
>         checkout). In case you upgrade via restart, you should restart again
>         with older version. In case you upgrade via reload, you should just
>         stop reload, and put the old vshard.storage object back into _G. All
>         should be safe in case upgrade is atomic.
> 
> You can argue that we can do a snapshot before upgrade to solve the first
> problem, but snapshot is a disk killer if it is called automatically
> on several very fat instances on the same machine. I am here again
> trying to make things easier for the cloud team, who never restart instances
> unless it is unavoidable.
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
  2020-03-22  5:05   ` Oleg Babin
@ 2020-03-24 15:21   ` Yaroslav Dynnikov
  2020-03-24 23:44     ` Vladislav Shpilevoy
  1 sibling, 1 reply; 14+ messages in thread
From: Yaroslav Dynnikov @ 2020-03-24 15:21 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: yaroslav.dynnikov, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 1974 bytes --]

Hi, Vlad.

I've run my cartridge upgrade tests on your branch (6e50e26c), and it's ok
on tarantool 2.2, but fails for 1.10:
```
replica | ApplyConfigError: Space _schema does not support multi-statement
transactions
replica | stack traceback:
replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:419:
in function 'schema_upgrade'
replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:2336:
in function 'cfg'
```

Here are results from Gitlab CI:
https://gitlab.com/tarantool/cartridge/pipelines/129256300

And here is one more remark about the patch itself.

On Sat, 21 Mar 2020 at 21:59, Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
wrote:

>
>  local function this_is_master()
> @@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid,
> is_reload)
>              error(err)
>          end
>          log.info("Box has been configured")
> -        local uri = luri.parse(this_replica.uri)
> -        box.once("vshard:storage:1", storage_schema_v1, uri.login,
> uri.password)
> +    end
> +
> +    local uri = luri.parse(this_replica.uri)
> +    schema_upgrade(is_master, uri.login, uri.password)
> +
> +    if not is_reload then
>

It seems like this `if/else` statement isn't necessary. The `else` branch
is enough for both cases.
Even it's not hot reload it would result in `local old = nil;
box.space._bucket:on_replace(new_trigger, nil)` which is essentially the
same


>          box.space._bucket:on_replace(bucket_generation_increment)
>      else
>          local old = box.space._bucket:on_replace()[1]
> @@ -2469,6 +2580,8 @@ M.rlist = {
>      add_tail = rlist_add_tail,
>      remove = rlist_remove,
>  }
> +M.schema_latest_version = schema_latest_version
> +M.schema_current_version = schema_current_version
>
>  return {
>      sync = sync,
> --
> 2.21.1 (Apple Git-122.3)
>
>

-- 
С уважением.
Дынников Ярослав.

[-- Attachment #2: Type: text/html, Size: 3976 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy
  2020-03-24 15:21   ` Yaroslav Dynnikov
@ 2020-03-24 23:44     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-24 23:44 UTC (permalink / raw)
  To: Yaroslav Dynnikov; +Cc: tarantool-patches

Hi!

I reworked the patch. The diff is big, so I will send a second version.
Below I also provide diff.

On 24/03/2020 16:21, Yaroslav Dynnikov wrote:
> Hi, Vlad.
> 
> I've run my cartridge upgrade tests on your branch (6e50e26c), and it's ok on tarantool 2.2, but fails for 1.10:
> ```
> |
> replica | ApplyConfigError: Space _schema does not support multi-statement transactions
> ||
> replica | stack traceback:
> ||
> replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:419: in function 'schema_upgrade'
> ||replica | ...cartridge/.rocks/share/tarantool/vshard/storage/init.lua:2336: in function 'cfg'|
> |```|
> |
> |
> Here are results from Gitlab CI: https://gitlab.com/tarantool/cartridge/pipelines/129256300
> 
> And here is one more remark about the patch itself.
> 
> On Sat, 21 Mar 2020 at 21:59, Vladislav Shpilevoy <v.shpilevoy@tarantool.org <mailto:v.shpilevoy@tarantool.org>> wrote:
> 
> 
>      local function this_is_master()
>     @@ -2169,8 +2276,12 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
>                  error(err)
>              end
>              log.info <http://log.info>("Box has been configured")
>     -        local uri = luri.parse(this_replica.uri)
>     -        box.once("vshard:storage:1", storage_schema_v1, uri.login, uri.password)
>     +    end
>     +
>     +    local uri = luri.parse(this_replica.uri)
>     +    schema_upgrade(is_master, uri.login, uri.password)
>     +
>     +    if not is_reload then
> 
> 
> It seems like this `if/else` statement isn't necessary. The `else` branch is enough for both cases.
> Even it's not hot reload it would result in `local old = nil; box.space._bucket:on_replace(new_trigger, nil)` which is essentially the same

Agree, added to the patch.

====================

diff --git a/test/unit/box2.lua b/test/unit/box2.lua
new file mode 120000
index 0000000..1347390
--- /dev/null
+++ b/test/unit/box2.lua
@@ -0,0 +1 @@
+box.lua
\ No newline at end of file
diff --git a/test/unit/upgrade.result b/test/unit/upgrade.result
new file mode 100644
index 0000000..5530746
--- /dev/null
+++ b/test/unit/upgrade.result
@@ -0,0 +1,218 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+-- There is no way to revert a bootstrapped storage manually. The
+-- test is done in a new instance which is killed with all its
+-- data afterwards.
+
+_ = test_run:cmd("create server newdefault with script='unit/box2.lua'")
+ | ---
+ | ...
+_ = test_run:cmd("start server newdefault")
+ | ---
+ | ...
+_ = test_run:switch("newdefault")
+ | ---
+ | ...
+
+util = require('util')
+ | ---
+ | ...
+
+vshard = require('vshard')
+ | ---
+ | ...
+
+upgrade = vshard.storage.internal.schema_upgrade_master
+ | ---
+ | ...
+handlers = vshard.storage.internal.schema_upgrade_handlers
+ | ---
+ | ...
+version_make = vshard.storage.internal.schema_version_make
+ | ---
+ | ...
+curr_version = vshard.storage.internal.schema_current_version
+ | ---
+ | ...
+schema_bootstrap = vshard.storage.internal.schema_bootstrap
+ | ---
+ | ...
+
+user = 'storage'
+ | ---
+ | ...
+password = 'storage'
+ | ---
+ | ...
+
+schema_bootstrap(user, password)
+ | ---
+ | ...
+
+version_make({0, 1, 16, 0})
+ | ---
+ | - '{0.1.16.0}'
+ | ...
+
+versions = {}
+ | ---
+ | ...
+
+_ = test_run:cmd("setopt delimiter ';'")
+ | ---
+ | ...
+for v1 = 1, 3 do
+    for v2 = 1, 3 do
+        for v3 = 1, 3 do
+            for v4 = 1, 3 do
+                table.insert(versions, version_make({v1, v2, v3, v4}))
+            end
+        end
+    end
+end;
+ | ---
+ | ...
+
+err = nil;
+ | ---
+ | ...
+count = 0;
+ | ---
+ | ...
+for i, v1 in pairs(versions) do
+    for j, v2 in pairs(versions) do
+        local v1n = string.format("%s.%s.%s.%s", v1[1], v1[2], v1[3], v1[4])
+        local v2n = string.format("%s.%s.%s.%s", v2[1], v2[2], v2[3], v2[4])
+        count = count + 1
+        if ((v1 == v2) ~= (v1n == v2n)) or ((v1 ~= v2) ~= (v1n ~= v2n)) or
+            ((v1 <= v2) ~= (v1n <= v2n)) or ((v1 < v2) ~= (v1n < v2n)) or
+            ((v1 >= v2) ~= (v1n >= v2n)) or ((v1 > v2) ~= (v1n > v2n)) then
+            err = {i, j}
+            break
+        end
+    end
+    if err then
+        break
+    end
+end;
+ | ---
+ | ...
+err;
+ | ---
+ | - null
+ | ...
+count;
+ | ---
+ | - 6561
+ | ...
+
+function stat_collect()
+    return {
+        func_count = box.space._func:count(),
+        space_count = box.space._space:count(),
+        user_count = box.space._user:count(),
+        schema_count = box.space._schema:count(),
+        version = curr_version(),
+    }
+end;
+ | ---
+ | ...
+
+function stat_diff(s1, s2)
+    local res = {}
+    if s2.func_count ~= s1.func_count then
+        res.func_count = s2.func_count - s1.func_count
+    end
+    if s2.space_count ~= s1.space_count then
+        res.space_count = s2.space_count - s1.space_count
+    end
+    if s2.user_count ~= s1.user_count then
+        res.user_count = s2.user_count - s1.user_count
+    end
+    if s2.schema_count ~= s1.schema_count then
+        res.schema_count = s2.schema_count - s1.schema_count
+    end
+    if s1.version ~= s2.version then
+        res.old_version = s1.version
+        res.new_version = s2.version
+    end
+    return res
+end;
+ | ---
+ | ...
+
+stat = stat_collect();
+ | ---
+ | ...
+
+function stat_update()
+    local new_stat = stat_collect()
+    local diff = stat_diff(stat, new_stat)
+    stat = new_stat
+    return diff
+end;
+ | ---
+ | ...
+
+upgrade_trace = {};
+ | ---
+ | ...
+errinj = vshard.storage.internal.errinj;
+ | ---
+ | ...
+
+for _, handler in pairs(handlers) do
+    table.insert(upgrade_trace, string.format('Upgrade to %s', handler.version))
+
+    table.insert(upgrade_trace, 'Errinj in begin')
+    errinj.ERRINJ_UPGRADE = 'begin'
+    table.insert(upgrade_trace,
+                 {util.check_error(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+
+    table.insert(upgrade_trace, 'Errinj in end')
+    errinj.ERRINJ_UPGRADE = 'end'
+    table.insert(upgrade_trace,
+                 {util.check_error(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+
+    table.insert(upgrade_trace, 'No errinj')
+    errinj.ERRINJ_UPGRADE = nil
+    table.insert(upgrade_trace,
+                 {pcall(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+end;
+ | ---
+ | ...
+
+_ = test_run:cmd("setopt delimiter ''");
+ | ---
+ | ...
+
+upgrade_trace
+ | ---
+ | - - Upgrade to {0.1.16.0}
+ |   - Errinj in begin
+ |   - - Errinj in begin
+ |   - diff: []
+ |   - Errinj in end
+ |   - - Errinj in end
+ |   - diff: []
+ |   - No errinj
+ |   - - true
+ |   - diff:
+ |       new_version: '{0.1.16.0}'
+ |       old_version: '{0.1.15.0}'
+ | ...
+
+_ = test_run:switch("default")
+ | ---
+ | ...
+_ = test_run:cmd("stop server newdefault")
+ | ---
+ | ...
+_ = test_run:cmd("cleanup server newdefault")
+ | ---
+ | ...
+_ = test_run:cmd("delete server newdefault")
+ | ---
+ | ...
diff --git a/test/unit/upgrade.test.lua b/test/unit/upgrade.test.lua
new file mode 100644
index 0000000..9bbfd84
--- /dev/null
+++ b/test/unit/upgrade.test.lua
@@ -0,0 +1,133 @@
+test_run = require('test_run').new()
+
+-- There is no way to revert a bootstrapped storage manually. The
+-- test is done in a new instance which is killed with all its
+-- data afterwards.
+
+_ = test_run:cmd("create server newdefault with script='unit/box2.lua'")
+_ = test_run:cmd("start server newdefault")
+_ = test_run:switch("newdefault")
+
+util = require('util')
+
+vshard = require('vshard')
+
+upgrade = vshard.storage.internal.schema_upgrade_master
+handlers = vshard.storage.internal.schema_upgrade_handlers
+version_make = vshard.storage.internal.schema_version_make
+curr_version = vshard.storage.internal.schema_current_version
+schema_bootstrap = vshard.storage.internal.schema_bootstrap
+
+user = 'storage'
+password = 'storage'
+
+schema_bootstrap(user, password)
+
+version_make({0, 1, 16, 0})
+
+versions = {}
+
+_ = test_run:cmd("setopt delimiter ';'")
+for v1 = 1, 3 do
+    for v2 = 1, 3 do
+        for v3 = 1, 3 do
+            for v4 = 1, 3 do
+                table.insert(versions, version_make({v1, v2, v3, v4}))
+            end
+        end
+    end
+end;
+
+err = nil;
+count = 0;
+for i, v1 in pairs(versions) do
+    for j, v2 in pairs(versions) do
+        local v1n = string.format("%s.%s.%s.%s", v1[1], v1[2], v1[3], v1[4])
+        local v2n = string.format("%s.%s.%s.%s", v2[1], v2[2], v2[3], v2[4])
+        count = count + 1
+        if ((v1 == v2) ~= (v1n == v2n)) or ((v1 ~= v2) ~= (v1n ~= v2n)) or
+            ((v1 <= v2) ~= (v1n <= v2n)) or ((v1 < v2) ~= (v1n < v2n)) or
+            ((v1 >= v2) ~= (v1n >= v2n)) or ((v1 > v2) ~= (v1n > v2n)) then
+            err = {i, j}
+            break
+        end
+    end
+    if err then
+        break
+    end
+end;
+err;
+count;
+
+function stat_collect()
+    return {
+        func_count = box.space._func:count(),
+        space_count = box.space._space:count(),
+        user_count = box.space._user:count(),
+        schema_count = box.space._schema:count(),
+        version = curr_version(),
+    }
+end;
+
+function stat_diff(s1, s2)
+    local res = {}
+    if s2.func_count ~= s1.func_count then
+        res.func_count = s2.func_count - s1.func_count
+    end
+    if s2.space_count ~= s1.space_count then
+        res.space_count = s2.space_count - s1.space_count
+    end
+    if s2.user_count ~= s1.user_count then
+        res.user_count = s2.user_count - s1.user_count
+    end
+    if s2.schema_count ~= s1.schema_count then
+        res.schema_count = s2.schema_count - s1.schema_count
+    end
+    if s1.version ~= s2.version then
+        res.old_version = s1.version
+        res.new_version = s2.version
+    end
+    return res
+end;
+
+stat = stat_collect();
+
+function stat_update()
+    local new_stat = stat_collect()
+    local diff = stat_diff(stat, new_stat)
+    stat = new_stat
+    return diff
+end;
+
+upgrade_trace = {};
+errinj = vshard.storage.internal.errinj;
+
+for _, handler in pairs(handlers) do
+    table.insert(upgrade_trace, string.format('Upgrade to %s', handler.version))
+
+    table.insert(upgrade_trace, 'Errinj in begin')
+    errinj.ERRINJ_UPGRADE = 'begin'
+    table.insert(upgrade_trace,
+                 {util.check_error(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+
+    table.insert(upgrade_trace, 'Errinj in end')
+    errinj.ERRINJ_UPGRADE = 'end'
+    table.insert(upgrade_trace,
+                 {util.check_error(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+
+    table.insert(upgrade_trace, 'No errinj')
+    errinj.ERRINJ_UPGRADE = nil
+    table.insert(upgrade_trace,
+                 {pcall(upgrade, handler.version, user, password)})
+    table.insert(upgrade_trace, {diff = stat_update()})
+end;
+
+_ = test_run:cmd("setopt delimiter ''");
+
+upgrade_trace
+
+_ = test_run:switch("default")
+_ = test_run:cmd("stop server newdefault")
+_ = test_run:cmd("cleanup server newdefault")
+_ = test_run:cmd("delete server newdefault")
diff --git a/test/upgrade/upgrade.result b/test/upgrade/upgrade.result
index e21d353..4157b08 100644
--- a/test/upgrade/upgrade.result
+++ b/test/upgrade/upgrade.result
@@ -145,14 +145,11 @@ box.space._schema:get({'vshard_version'})
  | ...
 vshard.storage.internal.schema_current_version()
  | ---
- | - [0, 1, 16, 0]
+ | - '{0.1.16.0}'
  | ...
 vshard.storage.internal.schema_latest_version
  | ---
- | - - 0
- |   - 1
- |   - 16
- |   - 0
+ | - '{0.1.16.0}'
  | ...
 
 test_run:switch('storage_1_b')
@@ -165,14 +162,11 @@ box.space._schema:get({'vshard_version'})
  | ...
 vshard.storage.internal.schema_current_version()
  | ---
- | - [0, 1, 16, 0]
+ | - '{0.1.16.0}'
  | ...
 vshard.storage.internal.schema_latest_version
  | ---
- | - - 0
- |   - 1
- |   - 16
- |   - 0
+ | - '{0.1.16.0}'
  | ...
 
 test_run:switch('default')
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 630d1ff..ebc6af1 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -74,6 +74,7 @@ if not M then
             ERRINJ_LAST_RECEIVE_DELAY = false,
             ERRINJ_RECEIVE_PARTIALLY = false,
             ERRINJ_NO_RECOVERY = false,
+            ERRINJ_UPGRADE = false,
         },
         -- This counter is used to restart background fibers with
         -- new reloaded code.
@@ -258,6 +259,33 @@ end
 -- Schema
 --------------------------------------------------------------------------------
 
+local schema_version_mt = {
+    __tostring = function(self)
+        return string.format('{%s}', table.concat(self, '.'))
+    end,
+    __serialize = function(self)
+        return tostring(self)
+    end,
+    __eq = function(l, r)
+        return l[1] == r[1] and l[2] == r[2] and l[3] == r[3] and l[4] == r[4]
+    end,
+    __lt = function(l, r)
+        for i = 1, 4 do
+            local diff = l[i] - r[i]
+            if diff < 0 then
+                return true
+            elseif diff > 0 then
+                return false
+            end
+        end
+        return false;
+    end,
+}
+
+local function schema_version_make(ver)
+    return setmetatable(ver, schema_version_mt)
+end
+
 -- VShard versioning works in 4 numbers: major, minor, patch, and
 -- a last helper number incremented on every schema change, if
 -- first 3 numbers stay not changed. That happens when users take
@@ -267,7 +295,7 @@ end
 -- The schema first time appeared with 0.1.16. So this function
 -- describes schema before that - 0.1.15.
 local function schema_init_0_1_15_0(username, password)
-    log.info("Initializing schema")
+    log.info("Initializing schema %s", schema_version_make({0, 1, 15, 0}))
     box.schema.user.create(username, {
         password = password,
         if_not_exists = true,
@@ -309,36 +337,31 @@ local function schema_init_0_1_15_0(username, password)
     box.space._schema:replace({'vshard_version', 0, 1, 15, 0})
 end
 
-local function schema_compare_versions(v1, v2)
-    for i = 1, 4 do
-        if v1[i] ~= v2[i] then
-            return v1[i] - v2[i]
-        end
-    end
-    return 0
-end
-
 local function schema_upgrade_to_0_1_16_0(username)
     -- Since 0.1.16.0 the old versioning by
     -- 'oncevshard:storage:<number>' is dropped because it is not
     -- really extendible nor understandable.
-    log.info('Insert vshard_version into _schema')
-    box.begin()
+    log.info("Insert 'vshard_version' into _schema")
     box.space._schema:replace({'vshard_version', 0, 1, 16, 0})
     box.space._schema:delete({'oncevshard:storage:1'})
-    box.commit()
+end
+
+local function schema_downgrade_from_0_1_16_0()
+    log.info("Remove 'vshard_version' from _schema")
+    box.space._schema:replace({'oncevshard:storage:1'})
+    box.space._schema:delete({'vshard_version'})
 end
 
 local function schema_current_version()
     local version = box.space._schema:get({'vshard_version'})
     if version == nil then
-        return {0, 1, 15, 0}
+        return schema_version_make({0, 1, 15, 0})
     else
-        return version:totable(2)
+        return schema_version_make(version:totable(2))
     end
 end
 
-local schema_latest_version = {0, 1, 16, 0}
+local schema_latest_version = schema_version_make({0, 1, 16, 0})
 
 local function schema_upgrade_replica()
     local version = schema_current_version()
@@ -354,11 +377,10 @@ local function schema_upgrade_replica()
     -- For example, from 1.2.3.4 to 1.7.8.9, when it is assumed
     -- that a safe upgrade should go 1.2.3.4 -> 1.2.4.1 ->
     -- 1.3.1.1 and so on step by step.
-    if schema_compare_versions(version, schema_latest_version) ~= 0 then
+    if version ~= schema_latest_version then
         log.info('Replica\' vshard schema version is not latest - current '..
-                 '%s vs latest %s, but the replica still can work',
-                 table.concat(version, '.'),
-                 table.concat(schema_latest_version, '.'))
+                 '%s vs latest %s, but the replica still can work', version,
+                 schema_latest_version)
     end
     -- In future for hard changes the replica may be suspended
     -- until its schema is synced with master. Or it may
@@ -374,10 +396,14 @@ end
 -- prohibit yields, in case the upgrade, for example, affects huge
 -- number of tuples (_bucket records, maybe).
 local schema_upgrade_handlers = {
-    {version = {0, 1, 16, 0}, func = schema_upgrade_to_0_1_16_0},
+    {
+        version = schema_version_make({0, 1, 16, 0}),
+        upgrade = schema_upgrade_to_0_1_16_0,
+        downgrade = schema_downgrade_from_0_1_16_0
+    },
 }
 
-local function schema_upgrade_master(username, password)
+local function schema_upgrade_master(target_version, username, password)
     local _schema = box.space._schema
     local is_old_versioning = _schema:get({'oncevshard:storage:1'}) ~= nil
     local version = schema_current_version()
@@ -386,39 +412,60 @@ local function schema_upgrade_master(username, password)
     if is_bootstrap then
         schema_init_0_1_15_0(username, password)
     elseif is_old_versioning then
-        log.info('The instance does not have vshard_version record. '..
-                 'It is 0.1.15.0.')
-    end
-    local handler_latest_version =
-        schema_upgrade_handlers[#schema_upgrade_handlers].version
-    assert(schema_compare_versions(handler_latest_version,
-                                   schema_latest_version) == 0)
+        log.info("The instance does not have 'vshard_version' record. "..
+                 "It is 0.1.15.0.")
+    end
+    assert(schema_upgrade_handlers[#schema_upgrade_handlers].version ==
+           schema_latest_version)
+    local prev_version = version
+    local ok, err1, err2
+    local errinj = M.errinj.ERRINJ_UPGRADE
     for _, handler in ipairs(schema_upgrade_handlers) do
         local next_version = handler.version
-        if schema_compare_versions(next_version, version) > 0 then
-            local next_version_str = table.concat(next_version, '.')
-            log.info("Upgrade vshard to {%s}", next_version_str)
-            local ok, err = pcall(handler.func, username, password)
+        if next_version > target_version then
+            break
+        end
+        if next_version > version then
+            log.info("Upgrade vshard schema to %s", next_version)
+            if errinj == 'begin' then
+                ok, err1 = false, 'Errinj in begin'
+            else
+                ok, err1 = pcall(handler.upgrade, username)
+                if ok and errinj == 'end' then
+                    ok, err1 = false, 'Errinj in end'
+                end
+            end
             if not ok then
+                -- Rollback in case the handler started a
+                -- transaction before the exception.
                 box.rollback()
-                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
-                error(err)
+                log.info("Couldn't upgrade schema to %s: '%s'. Revert to %s",
+                         next_version, err1, prev_version)
+                ok, err2 = pcall(handler.downgrade)
+                if not ok then
+                    log.info("Couldn't downgrade schema to %s - fatal error: "..
+                             "'%s'", prev_version, err2)
+                    os.exit(-1)
+                end
+                error(err1)
             end
-            log.info("Successful vshard upgrade to {%s}", next_version_str)
-            ok, err = pcall(_schema.replace, _schema,
-                            {'vshard_version', unpack(next_version)})
+            ok, err1 = pcall(_schema.replace, _schema,
+                             {'vshard_version', unpack(next_version)})
             if not ok then
-                log.info("Upgraded to {%s} but couldn't update _schema "..
-                         "'vshard_version' - fatal error: %s", err)
+                log.info("Upgraded schema to %s but couldn't update _schema "..
+                         "'vshard_version' - fatal error: '%s'", next_version,
+                         err1)
                 os.exit(-1)
             end
+            log.info("Successful vshard schema upgrade to %s", next_version)
         end
+        prev_version = next_version
     end
 end
 
 local function schema_upgrade(is_master, username, password)
     if is_master then
-        return schema_upgrade_master(username, password)
+        return schema_upgrade_master(schema_latest_version, username, password)
     else
         return schema_upgrade_replica()
     end
@@ -2298,12 +2345,8 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
     local uri = luri.parse(this_replica.uri)
     schema_upgrade(is_master, uri.login, uri.password)
 
-    if not is_reload then
-        box.space._bucket:on_replace(bucket_generation_increment)
-    else
-        local old = box.space._bucket:on_replace()[1]
-        box.space._bucket:on_replace(bucket_generation_increment, old)
-    end
+    local old_trigger = box.space._bucket:on_replace()[1]
+    box.space._bucket:on_replace(bucket_generation_increment, old_trigger)
 
     lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
     lreplicaset.outdate_replicasets(M.replicasets)
@@ -2599,6 +2642,10 @@ M.rlist = {
 }
 M.schema_latest_version = schema_latest_version
 M.schema_current_version = schema_current_version
+M.schema_upgrade_master = schema_upgrade_master
+M.schema_upgrade_handlers = schema_upgrade_handlers
+M.schema_version_make = schema_version_make
+M.schema_bootstrap = schema_init_0_1_15_0
 
 return {
     sync = sync,

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-24 23:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-21 18:59 [Tarantool-patches] [PATCH vshard 0/2] vshard upgrade and _call Vladislav Shpilevoy
2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy Vladislav Shpilevoy
2020-03-22  5:05   ` Oleg Babin
2020-03-22 19:12     ` Vladislav Shpilevoy
2020-03-23  6:35       ` Oleg Babin
2020-03-23 22:32         ` Vladislav Shpilevoy
2020-03-24  4:32           ` Oleg Babin
2020-03-24 15:21   ` Yaroslav Dynnikov
2020-03-24 23:44     ` Vladislav Shpilevoy
2020-03-21 18:59 ` [Tarantool-patches] [PATCH vshard 2/2] storage: introduce vshard.storage._call() Vladislav Shpilevoy
2020-03-22  5:08   ` Oleg Babin
2020-03-22 19:13     ` Vladislav Shpilevoy
2020-03-23  6:42       ` Oleg Babin
2020-03-23 22:32         ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox