[Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Tue Mar 24 01:32:08 MSK 2020


Hi! Thanks for the review!

On 23/03/2020 07:35, Oleg Babin wrote:
> On 22/03/2020 22:12, Vladislav Shpilevoy wrote:
>> But for 0.1.16.0 the atomicity is possible, I added
>> box.begin/commit/rollback.
>>
>> -    for _, handler in ipairs(handlers) do
>> +    for _, handler in ipairs(schema_upgrade_handlers) do
>>           local next_version = handler.version
>>           if schema_compare_versions(next_version, version) > 0 then
>>               local next_version_str = table.concat(next_version, '.')
>>               log.info("Upgrade vshard to {%s}", next_version_str)
>> -            handler.func(username, password)
>> +            local ok, err = pcall(handler.func, username, password)
>> +            if not ok then
>> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
> 
> Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end"

Yes, you are right.

box.is_in_txn() is not needed here. box.rollback() is no-op in case
there is no a transaction.

====================
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index c0e768a..630d1ff 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password)
             log.info("Upgrade vshard to {%s}", next_version_str)
             local ok, err = pcall(handler.func, username, password)
             if not ok then
+                box.rollback()
                 log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
                 error(err)
             end
====================

> I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"?

Even if we will try to drop latest xlog files, it is not really
a solution, because

    1) These xlogs can contain not only upgrade changes. An xlog file
       could already exist at the moment when someone did reload with
       a newer version;

    2) During xlog deletion there may appear another fatal error. Any
       unlink() or close() can fail;

    3) Xlog may be already sent by replication;

    4) It is not possible to rollback code of vshard to the previous
       version automatically. Someone needs to do it manually using code
       reload or restart. VShard does not have previous code version, so 
       it can't checkout itself to an older version (except in tests,
       where I now that git is available, installed, and it is safe to
       checkout). In case you upgrade via restart, you should restart again
       with older version. In case you upgrade via reload, you should just
       stop reload, and put the old vshard.storage object back into _G. All
       should be safe in case upgrade is atomic.

You can argue that we can do a snapshot before upgrade to solve the first
problem, but snapshot is a disk killer if it is called automatically
on several very fat instances on the same machine. I am here again
trying to make things easier for the cloud team, who never restart instances
unless it is unavoidable.


More information about the Tarantool-patches mailing list