[Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy

Oleg Babin olegrok at tarantool.org
Tue Mar 24 07:32:17 MSK 2020


Thanks for answers and explanation. Both patches LGTM.

On 24/03/2020 01:32, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
> 
> On 23/03/2020 07:35, Oleg Babin wrote:
>> On 22/03/2020 22:12, Vladislav Shpilevoy wrote:
>>> But for 0.1.16.0 the atomicity is possible, I added
>>> box.begin/commit/rollback.
>>>
>>> -    for _, handler in ipairs(handlers) do
>>> +    for _, handler in ipairs(schema_upgrade_handlers) do
>>>            local next_version = handler.version
>>>            if schema_compare_versions(next_version, version) > 0 then
>>>                local next_version_str = table.concat(next_version, '.')
>>>                log.info("Upgrade vshard to {%s}", next_version_str)
>>> -            handler.func(username, password)
>>> +            local ok, err = pcall(handler.func, username, password)
>>> +            if not ok then
>>> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
>>
>> Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end"
> 
> Yes, you are right.
> 
> box.is_in_txn() is not needed here. box.rollback() is no-op in case
> there is no a transaction.
> 
> ====================
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index c0e768a..630d1ff 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password)
>               log.info("Upgrade vshard to {%s}", next_version_str)
>               local ok, err = pcall(handler.func, username, password)
>               if not ok then
> +                box.rollback()
>                   log.info("Couldn't upgrade to {%s}: %s", next_version_str, err)
>                   error(err)
>               end
> ====================
> 
>> I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"?
> 
> Even if we will try to drop latest xlog files, it is not really
> a solution, because
> 
>      1) These xlogs can contain not only upgrade changes. An xlog file
>         could already exist at the moment when someone did reload with
>         a newer version;
> 
>      2) During xlog deletion there may appear another fatal error. Any
>         unlink() or close() can fail;
> 
>      3) Xlog may be already sent by replication;
> 
>      4) It is not possible to rollback code of vshard to the previous
>         version automatically. Someone needs to do it manually using code
>         reload or restart. VShard does not have previous code version, so
>         it can't checkout itself to an older version (except in tests,
>         where I now that git is available, installed, and it is safe to
>         checkout). In case you upgrade via restart, you should restart again
>         with older version. In case you upgrade via reload, you should just
>         stop reload, and put the old vshard.storage object back into _G. All
>         should be safe in case upgrade is atomic.
> 
> You can argue that we can do a snapshot before upgrade to solve the first
> problem, but snapshot is a disk killer if it is called automatically
> on several very fat instances on the same machine. I am here again
> trying to make things easier for the cloud team, who never restart instances
> unless it is unavoidable.
> 


More information about the Tarantool-patches mailing list