From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp20.mail.ru (smtp20.mail.ru [94.100.179.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 2CCDF4696C3 for ; Tue, 24 Mar 2020 01:32:10 +0300 (MSK) References: <364ea64d7bf6beef8cea83ff2fbb432ba967cd80.1584817081.git.v.shpilevoy@tarantool.org> <41dcbae6-b009-5f2e-ca28-a71ea4541488@tarantool.org> From: Vladislav Shpilevoy Message-ID: <955a5af8-bcfa-8b29-a808-e240e363033c@tarantool.org> Date: Mon, 23 Mar 2020 23:32:08 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [PATCH vshard 1/2] storage: introduce upgrade strategy List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Oleg Babin , tarantool-patches@dev.tarantool.org, yaroslav.dynnikov@tarantool.org Hi! Thanks for the review! On 23/03/2020 07:35, Oleg Babin wrote: > On 22/03/2020 22:12, Vladislav Shpilevoy wrote: >> But for 0.1.16.0 the atomicity is possible, I added >> box.begin/commit/rollback. >> >> -    for _, handler in ipairs(handlers) do >> +    for _, handler in ipairs(schema_upgrade_handlers) do >>           local next_version = handler.version >>           if schema_compare_versions(next_version, version) > 0 then >>               local next_version_str = table.concat(next_version, '.') >>               log.info("Upgrade vshard to {%s}", next_version_str) >> -            handler.func(username, password) >> +            local ok, err = pcall(handler.func, username, password) >> +            if not ok then >> +                log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) > > Seems that you've missed box.rollback here. And I think that it should be something like "if box.is_in_txn() then box.rollback() end" Yes, you are right. box.is_in_txn() is not needed here. box.rollback() is no-op in case there is no a transaction. ==================== diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua index c0e768a..630d1ff 100644 --- a/vshard/storage/init.lua +++ b/vshard/storage/init.lua @@ -400,6 +400,7 @@ local function schema_upgrade_master(username, password) log.info("Upgrade vshard to {%s}", next_version_str) local ok, err = pcall(handler.func, username, password) if not ok then + box.rollback() log.info("Couldn't upgrade to {%s}: %s", next_version_str, err) error(err) end ==================== > I'm not sure but is it possible to drop last xlogs for user and start to use previous version of schema (and vshard version as well) as workaround in case of "fatal error"? Even if we will try to drop latest xlog files, it is not really a solution, because 1) These xlogs can contain not only upgrade changes. An xlog file could already exist at the moment when someone did reload with a newer version; 2) During xlog deletion there may appear another fatal error. Any unlink() or close() can fail; 3) Xlog may be already sent by replication; 4) It is not possible to rollback code of vshard to the previous version automatically. Someone needs to do it manually using code reload or restart. VShard does not have previous code version, so it can't checkout itself to an older version (except in tests, where I now that git is available, installed, and it is safe to checkout). In case you upgrade via restart, you should restart again with older version. In case you upgrade via reload, you should just stop reload, and put the old vshard.storage object back into _G. All should be safe in case upgrade is atomic. You can argue that we can do a snapshot before upgrade to solve the first problem, but snapshot is a disk killer if it is called automatically on several very fat instances on the same machine. I am here again trying to make things easier for the cloud team, who never restart instances unless it is unavoidable.