Subject: [tarantool-patches] Re: [PATCH 4/4] Introduce storage reload evolution
From: Vladislav Shpilevoy
Date: Mon, 23 Jul 2018 17:44:26 +0300
To: tarantool-patches@freelists.org, AKhatskevich
List-Id: tarantool-patches

Thanks for the patch! See 4 comments below.

On 23/07/2018 14:14, AKhatskevich wrote:
> Changes:
> 1. Introduce storage reload evolution.
> 2. Setup cross-version reload testing.
> 
> 1:
> This mechanism updates Lua objects on reload in case they are
> changed in a new vshard.storage version.
> 
> Since this commit, any change in vshard.storage.M has to be
> reflected in vshard.storage.reload_evolution to guarantee
> correct reload.
> 
> 2:
> The testing uses git infrastructure and is performed in the
> following way:
> 1. Copy an old version of vshard to a temp folder.
> 2. Run vshard on this code.
> 3. Checkout the latest version of the vshard sources.
> 4. Reload the vshard storage.
> 5. Make sure it works (perform simple tests).
> 
> Notes:
> * this patch contains some legacy-driven decisions:
>   1. The SOURCEDIR path is retrieved differently in case of a
>      packpack build.
>   2. The git directory in the `reload_evolution/storage` test
>      is copied with respect to CentOS 7 and the `ro` mode of
>      SOURCEDIR.
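
Just to spell the idea out for other readers: as I understand the patch,
reload_evolution.lua keeps an ordered list of migration callbacks plus the
current module version, and upgrade() replays the missing callbacks on a
reloaded M. A minimal sketch only, with simplified names; see
vshard/storage/reload_evolution.lua in the patch for the real code:

    local log = require('log')

    -- Each migrations[i] converts an M created by module version
    -- i - 1 to version i.
    local migrations = {}
    migrations[1] = function(M)
        -- E.g. initialize a field that did not exist before.
    end

    local module_version = #migrations

    local function upgrade(M)
        local current = M.reload_evolution_version or 0
        if current > module_version then
            error('auto-downgrade is not implemented')
        end
        for version = current + 1, module_version do
            migrations[version](M)
            M.reload_evolution_version = version
            log.info('vshard.storage.reload_evolution: upgraded to %d',
                     version)
        end
    end

    return {
        version = module_version,
        upgrade = upgrade,
    }
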
> 
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> new file mode 100644
> index 0000000..54ff6b7
> --- /dev/null
> +++ b/test/reload_evolution/storage.result
> @@ -0,0 +1,248 @@
> +test_run = require('test_run').new()
> +---
> +...
> +git_util = require('lua_libs.git_util')
> +---
> +...
> +util = require('lua_libs.util')
> +---
> +...
> +vshard_copy_path = util.BUILDDIR .. '/test/var/vshard_git_tree_copy'
> +---
> +...
> +evolution_log = git_util.log_hashes({args='vshard/storage/reload_evolution.lua', dir=util.SOURCEDIR})
> +---
> +...
> +-- Cleanup the directory after a previous build.
> +_ = os.execute('rm -rf ' .. vshard_copy_path)
> +---
> +...
> +-- 1. `git worktree` cannot be used because PACKPACK mounts
> +-- `/source/` in `ro` mode.
> +-- 2. Just `cp -rf` cannot be used due to slightly different
> +-- behavior on CentOS 7.
> +_ = os.execute('mkdir ' .. vshard_copy_path)
> +---
> +...
> +_ = os.execute("cd " .. util.SOURCEDIR .. ' && cp -rf `ls -A --ignore=build` ' .. vshard_copy_path)
> +---
> +...
> +-- Checkout the last commit before the reload_evolution mechanism
> +-- was introduced.
> +git_util.exec_cmd({cmd='checkout', args='-f', dir=vshard_copy_path})
> +---
> +...
> +git_util.exec_cmd({cmd='checkout', args=evolution_log[#evolution_log] .. '~1', dir=vshard_copy_path})
> +---
> +...
> +REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
> +---
> +...
> +REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
> +---
> +...
> +test_run:create_cluster(REPLICASET_1, 'reload_evolution')
> +---
> +...
> +test_run:create_cluster(REPLICASET_2, 'reload_evolution')
> +---
> +...
> +util = require('lua_libs.util')
> +---
> +...
> +util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
> +---
> +...
> +util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
> +---
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
> +---
> +- true
> +...
> +bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
> +---
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +fiber = require('fiber')
> +---
> +...
> +vshard.storage.bucket_force_create(vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
> +---
> +- true
> +...
> +bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
> +---
> +...
> +vshard.storage.internal.reload_evolution_version
> +---
> +- null
> +...
> +box.space.test:insert({42, bucket_id_to_move})
> +---
> +- [42, 3000]
> +...
> +while test_run:grep_log('storage_2_a', 'The cluster is balanced ok') == nil do vshard.storage.rebalancer_wakeup() fiber.sleep(0.1) end

1. Now you have the wait_rebalancer_state util from the previous commit,
so use it here instead of this manual loop.
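
I.e., assuming the helper has the same signature as where it is used later
in this test, the whole loop collapses into:

    wait_rebalancer_state('The cluster is balanced ok', test_run)
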
> +---
> +...
> +test_run:switch('default')
> +---
> +- true
> +...
> +git_util.exec_cmd({cmd='checkout', args=evolution_log[1], dir=vshard_copy_path})
> +---
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +package.loaded["vshard.storage"] = nil
> +---
> +...
> +vshard.storage = require("vshard.storage")
> +---
> +...
> +test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to') ~= nil
> +---
> +- true
> +...
> +vshard.storage.internal.reload_evolution_version
> +---
> +- 1
> +...
> +-- Make sure storage operates well.
> +vshard.storage.bucket_force_drop(2000)
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(2000)
> +---
> +- true
> +...
> +vshard.storage.buckets_info()[2000]
> +---
> +- status: active
> +  id: 2000
> +...
> +vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
> +---
> +- true
> +- - [42, 3000]
> +...
> +vshard.storage.bucket_send(bucket_id_to_move, replicaset1_uuid)
> +---
> +- true
> +...
> +vshard.storage.garbage_collector_wakeup()
> +---
> +...
> +fiber = require('fiber')
> +---
> +...
> +while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
> +---
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +vshard.storage.bucket_send(bucket_id_to_move, replicaset2_uuid)
> +---
> +- true
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
> +---
> +- true
> +- - [42, 3000]
> +...
> +-- Check info() does not fail.
> +vshard.storage.info() ~= nil
> +---
> +- true
> +...
> +--
> +-- Send buckets to create an imbalance. Wait until the rebalancer
> +-- repairs it. Similar to `tests/rebalancer/rebalancer.test.lua`.
> +--
> +vshard.storage.rebalancer_disable()
> +---
> +...
> +move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
> +---
> +...
> +move_cnt = 100
> +---
> +...
> +assert(move_start + move_cnt < vshard.consts.DEFAULT_BUCKET_COUNT)
> +---
> +- true
> +...
> +for i = move_start, move_start + move_cnt - 1 do box.space._bucket:delete{i} end
> +---
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1400
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
> +---
> +...
> +move_cnt = 100
> +---
> +...
> +vshard.storage.bucket_force_create(move_start, move_cnt)
> +---
> +- true
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1600
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +vshard.storage.rebalancer_enable()
> +---
> +...
> +vshard.storage.rebalancer_wakeup()

2. You do not need an explicit rebalancer_wakeup here; wait_rebalancer_state
calls it itself.

> +---
> +...
> +wait_rebalancer_state("Rebalance routes are sent", test_run)
> +---
> +...
> +wait_rebalancer_state('The cluster is balanced ok', test_run)
> +---
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1500
> +...
> +test_run:switch('default')
> +---
> +- true
> +...
> +test_run:drop_cluster(REPLICASET_2)
> +---
> +...
> +test_run:drop_cluster(REPLICASET_1)
> +---
> +...
> +test_run:cmd('clear filter')
> +---
> +- true
> +...
> diff --git a/test/unit/reload_evolution.result b/test/unit/reload_evolution.result
> new file mode 100644
> index 0000000..342ac24
> --- /dev/null
> +++ b/test/unit/reload_evolution.result
> @@ -0,0 +1,45 @@
> +test_run = require('test_run').new()
> +---
> +...
> +fiber = require('fiber')
> +---
> +...
> +log = require('log')
> +---
> +...
> +util = require('util')
> +---
> +...
> +reload_evolution = require('vshard.storage.reload_evolution')
> +---
> +...
> +-- Init with the latest version.
> +fake_M = { reload_evolution_version = reload_evolution.version }
> +---
> +...
> +-- Test reload to the same version.
> +reload_evolution.upgrade(fake_M)
> +---
> +...
> +test_run:grep_log('default', 'vshard.storage.evolution') == nil
> +---
> +- true
> +...
> +-- Test a version downgrade.
> +log.info(string.rep('a', 1000))
> +---
> +...
> +fake_M.reload_evolution_version = fake_M.reload_evolution_version + 1
> +---
> +...
> +err = util.check_error(reload_evolution.upgrade, fake_M)
> +---
> +...
> +err:match('auto%-downgrade is not implemented')
> +---
> +- auto-downgrade is not implemented

3. Why do you need the match? The plain check_error output is ok already.
And what is the '%' in 'auto%-downgrade' for? I see that reload_evolution.lua
always prints exactly "auto-downgrade".

> +...
> +test_run:grep_log('default', 'vshard.storage.evolution', 1000) ~= nil
> +---
> +- false
> +...
> @@ -105,6 +110,11 @@ if not M then
>          -- a destination replicaset must drop already received
>          -- data.
>          rebalancer_sending_bucket = 0,
> +
> +        ------------------------- Reload -------------------------
> +        -- Version of the loaded module. This number is used on
> +        -- reload to determine which upgrade scripts to run.
> +        reload_evolution_version = reload_evolution.version,

4. Please rename it to just 'version', 'reload_version' or 'module_version'.
'reload_evolution_version' is too long and complex.

>      }
>  end
> 
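
To be concrete about 4, I mean something along these lines (just a sketch of
the rename, pick whichever name you like; the same rename is then needed
wherever reload_evolution_version is read, e.g. in reload_evolution.upgrade()
and in the tests above):

    ------------------------- Reload -------------------------
    -- Version of the loaded module. This number is used on
    -- reload to determine which upgrade scripts to run.
    module_version = reload_evolution.version,
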