Subject: [tarantool-patches] Re: [PATCH 4/4] Introduce storage reload evolution
From: Vladislav Shpilevoy
Date: Mon, 23 Jul 2018 17:44:26 +0300
To: tarantool-patches@freelists.org, AKhatskevich
List-Id: tarantool-patches

Thanks for the patch! See 4 comments below.

On 23/07/2018 14:14, AKhatskevich wrote:
> Changes:
> 1. Introduce storage reload evolution.
> 2. Setup cross-version reload testing.
> 
> 1:
> This mechanism updates Lua objects on reload in case they are
> changed in a new vshard.storage version.
> 
> Since this commit, any change in vshard.storage.M has to be
> reflected in vshard.storage.reload_evolution to guarantee
> correct reload.
> 
> 2:
> The testing uses git infrastructure and is performed in the
> following way:
> 1. Copy an old version of vshard to a temp folder.
> 2. Run vshard on this code.
> 3. Checkout the latest version of the vshard sources.
> 4. Reload the vshard storage.
> 5. Make sure it works (perform simple tests).
> 
> Notes:
> * this patch contains some legacy-driven decisions:
>   1. The SOURCEDIR path is retrieved differently in case of a
>      packpack build.
>   2. The git directory in the `reload_evolution/storage` test
>      is copied with respect to CentOS 7 and the `ro` mode of
>      SOURCEDIR.
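
Just to spell the idea out for other readers: as I understand the patch,
reload_evolution.lua keeps an ordered list of migration callbacks plus the
current module version, and upgrade() replays the missing callbacks on a
reloaded M. A minimal sketch only, with simplified names; see
vshard/storage/reload_evolution.lua in the patch for the real code:

    local log = require('log')

    -- Each migrations[i] converts an M created by module version
    -- i - 1 to version i.
    local migrations = {}
    migrations[1] = function(M)
        -- E.g. initialize a field that did not exist before.
    end

    local module_version = #migrations

    local function upgrade(M)
        local current = M.reload_evolution_version or 0
        if current > module_version then
            error('auto-downgrade is not implemented')
        end
        for version = current + 1, module_version do
            migrations[version](M)
            M.reload_evolution_version = version
            log.info('vshard.storage.reload_evolution: upgraded to %d',
                     version)
        end
    end

    return {
        version = module_version,
        upgrade = upgrade,
    }
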
> 
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> new file mode 100644
> index 0000000..54ff6b7
> --- /dev/null
> +++ b/test/reload_evolution/storage.result
> @@ -0,0 +1,248 @@
> +test_run = require('test_run').new()
> +---
> +...
> +git_util = require('lua_libs.git_util')
> +---
> +...
> +util = require('lua_libs.util')
> +---
> +...
> +vshard_copy_path = util.BUILDDIR .. '/test/var/vshard_git_tree_copy'
> +---
> +...
> +evolution_log = git_util.log_hashes({args='vshard/storage/reload_evolution.lua', dir=util.SOURCEDIR})
> +---
> +...
> +-- Cleanup the directory after a previous build.
> +_ = os.execute('rm -rf ' .. vshard_copy_path)
> +---
> +...
> +-- 1. `git worktree` cannot be used because PACKPACK mounts
> +-- `/source/` in `ro` mode.
> +-- 2. Just `cp -rf` cannot be used due to slightly different
> +-- behavior on CentOS 7.
> +_ = os.execute('mkdir ' .. vshard_copy_path)
> +---
> +...
> +_ = os.execute("cd " .. util.SOURCEDIR .. ' && cp -rf `ls -A --ignore=build` ' .. vshard_copy_path)
> +---
> +...
> +-- Checkout the last commit before the reload_evolution mechanism
> +-- was introduced.
> +git_util.exec_cmd({cmd='checkout', args='-f', dir=vshard_copy_path})
> +---
> +...
> +git_util.exec_cmd({cmd='checkout', args=evolution_log[#evolution_log] .. '~1', dir=vshard_copy_path})
> +---
> +...
> +REPLICASET_1 = { 'storage_1_a', 'storage_1_b' }
> +---
> +...
> +REPLICASET_2 = { 'storage_2_a', 'storage_2_b' }
> +---
> +...
> +test_run:create_cluster(REPLICASET_1, 'reload_evolution')
> +---
> +...
> +test_run:create_cluster(REPLICASET_2, 'reload_evolution')
> +---
> +...
> +util = require('lua_libs.util')
> +---
> +...
> +util.wait_master(test_run, REPLICASET_1, 'storage_1_a')
> +---
> +...
> +util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
> +---
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
> +---
> +- true
> +...
> +bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
> +---
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +fiber = require('fiber')
> +---
> +...
> +vshard.storage.bucket_force_create(vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1, vshard.consts.DEFAULT_BUCKET_COUNT / 2)
> +---
> +- true
> +...
> +bucket_id_to_move = vshard.consts.DEFAULT_BUCKET_COUNT
> +---
> +...
> +vshard.storage.internal.reload_evolution_version
> +---
> +- null
> +...
> +box.space.test:insert({42, bucket_id_to_move})
> +---
> +- [42, 3000]
> +...
> +while test_run:grep_log('storage_2_a', 'The cluster is balanced ok') == nil do vshard.storage.rebalancer_wakeup() fiber.sleep(0.1) end

1. Now you have the wait_rebalancer_state util from the previous commit,
so use it here instead of this manual loop.
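
I.e., assuming the helper has the same signature as where it is used later
in this test, the whole loop collapses into:

    wait_rebalancer_state('The cluster is balanced ok', test_run)
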
> +---
> +...
> +test_run:switch('default')
> +---
> +- true
> +...
> +git_util.exec_cmd({cmd='checkout', args=evolution_log[1], dir=vshard_copy_path})
> +---
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +package.loaded["vshard.storage"] = nil
> +---
> +...
> +vshard.storage = require("vshard.storage")
> +---
> +...
> +test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to') ~= nil
> +---
> +- true
> +...
> +vshard.storage.internal.reload_evolution_version
> +---
> +- 1
> +...
> +-- Make sure storage operates well.
> +vshard.storage.bucket_force_drop(2000)
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(2000)
> +---
> +- true
> +...
> +vshard.storage.buckets_info()[2000]
> +---
> +- status: active
> +  id: 2000
> +...
> +vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
> +---
> +- true
> +- - [42, 3000]
> +...
> +vshard.storage.bucket_send(bucket_id_to_move, replicaset1_uuid)
> +---
> +- true
> +...
> +vshard.storage.garbage_collector_wakeup()
> +---
> +...
> +fiber = require('fiber')
> +---
> +...
> +while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
> +---
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +vshard.storage.bucket_send(bucket_id_to_move, replicaset2_uuid)
> +---
> +- true
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
> +---
> +- true
> +- - [42, 3000]
> +...
> +-- Check info() does not fail.
> +vshard.storage.info() ~= nil
> +---
> +- true
> +...
> +--
> +-- Send buckets to create an imbalance. Wait until the rebalancer
> +-- repairs it. Similar to `tests/rebalancer/rebalancer.test.lua`.
> +--
> +vshard.storage.rebalancer_disable()
> +---
> +...
> +move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
> +---
> +...
> +move_cnt = 100
> +---
> +...
> +assert(move_start + move_cnt < vshard.consts.DEFAULT_BUCKET_COUNT)
> +---
> +- true
> +...
> +for i = move_start, move_start + move_cnt - 1 do box.space._bucket:delete{i} end
> +---
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1400
> +...
> +test_run:switch('storage_1_a')
> +---
> +- true
> +...
> +move_start = vshard.consts.DEFAULT_BUCKET_COUNT / 2 + 1
> +---
> +...
> +move_cnt = 100
> +---
> +...
> +vshard.storage.bucket_force_create(move_start, move_cnt)
> +---
> +- true
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1600
> +...
> +test_run:switch('storage_2_a')
> +---
> +- true
> +...
> +vshard.storage.rebalancer_enable()
> +---
> +...
> +vshard.storage.rebalancer_wakeup()

2. You do not need an explicit rebalancer_wakeup here; wait_rebalancer_state
calls it itself.

> +---
> +...
> +wait_rebalancer_state("Rebalance routes are sent", test_run)
> +---
> +...
> +wait_rebalancer_state('The cluster is balanced ok', test_run)
> +---
> +...
> +box.space._bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> +---
> +- 1500
> +...
> +test_run:switch('default')
> +---
> +- true
> +...
> +test_run:drop_cluster(REPLICASET_2)
> +---
> +...
> +test_run:drop_cluster(REPLICASET_1)
> +---
> +...
> +test_run:cmd('clear filter')
> +---
> +- true
> +...
> diff --git a/test/unit/reload_evolution.result b/test/unit/reload_evolution.result
> new file mode 100644
> index 0000000..342ac24
> --- /dev/null
> +++ b/test/unit/reload_evolution.result
> @@ -0,0 +1,45 @@
> +test_run = require('test_run').new()
> +---
> +...
> +fiber = require('fiber')
> +---
> +...
> +log = require('log')
> +---
> +...
> +util = require('util')
> +---
> +...
> +reload_evolution = require('vshard.storage.reload_evolution')
> +---
> +...
> +-- Init with the latest version.
> +fake_M = { reload_evolution_version = reload_evolution.version }
> +---
> +...
> +-- Test reload to the same version.
> +reload_evolution.upgrade(fake_M)
> +---
> +...
> +test_run:grep_log('default', 'vshard.storage.evolution') == nil
> +---
> +- true
> +...
> +-- Test a version downgrade.
> +log.info(string.rep('a', 1000))
> +---
> +...
> +fake_M.reload_evolution_version = fake_M.reload_evolution_version + 1
> +---
> +...
> +err = util.check_error(reload_evolution.upgrade, fake_M)
> +---
> +...
> +err:match('auto%-downgrade is not implemented')
> +---
> +- auto-downgrade is not implemented

3. Why do you need the match? The plain check_error output is ok already.
And what is the '%' in 'auto%-downgrade' for? I see that reload_evolution.lua
always prints exactly "auto-downgrade".

> +...
> +test_run:grep_log('default', 'vshard.storage.evolution', 1000) ~= nil
> +---
> +- false
> +...
> @@ -105,6 +110,11 @@ if not M then
>          -- a destination replicaset must drop already received
>          -- data.
>          rebalancer_sending_bucket = 0,
> +
> +        ------------------------- Reload -------------------------
> +        -- Version of the loaded module. This number is used on
> +        -- reload to determine which upgrade scripts to run.
> +        reload_evolution_version = reload_evolution.version,

4. Please rename it to just 'version', 'reload_version' or 'module_version'.
'reload_evolution_version' is too long and complex.

>      }
>  end
> 
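
To be concrete about 4, I mean something along these lines (just a sketch of
the rename, pick whichever name you like; the same rename is then needed
wherever reload_evolution_version is read, e.g. in reload_evolution.upgrade()
and in the tests above):

    ------------------------- Reload -------------------------
    -- Version of the loaded module. This number is used on
    -- reload to determine which upgrade scripts to run.
    module_version = reload_evolution.version,
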