[tarantool-patches] Re: [PATCH V3] Prohibit router/storage configuration from different fibers

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Thu Sep 20 22:59:21 MSK 2018


Hi! Sorry for the last response - I was on vacation last
2 weeks. See 8 comments below.

On 11/09/2018 12:32, Evgeniy Zaikin wrote:
>  From 5ce83d5db595455a14a58aedf9e6411f5adf1d7c Mon Sep 17 00:00:00 2001
> From: rave4life <evgen4rave at gmail.com>
> Date: Fri, 31 Aug 2018 16:41:11 +0300
> Subject: [PATCH V3] Prohibit router/storage configuration from different fibers
> 
> Fixes: #140
> ---
> Ticket:https://github.com/tarantool/vshard/issues/140
> Branch:https://github.com/tarantool/vshard/tree/rave4life/gh-140-prohibit-multiple-cfg
> 

> diff --git a/test/router/router.result b/test/router/router.result
> index 000307d..ada8309 100644
> --- a/test/router/router.result
> +++ b/test/router/router.result
> @@ -1140,6 +1140,45 @@ test_run:drop_cluster(REPLICASET_2)
>   while test_run:grep_log('router_1', 'disconnected from ') == nil do
> fiber.sleep(0.1) end
>   ---
>   ...
> +-- gh-140: prohibit vshard.router/storage.cfg from multiple fibers.

1. Sorry, still out of 80 symbols. It is a strict rule. Please,
inspect your patch carefully for such code style things before
sending to a review.

> +cfg = {sharding = require('localcfg').sharding}
> +---
> +...
> +vshard = require('vshard')
> +---
> +...
> +fiber = require('fiber')

2. Please, do not use 'default' server for testing
storage or router. Router/router.test.lua starts multiple
Tarantool instances: storage_1_a - storage_2_b and router_1
(see the very beginning of the test file) and switches
between them to test router or storage things. 'Default'
server shall not be used for this. Please, use test_run:switch
and router_1 server for a test. In such a case you will not
have require('vshard') and require('fiber') since they are
already loaded on router_1 instance.

There is a reason why 'default' can not be used - once a
router is started on an instance, it can not be stopped
without restart (its internal fibers, I mean). For the storage
things are worse - it creates new spaces, users, functions
and we have no way to cleanup those things. So once 'default'
is used for testing, it will affect next tests.

For this problem we have an issue: https://github.com/tarantool/vshard/issues/121
You can try it next after this patch is finished if you want.

> +---
> +...
> +vshard.router.cfg(cfg)
> +---
> +...
> +vshard.router.internal.errinj.ERRINJ_MULTIPLE_CFG = false

3. It is false by default already (as I see in router/init.lua),
so what is a point of this line? I removed it and the test
passed ok. I guess it is a typo. Please, fix.

> +---
> +...
> +c = fiber.channel(2)
> +---
> +...
> diff --git a/test/storage/storage.result b/test/storage/storage.result
> index e3e9e7f..0228445 100644
> --- a/test/storage/storage.result
> +++ b/test/storage/storage.result
> @@ -688,6 +688,48 @@ rs:callro('echo', {'some_data'})
>   - null
>   - null
>   ...
> +-- gh-140: prohibit vshard.router/storage.cfg from multiple fibers.
> +cfg = {sharding = {}}
> +---
> +...
> +cfg.sharding[box.info.cluster.uuid] = {replicas = {}}
> +---
> +...
> +cfg.sharding[box.info.cluster.uuid].replicas[box.info.uuid] = {uri =
> 'storage:storage at 127.0.0.1:3309', name = 'test_storage', master =
> true}
> +---
> +...
> +vshard = require('vshard')
> +---
> +...
> +fiber = require('fiber')

4. For the same reason I explained above, you do not
need 'require' here. This code is executed on 'storage_1_a' where
both fiber and vshard are loaded already
(see line storage.test.lua:137 - this is the latest switch before
your test).

> +---
> +...
> +vshard.storage.internal.errinj.ERRINJ_MULTIPLE_CFG = false

5. Same as 3.

> +---
> +...
> +c = fiber.channel(2)
> +---
> +...
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 7573801..446b1a0 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -28,6 +28,7 @@ if not M then
>           ---------------- Common module attributes ----------------
>           errinj = {
>               ERRINJ_CFG = false,
> +            ERRINJ_MULTIPLE_CFG = false,
>               ERRINJ_FAILOVER_CHANGE_CFG = false,
>               ERRINJ_RELOAD = false,
>               ERRINJ_LONG_DISCOVERY = false,
> @@ -53,6 +54,10 @@ local ROUTER_TEMPLATE = {
>           name = nil,
>           -- The last passed configuration.
>           current_cfg = nil,
> +        -- Fiber channel as the locker to prevent multiple config.
> +        cfg_lock = nil,
> +        -- Configuration counter to control and verify locker.
> +        cfg_counter = 0,

6. Please, remove cfg_counter. Its value > 0 is unreachable
during normal work of router/storage. If you need a counter for
testing lock, you can use erring.ERRINJ_MULTIPLE_CFG as a counter.
It is not necessary to make each errinj as a strict boolean. See
my proposal below in the comment 8.

>           -- Time to outdate old objects on reload.
>           connection_outdate_delay = nil,
>           -- Bucket map cache.
> @@ -594,6 +599,29 @@ local function router_cfg(router, cfg, is_reload)
>       end
>   end
> 
> +--------------------------------------------------------------------------------
> +-- Configuration wrapper
> +--------------------------------------------------------------------------------

7. Function comment should look like this:

     --
     -- My function description wrapped by 66
     -- symbols.
     -- @param name1 Parameter 1 description.
     -- @param name2 Parameter 2 description.
     --
     -- [if returns anything]
     -- @retval Returned value description.
     -- @retval Next possible return value description.
     --

> +
> +local function cfg_wrapper(router, ...)
> +    local enable_lock = not M.errinj.ERRINJ_MULTIPLE_CFG
> +    if enable_lock then
> +        router.cfg_lock:put(true)
> +    end
> +    if router.cfg_counter > 0 then
> +        error('Error: cfg_counter > 0')
> +    end
> +    router.cfg_counter = router.cfg_counter + 1
> +    local status, err = pcall(router_cfg, router, ...)
> +    router.cfg_counter = router.cfg_counter - 1
> +    if enable_lock then
> +        router.cfg_lock:get()
> +    end
> +    if not status then
> +        error(err)
> +    end
> +end

8. Too complex. Lets use errinj.ERRINJ_MULTIPLE_CFG as
an integer and set it to nil by default

M = {
  ...
  errinj = {
     ...
     ERRINJ_MULTIPLE_CFG = nil
  }
  ...
}

...

function cfg_wrapper()
     if ERRINJ_MULTIPLE_CFG then
          ERRINJ_MULTIPLE_CFG = ERRINJ_MULTIPLE_CFG + 1
          if ERRINJ_MULTIPLE_CFG > 1 then
              error('Multiple cfgs at the same time')
          end
          status, err = pcall(router_cfg, router, ...)
          ERRINJ_MULTIPLE_CFG = ERRINJ_MULTIPLE_CFG - 1
     else
         lock:put(true)
         status, err = pcall(router_cfg, router, ...)
         lock:get()
     end
     if not status then
         error(err)
     end
     return
end


ERRINJ_MULTIPLE_CFG is nil by default and to active it
you can do this in a test:

     ERRINJ_MULTIPLE_CFG = 0

This will trigger counter in my pseudo code above since '0'
in Lua is true.





More information about the Tarantool-patches mailing list