[tarantool-patches] Re: [PATCH 1/4] Fix races related to object outdating

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Wed Aug 1 15:36:34 MSK 2018


Hi! Thanks for the patch! See 2 comments below.

> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 142ddb6..1a0ed2f 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -88,15 +94,18 @@ local function bucket_discovery(bucket_id)
>      log.verbose("Discovering bucket %d", bucket_id)
>      local last_err = nil
>      local unreachable_uuid = nil
> -    for uuid, replicaset in pairs(M.replicasets) do
> -        local _, err =
> -            replicaset:callrw('vshard.storage.bucket_stat', {bucket_id})
> -        if err == nil then
> -            bucket_set(bucket_id, replicaset)
> -            return replicaset
> -        elseif err.code ~= lerror.code.WRONG_BUCKET then
> -            last_err = err
> -            unreachable_uuid = uuid
> +    for uuid, _ in pairs(M.replicasets) do
> +        -- Handle reload/reconfigure.
> +        replicaset = M.replicasets[uuid]

1. Please explain how M.replicasets[uuid] can become nil
before this line. You iterate over M.replicasets here, and on
the previous line the replicaset is already available in '_'.
How can 'replicaset' differ from '_' at this point?

This has nothing to do with the 'reload/reconfigure' case,
since you always iterate over M.replicasets, the most recent
list of replicasets. Maybe you assumed that pairs() stores
its first argument in a temporary variable, but it looks like
it does not. I checked it with a simple test:

	a = {}
	a.objs = {1, 2, 3, 4, 5}
	for k, v in pairs(a.objs) do
	    print(a.objs[k])
	    if k == 2 then
	        a.objs = {6, 7, 8, 9, 10}
	    end
	end

	Output:
	1
	2
	8
	9
	10
	---
	...
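
To see which table each printed value comes from, the same test
can record the loop variable 'v' next to a fresh a.objs[k] lookup.
This is only a sketch: the names from_loop and from_lookup are made
up for illustration. In Lua the generic 'for' evaluates pairs(a.objs)
once, before the first iteration, so 'v' keeps coming from the table
that was passed in, while a.objs[k] reads from whatever table a.objs
refers to at that moment.

```lua
-- Variant of the test above: record both the loop variable and a
-- fresh lookup through a.objs on every iteration.
local a = {}
a.objs = {1, 2, 3, 4, 5}

local from_loop = {}    -- values delivered by the iterator ('v')
local from_lookup = {}  -- values read through a.objs[k] at that moment

for k, v in pairs(a.objs) do
    from_loop[k] = v
    from_lookup[k] = a.objs[k]
    if k == 2 then
        -- Replace the table the field refers to; the table already
        -- handed to pairs() is not touched.
        a.objs = {6, 7, 8, 9, 10}
    end
end

print(table.concat(from_loop, ' '))
print(table.concat(from_lookup, ' '))
```

The first line lists the values of the original table. The second
line corresponds to the output shown above, where the values switch
to the new table after key 2.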

> +        if replicaset then
> +            local _, err =
> +                replicaset:callrw('vshard.storage.bucket_stat', {bucket_id})
> +            if err == nil then
> +                return bucket_set(bucket_id, replicaset.uuid)
> +            elseif err.code ~= lerror.code.WRONG_BUCKET then
> +                last_err = err
> +                unreachable_uuid = uuid
> +            end
>          end
>      end
>      local err = nil
> @@ -513,27 +522,28 @@ local function router_cfg(cfg)
>      end
>      box.cfg(box_cfg)
>      log.info("Box has been configured")
> -    M.connection_outdate_delay = cfg.connection_outdate_delay
> -    M.total_bucket_count = total_bucket_count
> -    M.collect_lua_garbage = collect_lua_garbage
> -    M.current_cfg = new_cfg
>      -- Move connections from an old configuration to a new one.
>      -- It must be done with no yields to prevent usage both of not
>      -- fully moved old replicasets, and not fully built new ones.
> -    lreplicaset.rebind_replicasets(new_replicasets, M.replicasets,
> -                                   M.connection_outdate_delay)
> -    M.replicasets = new_replicasets
> +    lreplicaset.rebind_replicasets(new_replicasets, M.replicasets)
>      -- Now the new replicasets are fully built. Can establish
>      -- connections and yield.
>      for _, replicaset in pairs(new_replicasets) do
>          replicaset:connect_all()
>      end
> +    lreplicaset.wait_masters_connect(new_replicasets)
> +    lreplicaset.outdate_replicasets(M.replicasets, cfg.connection_outdate_delay)
> +    M.connection_outdate_delay = cfg.connection_outdate_delay
> +    M.total_bucket_count = total_bucket_count
> +    M.collect_lua_garbage = collect_lua_garbage
> +    M.current_cfg = cfg
> +    M.replicasets = new_replicasets
>      -- Update existing route map in-place.
> -    for bucket, rs in pairs(M.route_map) do
> +    local old_route_map = M.route_map
> +    M.route_map = {}
> +    for bucket, rs in pairs(old_route_map) do
>          M.route_map[bucket] = M.replicasets[rs.uuid]

2. Why do you need to save old_route_map into a separate
variable? You can update M.route_map in place, as it was
done before, can't you? You fill the new route_map with
exactly the same keys (maybe with different values, but that
does not affect the 'for' iteration). Moreover, when you
create a new route_map instead of resetting the old one, you
temporarily hold two route maps in memory. For a huge bucket
count this can be noticeable.
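
A minimal sketch of the in-place update, assuming made-up stand-ins
for M.replicasets and M.route_map (the uuids, the 'generation' field
and the table names below are hypothetical). It relies on the
documented behaviour of next(): assigning to, or clearing, fields
that already exist is allowed during traversal, and only adding new
keys is not.

```lua
-- Hypothetical stand-in for the new M.replicasets: 'uuid-b' has been
-- removed by the new configuration.
local new_replicasets = {
    ['uuid-a'] = {uuid = 'uuid-a', generation = 2},
}

-- Hypothetical old replicaset objects still referenced by the route map.
local old_a = {uuid = 'uuid-a', generation = 1}
local old_b = {uuid = 'uuid-b', generation = 1}
local route_map = {[1] = old_a, [2] = old_b, [3] = old_a}

-- Update in place: every key written here already exists in route_map,
-- so the traversal stays valid. A bucket whose replicaset is gone gets
-- nil, which clears the entry.
for bucket, rs in pairs(route_map) do
    route_map[bucket] = new_replicasets[rs.uuid]
end

print(route_map[1].generation, route_map[2], route_map[3].generation)
```

No second table is allocated here, so the peak memory stays at one
route map regardless of the bucket count.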

>      end
> -
> -    lreplicaset.wait_masters_connect(new_replicasets)
>      if M.failover_fiber == nil then
>          lfiber.create(util.reloadable_fiber_f, M, 'failover_f', 'Failover')
>      end



