Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Alex Khatskevich <avkhatskevich@tarantool.org>,
	tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race
Date: Tue, 26 Jun 2018 14:11:42 +0300	[thread overview]
Message-ID: <a3040955-c12c-f7de-8bae-e83650940bb9@tarantool.org> (raw)
In-Reply-To: <f6e438ae-64af-7cb5-bf99-938e821f21c8@tarantool.org>

Thanks for the patch! See 5 comments below.

>>
>> 3. This cycle makes no sense. With set LONG_DISCOVERY it is
>> equivalent to calling router.discovery_wakeup() once.
> Disagree.
> I need to stuck on the first routemap element. However I need to wakeup it #replicasets
> times, because the discovery_fiber sleeps after each iteration.

1. I removed this cycle together with any wakeups and the test passed. So you are
wrong. Please, make the test be testing that discovery really works.

This is my diff that does not break the test:

	@@ -419,17 +419,11 @@ end;
	
	 -- Perform #replicasets phases of discovery, to update replicasets
	 -- object in for loop of discovery fiber since previous cfg.
	-for _, __ in pairs(vshard.router.internal.replicasets) do
	-    vshard.router.discovery_wakeup()
	-    fiber.sleep(0.02)
	-end;
	+vshard.router.discovery_wakeup();
	 -- Simulate long `callro`.
	 -- Stuck on first rs in replicasets.
	 vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY = true;
	-for _, __ in pairs(vshard.router.internal.replicasets) do
	-    vshard.router.discovery_wakeup()
	-    fiber.sleep(0.02)
	-end;
	+fiber.sleep(0.02);
	
	 vshard.router.cfg(cfg);
	 vshard.router.internal.route_map[333] = nil

>>
>>> +---
>>> +...
>>> +vshard.router.cfg(cfg);
>>> +---
>>> +...
>>> +vshard.router.internal.errinj.LONG_DISCOVERY = nil;
>>> +---
>>> +...
>>> +-- Do discovery iteration.
>>> +vshard.router.discovery_wakeup()
>>> +fiber.sleep(0.02)
>>
>> 4. Concrete timeouts are the way to create an unstable test.
>> Please, get rid of them and replace with 'while not cond do wait end'
>> where necessary.
> It is important that discovery fiber did only one iteration of discovery. So, If I want some `while` mechanism I want to introduce several synchronization and block/unblock phases.
> I get you point and suggest compromise here.
> 1. I have places an assert here, which checks that expected discovery has occurred.
> 2. I have launched the test 100 times and it did not fail

2. Please, just remove this single sleeps and do 'while cond do fiber.sleep(0.01) end'.
The fact that the test is passed on your computer means nothing. It can always pass on your
machine, but sometimes fail on mine, easily. Or on the travis, or on a customer machine.

You do not need any 'synchronization and block/unblock phases'. You already know what do
you expect to see after the sleep, so just put this condition in the while.

If you want to test some deadlines like it is done for auto network timeouts for the
router, then just remember the current time before while, afterwards and test that
the duration was < your limit.

I have already debugged several of such '100% passing tests with hard timeouts'
and do not want to do it again.

>>
>>>   - true
>>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>>> index 7e765fa..df5b343 100644
>>> --- a/vshard/router/init.lua
>>> +++ b/vshard/router/init.lua
>>> @@ -127,10 +127,23 @@ local function discovery_f(module_version)
>>> +            -- Renew replicasets object in case of reconfigure
>>> +            -- and reload events.
>>
>> 5. You do not renew here anything.
>>
>>> +            if M.replicasets ~= old_replicasets then
>>> +                break
>>> +            end
> I have rewritten the comment. The replicasets variable captured by the
> for loop should be updated by calling `break` and exiting the for loop.
> 
> 

3. Please, put a new patch version here.

> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index 7e765fa..5fc0f59 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -127,10 +128,23 @@ local function discovery_f(module_version)
>      local iterations_until_lua_gc =
>          consts.COLLECT_LUA_GARBAGE_INTERVAL / consts.DISCOVERY_INTERVAL
>      while module_version == M.module_version do
> -        for _, replicaset in pairs(M.replicasets) do
> +        local old_replicasets = M.replicasets
> +        for rs_uuid, replicaset in pairs(M.replicasets) do
>              local active_buckets, err =
>                  replicaset:callro('vshard.storage.buckets_discovery', {},
>                                    {timeout = 2})
> +            while M.errinj.ERRINJ_LONG_DISCOVERY do
> +                -- Stuck on the first replicaset.

4. When you set this errinj, it sticks on the first rs with no special
checks and blocks other replicasets too. So remove this check. Just do
sleep until the errinj is set.

To prevent your dissension: I removed the check and the test passed too.
Together with the diff above about wakeups removal.

> +                if rs_uuid ~= select(1, next(M.replicasets)) then

5. select(1... here does nothing. It is equivalent to

     rs_uuid ~= next(M.replicasets)

You can start arguing, but do not. I have already checked it.

> +                    break
> +                end
> +                lfiber.sleep(0.01)
> +            end
> +            -- Renew replicasets object captured by the for loop
> +            -- in case of reconfigure and reload events.
> +            if M.replicasets ~= old_replicasets then
> +                break
> +            end
>              if not active_buckets then
>                  log.error('Error during discovery %s: %s', replicaset, err)
>              else

  reply	other threads:[~2018-06-26 11:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-15 12:47 [tarantool-patches] [PATCH 0/2][vshard] preserve route map AKhatskevich
2018-06-15 12:47 ` [tarantool-patches] [PATCH 1/2] Preserve route_map on router.cfg AKhatskevich
2018-06-21 12:54   ` [tarantool-patches] " Vladislav Shpilevoy
2018-06-25 12:48     ` Alex Khatskevich
2018-06-15 12:47 ` [tarantool-patches] [PATCH 2/2] Fix discovery/reconfigure race AKhatskevich
2018-06-21 12:54   ` [tarantool-patches] " Vladislav Shpilevoy
2018-06-25 12:48     ` Alex Khatskevich
2018-06-26 11:11       ` Vladislav Shpilevoy [this message]
2018-06-26 14:03         ` Alex Khatskevich
2018-06-27 11:45           ` Vladislav Shpilevoy
2018-06-27 19:50             ` Alex Khatskevich
2018-06-28 19:41               ` Vladislav Shpilevoy
2018-06-21 12:54 ` [tarantool-patches] Re: [PATCH 0/2][vshard] preserve route map Vladislav Shpilevoy
2018-06-25 11:52   ` Alex Khatskevich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a3040955-c12c-f7de-8bae-e83650940bb9@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=avkhatskevich@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='[tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox