Tarantool development patches archive
 help / color / mirror / Atom feed
From: Alex Khatskevich <avkhatskevich@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race
Date: Tue, 26 Jun 2018 17:03:26 +0300	[thread overview]
Message-ID: <7790cb4a-ff5e-99bf-03bc-996fea379185@tarantool.org> (raw)
In-Reply-To: <a3040955-c12c-f7de-8bae-e83650940bb9@tarantool.org>


> 1. I removed this cycle together with any wakeups and the test passed. 
> So you are
> wrong. Please, make the test be testing that discovery really works.
>
> 2. Please, just remove this single sleeps and do 'while cond do 
> fiber.sleep(0.01) end'.
> The fact that the test is passed on your computer means nothing. It 
> can always pass on your
> machine, but sometimes fail on mine, easily. Or on the travis, or on a 
> customer machine.
>
> You do not need any 'synchronization and block/unblock phases'. You 
> already know what do
> you expect to see after the sleep, so just put this condition in the 
> while.
>
> If you want to test some deadlines like it is done for auto network 
> timeouts for the
> router, then just remember the current time before while, afterwards 
> and test that
> the duration was < your limit.
>
> I have already debugged several of such '100% passing tests with hard 
> timeouts'
> and do not want to do it again.
>
Fixed as discussed verbally
> 3. Please, put a new patch version here.
below
>
>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>> index 7e765fa..5fc0f59 100644
>> --- a/vshard/router/init.lua
>> +++ b/vshard/router/init.lua
>> @@ -127,10 +128,23 @@ local function discovery_f(module_version)
>>      local iterations_until_lua_gc =
>>          consts.COLLECT_LUA_GARBAGE_INTERVAL / consts.DISCOVERY_INTERVAL
>>      while module_version == M.module_version do
>> -        for _, replicaset in pairs(M.replicasets) do
>> +        local old_replicasets = M.replicasets
>> +        for rs_uuid, replicaset in pairs(M.replicasets) do
>>              local active_buckets, err =
>> replicaset:callro('vshard.storage.buckets_discovery', {},
>>                                    {timeout = 2})
>> +            while M.errinj.ERRINJ_LONG_DISCOVERY do
>> +                -- Stuck on the first replicaset.
>
> 4. When you set this errinj, it sticks on the first rs with no special
> checks and blocks other replicasets too. So remove this check. Just do
> sleep until the errinj is set.
>
> To prevent your dissension: I removed the check and the test passed too.
> Together with the diff above about wakeups removal.

Fixed as discussed verbally
>
> 5. select(1... here does nothing. It is equivalent to
>
>     rs_uuid ~= next(M.replicasets)
deleted



Full diff:

commit bc90c3db5f5b1b663c747b8c2829fc8528af70cf
Author: AKhatskevich <avkhatskevich@tarantool.org>
Date:   Thu Jun 14 16:03:09 2018 +0300

     Fix discovery/reconfigure race

     This commit prevents discovery fiber from discovering old replicasets
     and spoiling `route_map`.

diff --git a/test/router/router.result b/test/router/router.result
index 5643f3e..36d54bf 100644
--- a/test/router/router.result
+++ b/test/router/router.result
@@ -1095,6 +1095,55 @@ for bucket, old_rs in pairs(bucket_to_old_rs) do
  end;
  ---
  ...
+--
+-- Check route_map is not filled with old replica objects after
+-- reconfigure.
+--
+-- Simulate long `callro`.
+vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY = true;
+---
+...
+while not vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY == 
'waiting' do
+    vshard.router.discovery_wakeup()
+    fiber.sleep(0.02)
+end;
+---
+...
+vshard.router.cfg(cfg);
+---
+...
+route_map = vshard.router.internal.route_map
+for bucket_id, _ in pairs(route_map) do
+    route_map[bucket_id] = nil
+end;
+---
+...
+vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY = false;
+---
+...
+-- Do discovery iteration. Upload buckets from the
+-- first replicaset.
+while not next(vshard.router.internal.route_map) do
+    vshard.router.discovery_wakeup()
+    fiber.sleep(0.01)
+end;
+---
+...
+new_replicasets = {};
+---
+...
+for _, rs in pairs(vshard.router.internal.replicasets) do
+    new_replicasets[rs] = true
+end;
+---
+...
+_, rs = next(vshard.router.internal.route_map);
+---
+...
+new_replicasets[rs] == true;
+---
+- true
+...
  test_run:cmd("setopt delimiter ''");
  ---
  - true
diff --git a/test/router/router.test.lua b/test/router/router.test.lua
index 106f3d8..843db46 100644
--- a/test/router/router.test.lua
+++ b/test/router/router.test.lua
@@ -411,6 +411,35 @@ for bucket, old_rs in pairs(bucket_to_old_rs) do
          error("route_map was not updataed.")
      end
  end;
+
+--
+-- Check route_map is not filled with old replica objects after
+-- reconfigure.
+--
+-- Simulate long `callro`.
+vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY = true;
+while not vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY == 
'waiting' do
+    vshard.router.discovery_wakeup()
+    fiber.sleep(0.02)
+end;
+vshard.router.cfg(cfg);
+route_map = vshard.router.internal.route_map
+for bucket_id, _ in pairs(route_map) do
+    route_map[bucket_id] = nil
+end;
+vshard.router.internal.errinj.ERRINJ_LONG_DISCOVERY = false;
+-- Do discovery iteration. Upload buckets from the
+-- first replicaset.
+while not next(vshard.router.internal.route_map) do
+    vshard.router.discovery_wakeup()
+    fiber.sleep(0.01)
+end;
+new_replicasets = {};
+for _, rs in pairs(vshard.router.internal.replicasets) do
+    new_replicasets[rs] = true
+end;
+_, rs = next(vshard.router.internal.route_map);
+new_replicasets[rs] == true;
  test_run:cmd("setopt delimiter ''");

  _ = test_run:cmd("switch default")
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index 7e765fa..c125125 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -13,6 +13,7 @@ if not M then
          errinj = {
              ERRINJ_FAILOVER_CHANGE_CFG = false,
              ERRINJ_RELOAD = false,
+            ERRINJ_LONG_DISCOVERY = false,
          },
          -- Bucket map cache.
          route_map = {},
@@ -127,10 +128,19 @@ local function discovery_f(module_version)
      local iterations_until_lua_gc =
          consts.COLLECT_LUA_GARBAGE_INTERVAL / consts.DISCOVERY_INTERVAL
      while module_version == M.module_version do
-        for _, replicaset in pairs(M.replicasets) do
+        local old_replicasets = M.replicasets
+        for rs_uuid, replicaset in pairs(M.replicasets) do
              local active_buckets, err =
replicaset:callro('vshard.storage.buckets_discovery', {},
                                    {timeout = 2})
+            while M.errinj.ERRINJ_LONG_DISCOVERY do
+                lfiber.sleep(0.01)
+            end
+            -- Renew replicasets object captured by the for loop
+            -- in case of reconfigure and reload events.
+            if M.replicasets ~= old_replicasets then
+                break
+            end
              if not active_buckets then
                  log.error('Error during discovery %s: %s', replicaset, 
err)
              else

  reply	other threads:[~2018-06-26 14:03 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-15 12:47 [tarantool-patches] [PATCH 0/2][vshard] preserve route map AKhatskevich
2018-06-15 12:47 ` [tarantool-patches] [PATCH 1/2] Preserve route_map on router.cfg AKhatskevich
2018-06-21 12:54   ` [tarantool-patches] " Vladislav Shpilevoy
2018-06-25 12:48     ` Alex Khatskevich
2018-06-15 12:47 ` [tarantool-patches] [PATCH 2/2] Fix discovery/reconfigure race AKhatskevich
2018-06-21 12:54   ` [tarantool-patches] " Vladislav Shpilevoy
2018-06-25 12:48     ` Alex Khatskevich
2018-06-26 11:11       ` Vladislav Shpilevoy
2018-06-26 14:03         ` Alex Khatskevich [this message]
2018-06-27 11:45           ` Vladislav Shpilevoy
2018-06-27 19:50             ` Alex Khatskevich
2018-06-28 19:41               ` Vladislav Shpilevoy
2018-06-21 12:54 ` [tarantool-patches] Re: [PATCH 0/2][vshard] preserve route map Vladislav Shpilevoy
2018-06-25 11:52   ` Alex Khatskevich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7790cb4a-ff5e-99bf-03bc-996fea379185@tarantool.org \
    --to=avkhatskevich@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='[tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox