[tarantool-patches] Re: [PATCH 2/2] Fix discovery/reconfigure race

Alex Khatskevich avkhatskevich at tarantool.org
Mon Jun 25 15:48:24 MSK 2018


>> +--
>> +-- Check route_map is not filled with old replica objects after
>> +-- recpnfigure.
>
> 1. Typo.
Fixed.
>
>>
>> +-- Simulate long `callro`.
>> +-- Stuck on first rs in replicasets.
>> +vshard.router.internal.errinj.LONG_DISCOVERY = true;
>
> 2. I do not see this error injection in the M.errinj
> declaration.
Fixed. Also renamed it to ERRINJ_LONG_DISCOVERY.
>
>> +---
>> +...
>> +for _, __ in pairs(vshard.router.internal.replicasets) do
>> +    vshard.router.discovery_wakeup()
>> +    fiber.sleep(0.02)
>> +end;
>
> 3. This cycle makes no sense. With set LONG_DISCOVERY it is
> equivalent to calling router.discovery_wakeup() once.
Disagree.
I need discovery to get stuck on the first route_map element. However, I
need to wake it up #replicasets times, because discovery_fiber sleeps
after each iteration.
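To illustrate why a single wakeup is not equivalent, here is a simplified
plain-Lua model. It is a sketch under assumptions, not vshard code:
coroutines stand in for Tarantool fibers, and `coroutine.yield()` stands
in for the sleep the discovery fiber takes after each iteration. Under
that model each wakeup advances discovery by exactly one replicaset, so
walking all of them takes #replicasets wakeups.

```lua
-- Simplified model, not vshard code: coroutines stand in for fibers,
-- and yield() stands in for the sleep after each iteration.
local replicasets = {rs1 = {}, rs2 = {}, rs3 = {}}
local visited = {}

local discovery = coroutine.create(function()
    while true do
        for uuid in pairs(replicasets) do
            -- "Discover" one replicaset, then sleep until woken up.
            table.insert(visited, uuid)
            coroutine.yield()
        end
    end
end)

-- Hypothetical stand-in for vshard.router.discovery_wakeup().
local function discovery_wakeup()
    assert(coroutine.resume(discovery))
end

-- One wakeup per replicaset, as in the test loop above.
local wakeups = 0
for _ in pairs(replicasets) do
    discovery_wakeup()
    wakeups = wakeups + 1
end
-- visited now holds exactly one entry per wakeup.
```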
>
>> +---
>> +...
>> +vshard.router.cfg(cfg);
>> +---
>> +...
>> +vshard.router.internal.errinj.LONG_DISCOVERY = nil;
>> +---
>> +...
>> +-- Do discovery iteration.
>> +vshard.router.discovery_wakeup()
>> +fiber.sleep(0.02)
>
> 4. Concrete timeouts are the way to create an unstable test.
> Please, get rid of them and replace with 'while not cond do wait end'
> where necessary.
It is important that the discovery fiber performs exactly one iteration
of discovery here. A `while` loop alone cannot guarantee that; I would
have to introduce several synchronization points and block/unblock phases.
I get your point, and I suggest a compromise:
1. I have placed an assert here which checks that the expected discovery
has occurred.
2. I have run the test 100 times and it did not fail.
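For reference, a "while not cond do wait end" helper could look like the
sketch below. `wait_cond`, `step` and `max_iters` are hypothetical names,
not an existing vshard or test-run API. The sleep function is passed in,
so in Tarantool one would pass `fiber.sleep`; the plain-Lua usage below
uses a fake sleep instead. Note that polling alone does not bound how many
discovery iterations happen, so it would still need the assert from item 1.

```lua
-- Hypothetical helper: poll `cond` instead of sleeping a fixed time.
-- `sleep` is injected (e.g. fiber.sleep inside Tarantool).
local function wait_cond(cond, sleep, step, max_iters)
    step = step or 0.01
    max_iters = max_iters or 1000
    for _ = 1, max_iters do
        if cond() then
            return true
        end
        sleep(step)
    end
    -- One last check after the final sleep.
    return cond()
end

-- Plain-Lua usage: each fake "sleep" lets discovery make progress.
local discovered = 0
local function fake_sleep(_)
    discovered = discovered + 1
end
local ok = wait_cond(function() return discovered >= 3 end, fake_sleep)
```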
>
>>   - true
>> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
>> index 7e765fa..df5b343 100644
>> --- a/vshard/router/init.lua
>> +++ b/vshard/router/init.lua
>> @@ -127,10 +127,23 @@ local function discovery_f(module_version)
>> +            -- Renew replicasets object in case of reconfigure
>> +            -- and reload events.
>
> 5. You do not renew anything here.
>
>> +            if M.replicasets ~= old_replicasets then
>> +                break
>> +            end
I have rewritten the comment. The replicasets table captured by the `for`
loop cannot be updated in place; the only way to switch to the new one is
to exit the `for` loop with `break`.
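A simplified plain-Lua model of why `break` is needed: `pairs(old_replicasets)`
is evaluated once when the loop starts, so assigning a new table to
`M.replicasets` does not affect the running loop. `discover` and
`discovery_pass` are hypothetical stand-ins, not the vshard functions, and
the reconfigure is simulated by replacing `M.replicasets` with a new table
during the first call, as the identity check in the patch assumes.

```lua
-- Simplified model, not vshard code.
local M = {replicasets = {rs1 = true, rs2 = true}}
local seen = {}

-- Hypothetical stand-in for discovering one replicaset. The first call
-- also simulates a reconfigure replacing M.replicasets with a new table.
local reconfigured = false
local function discover(uuid)
    table.insert(seen, uuid)
    if not reconfigured then
        reconfigured = true
        M.replicasets = {new_rs = true}
    end
end

-- One pass of the discovery loop body.
local function discovery_pass()
    local old_replicasets = M.replicasets
    -- pairs() is evaluated once, so this loop keeps walking the old
    -- table even after M.replicasets has been reassigned.
    for uuid in pairs(old_replicasets) do
        discover(uuid)
        if M.replicasets ~= old_replicasets then
            -- Leave the loop; the next pass picks up the new table.
            break
        end
    end
end

discovery_pass()
local after_first = #seen -- 1: break skipped the remaining old replicaset
discovery_pass()
-- The second pass walks the new table, so seen[2] is "new_rs".
```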





