Sunday, 21 October 2018, 23:41 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:

Hi!

I don't have objections in general. Some minor comments are below.

Please answer with the fixes inline; don't just resend the whole patch.

WBR, Alexander Turenko.

> Instead of using timeout we need just pause `relay_send`. Can't rely
> on timeout because of various system load in parallel mode. Add new
> errinj which checks boolean in loop and until it is not `True` do not
> pass the method `relay_send` to the next statement.
>
> To check the read-only mode, need to make a modification of tuple. It
> is enough to call `replace` method. Instead of `delete` and then
> useless verification that we have not delete space by using `get`
> method.
>

delete space -> delete tuple?
fixed
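
To illustrate the idea from the commit message (a boolean injection that
keeps `relay_send` from moving on to its next statement), here is a toy
model in plain Lua. It is only a sketch under assumptions: the real
injection is implemented inside Tarantool, not in Lua, and the names
`injections`, `relay_send(row, sent)` and the coroutine standing in for a
fiber are all hypothetical.

```lua
-- Toy model only; every name below is hypothetical, not Tarantool's code.
local injections = { RELAY_SEND_DELAY = false }

-- Stand-in for relay_send: it does not proceed to the next statement
-- while the boolean injection is set.
local function relay_send(row, sent)
    while injections.RELAY_SEND_DELAY do
        -- Give control back, like a fiber sleeping between checks.
        coroutine.yield()
    end
    table.insert(sent, row)
end

local sent = {}
local relay = coroutine.create(function()
    for i = 1, 3 do
        relay_send(i, sent)
    end
end)

-- Pause: the relay parks inside relay_send and sends nothing.
injections.RELAY_SEND_DELAY = true
coroutine.resume(relay)
local sent_while_paused = #sent

-- Resume: the flag is cleared and all rows go out.
injections.RELAY_SEND_DELAY = false
coroutine.resume(relay)
local sent_after_resume = #sent
```

Unlike a timeout, the pause lasts exactly until the test clears the flag,
so it does not depend on how loaded the system is.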


> +-- In the next two cases we try to replace a tuple while replica
> +-- is catching up with the master (local delete, remote delete)

delete -> replace
fixed

> +-- case

Nit: period at the end.
fixed


> --- check sync
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
> +-- Resume replicaton.

replicaton -> replication
fixed

> diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
> index 5100378b3..22921289d 100644
> --- a/test/replication/gc.test.lua
> +++ b/test/replication/gc.test.lua
> @@ -12,6 +12,7 @@ default_checkpoint_count = box.cfg.checkpoint_count
> box.cfg{checkpoint_count = 1}
>
> function wait_gc(n) while #box.info.gc().checkpoints > n do fiber.sleep(0.01) end end
> +function wait_xlog(n, timeout) timeout = timeout or 1.0 return test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == n end, timeout) end
>

Use 'setopt delimiter' and write it on several lines. Also, below I
propose supporting 'n' as a table, so that the file count may be one of
several values. You can use an auxiliary function like the following,
together with a type(n) == 'table' check.

function value_in(val, arr)
    for _, elem in ipairs(arr) do
        if val == elem then
            return true
        end
    end
    return false
end
fixed
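
For reference, a rough sketch of how such a helper might look once it
accepts either a number or a table. This is not the code from the patch:
`make_wait_xlog`, `count_files` and `wait_cond` are hypothetical names
used so the sketch runs in plain Lua. In the test itself the two injected
functions would presumably wrap `#fio.glob('./master/*.xlog')` and
`test_run:wait_cond(...)`, and the definition would sit between
`setopt delimiter` commands.

```lua
-- Helper from the review: true if val equals any element of arr.
local function value_in(val, arr)
    for _, elem in ipairs(arr) do
        if val == elem then
            return true
        end
    end
    return false
end

-- Hypothetical factory: count_files() returns the current number of
-- xlog files, wait_cond(cond, timeout) polls cond until it is true or
-- the timeout expires. Both are injected to keep the sketch standalone.
local function make_wait_xlog(count_files, wait_cond)
    return function(n, timeout)
        timeout = timeout or 1.0
        -- Accept a single count or a table of acceptable counts.
        local expected = type(n) == 'table' and n or {n}
        return wait_cond(function()
            return value_in(count_files(), expected)
        end, timeout)
    end
end

-- Stand-ins for a quick check outside Tarantool.
local file_count = 2
local function fake_count()
    return file_count
end
local function fake_wait_cond(cond, timeout)
    return cond() -- checks once instead of polling
end

local wait_xlog = make_wait_xlog(fake_count, fake_wait_cond)
local ok_either = wait_xlog({2, 3}, 0.1)
local ok_exact = wait_xlog(2)
local ok_wrong = wait_xlog(5)
```

With a table argument the two chained calls collapse into a single wait
that succeeds as soon as the count matches any of the listed values.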

> @@ -31,7 +32,7 @@ for i = 1, 100 do s:auto_increment{} end
>
> -- Make sure replica join will take long enough for us to
> -- invoke garbage collection.
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
> +box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
>
> -- While the replica is receiving the initial data set,
> -- make a snapshot and invoke garbage collection, then
> @@ -41,7 +42,7 @@ test_run:cmd("setopt delimiter ';'")
> fiber.create(function()
> fiber.sleep(0.1)
> box.snapshot()
> - box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
> + box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
> end)
> test_run:cmd("setopt delimiter ''");
>

The entire comment:

> -- While the replica is receiving the initial data set,
> -- make a snapshot and invoke garbage collection, then
> -- remove the timeout injection so that we don't have to
> -- wait too long for the replica to start.

Proposed: then remove the delay to allow the replica to start.
fixed

> --- Remove the timeout injection so that the replica catches
> +-- Resume replicaton so that the replica catches

replicaton -> replication
fixed

> @@ -146,17 +147,16 @@ box.snapshot()
> _ = s:auto_increment{}
> box.snapshot()
> #box.info.gc().checkpoints == 1 or box.info.gc()
> -xlog_count = #fio.glob('./master/*.xlog')
> -- the replica may have managed to download all data
> -- from xlog #1 before it was stopped, in which case
> -- it's OK to collect xlog #1
> -xlog_count == 3 or xlog_count == 2 or fio.listdir('./master')
> +wait_xlog(3, 0.1) or wait_xlog(2, 0.1) or fio.listdir('./master')

You set the timeout to 1.0 for the other cases, but 0.2 (two waits of
0.1) here. So, is 0.2 enough? It would be better to let the function
accept a table like {2, 3} as the file count. Use 'setopt delimiter' and
update the function.
fixed

--
Sergei Voronezhskii