[Tarantool-patches] [PATCH v1] test: flaky hang vinyl/ddl.test.lua test

Aleksandr Lyapunov alyapunov at tarantool.org
Tue Dec 1 23:42:44 MSK 2020


Hi! thanks for the patch!

As I can see the patch does 2 things:

1)reverts 5f96ee5907672bd32dd7fe8dad144ac3d328187f
2)don't do some actions (box.snapshot()) if the space has wrong count in 
an index.

As for me (1) should be done in separate commit. But it's no a big deal.
I think that (2) can be achieved with much more shorter patch:

-box.snapshot()
+if box.space.test.index.pk:count() == box.space.test.index.tk:count() then box.snapshot() end


That would be much more understandable. But it still would be suspicious.

The main problem is: why the test hangs if tarantool makes snapshot?

We must not exit a test end even skip some lines if something went wrong.
If tarantool hangs in some conditions we have to know it.

I think hiding `box.snapshot()` is a wrong way.
The test subsystem must handle with any extreme situation.
And if tarantool hangs - the subsystem must record it's stacktrace,
report and kill tarantool.
As I understood didn't manage to do it. It must be fixed.
It must be one fix of test subsystem instead of dozen of fixes of 
particular tests.


On 11/17/20 6:34 PM, Alexander V. Tikhonov wrote:
> Found hanging test vinyl/ddl.test.lua on:
>
>    [159]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
>    [159]  ---
>    [159]  - true
>    [159]  ...
>    [159] -box.snapshot()
>    [159] ----
>    [159] -- ok
>    [159] -...
>
> The real issue happend before it when test failed on:
>
>    [091] --- engine/ddl.result	Thu May 14 16:12:09 2020
>    [091] +++ engine/ddl.reject	Fri May 15 04:15:07 2020
>    [091] @@ -2558,7 +2558,7 @@
>    [091]  ...
>    [091]  inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
>    [091]  ---
>    [091] -- true
>    [091] +- false
>    [091]  ...
>
> To avoid of hangs decided to rewrite the test to be able to leave the
> hanging subtest on the very start of the issue. Also found that the
> previous fix committed with:
>
>    5f96ee5907672bd32dd7fe8dad144ac3d328187f ('Fix flaky test engine/ddl')
>
> did not fix the issue and its changes were removed with the current
> commit.
>
> Needed for #4353
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4353-hang-ddl
> Issue: https://github.com/tarantool/tarantool/issues/4353
>
>   test/engine/ddl.result   | 91 ++++++++++++++++++++++++++--------------
>   test/engine/ddl.test.lua | 76 ++++++++++++++++++++++++++-------
>   test/engine/suite.ini    |  2 +-
>   3 files changed, 122 insertions(+), 47 deletions(-)
>
> diff --git a/test/engine/ddl.result b/test/engine/ddl.result
> index 8b44d1ee1..3dda5494f 100644
> --- a/test/engine/ddl.result
> +++ b/test/engine/ddl.result
> @@ -2526,41 +2526,52 @@ function gen_load()
>   end;
>   ---
>   ...
> -inspector:cmd("setopt delimiter ''");
> ----
> -- true
> -...
> -fiber = require('fiber')
> ----
> -...
> -ch = fiber.channel(1)
> ----
> -...
> -_ = fiber.create(function() gen_load() ch:put(true) end)
> +function check_equal(check, pk, k)
> +    if pk ~= k then
> +        require('log').error("Error on fiber check: failed '" .. check ..
> +	                     "' check on equal pk " .. pk .. " and k = " .. k)
> +        return false
> +    end
> +    return true
> +end;
>   ---
>   ...
> -_ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})
> +function check_fiber()
> +    _ = fiber.create(function() gen_load() ch:put(true) end)
> +    _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})
> +
> +    ch:get()
> +
> +    local index = box.space.test.index
> +    if not check_equal("1st step secondary keys", index.pk:count(), index.sk:count()) then
> +        return false
> +    end
> +
> +    _ = fiber.create(function() gen_load() ch:put(true) end)
> +    _ = box.space.test:create_index('tk', {unique = true, parts = {3, 'unsigned'}})
> +
> +    ch:get()
> +
> +    index = box.space.test.index
> +    if not check_equal("2nd step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("2nd step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    return true
> +end;
>   ---
>   ...
> -ch:get()
> +inspector:cmd("setopt delimiter ''");
>   ---
>   - true
>   ...
> -_ = fiber.create(function() gen_load() ch:put(true) end)
> ----
> -...
> -_ = box.space.test:create_index('tk', {unique = true, parts = {3, 'unsigned'}})
> ----
> -...
> -ch:get()
> +fiber = require('fiber')
>   ---
> -- true
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> +ch = fiber.channel(1)
>   ---
> -- true
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> +check_fiber()
>   ---
>   - true
>   ...
> @@ -2568,23 +2579,41 @@ inspector:cmd("restart server default")
>   inspector = require('test_run').new()
>   ---
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> +inspector:cmd("setopt delimiter ';'")
>   ---
>   - true
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> +function check_equal(check, pk, k)
> +    if pk ~= k then
> +        require('log').error("Error on server restart check: failed '" .. check ..
> +                             "' check on equal pk " .. pk .. " and k = " .. k)
> +        return false
> +    end
> +    return true
> +end;
>   ---
> -- true
>   ...
> -box.snapshot()
> +function check_server_restart()
> +    local index = box.space.test.index
> +    if not check_equal("1rd step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("1rd step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    box.snapshot()
> +    index = box.space.test.index
> +    if not check_equal("2th step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("2th step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    return true
> +end;
>   ---
> -- ok
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> +inspector:cmd("setopt delimiter ''");
>   ---
>   - true
>   ...
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> +check_server_restart()
>   ---
>   - true
>   ...
> diff --git a/test/engine/ddl.test.lua b/test/engine/ddl.test.lua
> index 1d77705dd..c2eda7c06 100644
> --- a/test/engine/ddl.test.lua
> +++ b/test/engine/ddl.test.lua
> @@ -1003,30 +1003,76 @@ function gen_load()
>       end
>   end;
>   
> +function check_equal(check, pk, k)
> +    if pk ~= k then
> +        require('log').error("Error on fiber check: failed '" .. check ..
> +	                     "' check on equal pk " .. pk .. " and k = " .. k)
> +        return false
> +    end
> +    return true
> +end;
> +
> +function check_fiber()
> +    _ = fiber.create(function() gen_load() ch:put(true) end)
> +    _ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})
> +
> +    ch:get()
> +
> +    local index = box.space.test.index
> +    if not check_equal("1st step secondary keys", index.pk:count(), index.sk:count()) then
> +        return false
> +    end
> +
> +    _ = fiber.create(function() gen_load() ch:put(true) end)
> +    _ = box.space.test:create_index('tk', {unique = true, parts = {3, 'unsigned'}})
> +
> +    ch:get()
> +
> +    index = box.space.test.index
> +    if not check_equal("2nd step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("2nd step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    return true
> +end;
> +
>   inspector:cmd("setopt delimiter ''");
>   
>   fiber = require('fiber')
>   ch = fiber.channel(1)
> +check_fiber()
>   
> -_ = fiber.create(function() gen_load() ch:put(true) end)
> -_ = box.space.test:create_index('sk', {unique = false, parts = {2, 'unsigned'}})
> -ch:get()
> +inspector:cmd("restart server default")
> +inspector = require('test_run').new()
>   
> -_ = fiber.create(function() gen_load() ch:put(true) end)
> -_ = box.space.test:create_index('tk', {unique = true, parts = {3, 'unsigned'}})
> -ch:get()
> +inspector:cmd("setopt delimiter ';'")
>   
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> +function check_equal(check, pk, k)
> +    if pk ~= k then
> +        require('log').error("Error on server restart check: failed '" .. check ..
> +                             "' check on equal pk " .. pk .. " and k = " .. k)
> +        return false
> +    end
> +    return true
> +end;
>   
> -inspector:cmd("restart server default")
> +function check_server_restart()
> +    local index = box.space.test.index
> +    if not check_equal("1rd step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("1rd step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    box.snapshot()
> +    index = box.space.test.index
> +    if not check_equal("2th step secondary keys", index.pk:count(), index.sk:count()) or
> +            not check_equal("2th step third keys", index.pk:count(), index.tk:count()) then
> +        return false
> +    end
> +    return true
> +end;
>   
> -inspector = require('test_run').new()
> +inspector:cmd("setopt delimiter ''");
>   
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> -box.snapshot()
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.sk:count() end)
> -inspector:wait_cond(function() return box.space.test.index.pk:count() == box.space.test.index.tk:count() end)
> +check_server_restart()
>   
>   box.space.test:drop()
> diff --git a/test/engine/suite.ini b/test/engine/suite.ini
> index 01899f088..5354699b3 100644
> --- a/test/engine/suite.ini
> +++ b/test/engine/suite.ini
> @@ -15,7 +15,7 @@ fragile = {
>       "tests": {
>           "ddl.test.lua": {
>               "issues": [ "gh-4353" ],
> -            "checksums": [ "dd8851d80183cc75052119ba646e295d" ]
> +            "checksums": [ "928a43a39a843edbf26ec324d73c210f" ]
>           },
>           "gh-4973-concurrent-alter-fails.test.lua": {
>               "issues": [ "gh-5157" ],


More information about the Tarantool-patches mailing list