From: "Alexander V. Tikhonov" <avtikhon@tarantool.org> To: Kirill Yukhin <kyukhin@tarantool.org>, Nikita Pettik <korablev@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: [Tarantool-patches] [PATCH v1] test: fix hang of vinyl/select_consistency.* test Date: Sun, 15 Nov 2020 21:43:01 +0300 [thread overview] Message-ID: <5b204cb5b32bb96c6a2cec2b06f399b360137250.1605465603.git.avtikhon@tarantool.org> (raw) During testing process: https://gitlab.com/tarantool/tarantool/-/jobs/800863457#L5303 found that test vinyl/select_consistency.test.lua hanged: --- vinyl/select_consistency.result Thu Jun 11 11:59:48 2020 +++ var/152_vinyl/select_consistency.result Tue Oct 20 09:13:02 2020 @@ -137,16 +137,3 @@ for i = 1, ch:size() do ch:get() end; ---- -... -test_run:cmd("setopt delimiter ''"); ---- -- true -... This issue stops the testing till the complete timeout of the test-run will stop all the testing process. Also found that real hang occured at box.snapshot command in snap_loop loop, like: [001] Last 15 lines of Tarantool Log file [Instance "vinyl"][/source/test/var/001_vinyl/vinyl.log]: [001] 2020-11-15 18:13:49.928 [78887] main/105/gc I> removed /source/test/var/001_vinyl/vinyl/519/2/00000000000000011145.run [001] 2020-11-15 18:13:49.928 [78887] vinyl.dump.0/640/task I> writing `/source/test/var/001_vinyl/vinyl/519/0/00000000000000011538.index' [001] 2020-11-15 18:13:49.928 [78887] snapshot/101/main I> done [001] 2020-11-15 18:13:49.929 [78887] main/105/gc I> removed /source/test/var/001_vinyl/vinyl/519/2/00000000000000011157.index [001] 2020-11-15 18:13:49.929 [78887] main/108/vinyl.scheduler I> 519/0: dump completed [001] 2020-11-15 18:13:49.929 [78887] main/108/vinyl.scheduler I> dumped 0 bytes in 0.0 s, rate 0.0 MB/s [001] 2020-11-15 18:13:49.929 [78887] main/481/lua I> vinyl checkpoint completed [001] 2020-11-15 18:13:49.929 [78887] main/105/gc I> removed /source/test/var/001_vinyl/vinyl/519/2/00000000000000011157.run [001] 2020-11-15 18:13:49.929 [78887] main/108/vinyl.scheduler I> 519/0: started compacting range ([94]..inf), runs 2/2 [001] 2020-11-15 18:13:49.930 [78887] wal I> removed /source/test/var/001_vinyl/vinyl/00000000000000093070.xlog [001] 2020-11-15 18:13:49.930 [78887] main/105/gc I> removed /source/test/var/001_vinyl/vinyl/00000000000000094030.snap.inprogress [001] 2020-11-15 18:13:49.930 [78887] main I> removed /source/test/var/001_vinyl/vinyl/00000000000000093306.snap [001] 2020-11-15 18:13:49.930 [78887] main I> removed /source/test/var/001_vinyl/vinyl/00000000000000093070.vylog [001] 2020-11-15 18:13:49.931 [78887] main/481/lua F> can't rename .snap.inprogress [001] 2020-11-15 18:13:49.931 [78887] main/481/lua F> can't rename .snap.inprogress It happened because on heavy loaded hosts may occure the situation when the previous snapshot was inprogress when the new snapshot came with the same file name *.snap.inprogress. It happens before the current snapshot completed and printed "dump completed" in log file. Also this file *.snap.inprogress was seen left on manual debug, when the test hanged before this patch. To resolve the test issue fiber sleep delay after it can be increased, but we want to save the issue reproducable. The current patch corrects the test to avoid of hang on box snapshot, to be able to continue testing after it failed. Fiber sleep was even decreased after adding fiber for box.snapshot to be able to reproduce the issue. Needed for #4385 --- Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-4385_hang_select_consistency Issue: https://github.com/tarantool/tarantool/issues/4385 test/vinyl/select_consistency.result | 51 +++++++++++++++++++------- test/vinyl/select_consistency.test.lua | 45 ++++++++++++++++------- test/vinyl/suite.ini | 3 +- 3 files changed, 71 insertions(+), 28 deletions(-) diff --git a/test/vinyl/select_consistency.result b/test/vinyl/select_consistency.result index f27d0ec68..cd99b84bf 100644 --- a/test/vinyl/select_consistency.result +++ b/test/vinyl/select_consistency.result @@ -4,6 +4,9 @@ test_run = require('test_run').new() fiber = require 'fiber' --- ... +log = require 'log' +--- +... math.randomseed(os.time()) --- ... @@ -52,6 +55,8 @@ function gen_update() end; --- ... +failed = false + function dml_loop() local insert = true while not stop do @@ -75,8 +80,13 @@ end; ... function snap_loop() while not stop do - box.snapshot() - fiber.sleep(0.1) + local ok, err = fiber.create(function() local ok, err = pcall(box.snapshot) return ok, err end) + if ok == false then + log.info("error: box.snapshot failed with error " .. err) + failed = true + break + end + fiber.sleep(0.01) end ch:put(true) end; @@ -97,10 +107,7 @@ _ = fiber.create(dml_loop); _ = fiber.create(snap_loop); --- ... -failed = {}; ---- -... -for i = 1, 10000 do +function run_iter() local val = math.random(MAX_VAL) box.begin() local res1 = s.index.i1:select({val}) @@ -117,25 +124,37 @@ for i = 1, 10000 do end end if not found then + log.error("error: equal not found for #res1 = " .. #res1 .. ", #res2 = " .. #res2) equal = false break end end - else - equal = false - end - if not equal then - table.insert(failed, {res1, res2}) end fiber.sleep(0) + return equal +end; +--- +... +for i = 1, 10000 do + if failed or not run_iter(i) then + log.error("error: failed on iteration " .. i) + failed = true + break + end end; --- ... stop = true; --- ... -for i = 1, ch:size() do - ch:get() +function check_get() + for i = 1, ch:size() do + if not test_run:wait_cond(function() return ch:get() ~= nil end) then + log.error("error: hanged on ch:get() on iteration " .. i) + return false + end + end + return true end; --- ... @@ -143,10 +162,14 @@ test_run:cmd("setopt delimiter ''"); --- - true ... -#failed == 0 or failed +test_run:wait_cond(function() return check_get() end) --- - true ... +failed +--- +- false +... s:drop() --- ... diff --git a/test/vinyl/select_consistency.test.lua b/test/vinyl/select_consistency.test.lua index 73b443e89..e5e01b741 100644 --- a/test/vinyl/select_consistency.test.lua +++ b/test/vinyl/select_consistency.test.lua @@ -1,6 +1,7 @@ test_run = require('test_run').new() fiber = require 'fiber' +log = require 'log' math.randomseed(os.time()) @@ -33,6 +34,8 @@ function gen_update() pcall(s.update, s, math.random(MAX_KEY), {{'+', 5, 1}}) end; +failed = false + function dml_loop() local insert = true while not stop do @@ -55,8 +58,13 @@ end; function snap_loop() while not stop do - box.snapshot() - fiber.sleep(0.1) + local ok, err = fiber.create(function() local ok, err = pcall(box.snapshot) return ok, err end) + if ok == false then + log.info("error: box.snapshot failed with error " .. err) + failed = true + break + end + fiber.sleep(0.01) end ch:put(true) end; @@ -68,9 +76,7 @@ _ = fiber.create(dml_loop); _ = fiber.create(dml_loop); _ = fiber.create(snap_loop); -failed = {}; - -for i = 1, 10000 do +function run_iter() local val = math.random(MAX_VAL) box.begin() local res1 = s.index.i1:select({val}) @@ -87,26 +93,39 @@ for i = 1, 10000 do end end if not found then + log.error("error: equal not found for #res1 = " .. #res1 .. ", #res2 = " .. #res2) equal = false break end end - else - equal = false - end - if not equal then - table.insert(failed, {res1, res2}) end fiber.sleep(0) + return equal +end; + +for i = 1, 10000 do + if failed or not run_iter(i) then + log.error("error: failed on iteration " .. i) + failed = true + break + end end; stop = true; -for i = 1, ch:size() do - ch:get() + +function check_get() + for i = 1, ch:size() do + if not test_run:wait_cond(function() return ch:get() ~= nil end) then + log.error("error: hanged on ch:get() on iteration " .. i) + return false + end + end + return true end; test_run:cmd("setopt delimiter ''"); -#failed == 0 or failed +test_run:wait_cond(function() return check_get() end) +failed s:drop() diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini index 1fded2393..d55e7734c 100644 --- a/test/vinyl/suite.ini +++ b/test/vinyl/suite.ini @@ -23,7 +23,8 @@ fragile = { "issues": [ "gh-4346" ] }, "select_consistency.test.lua": { - "issues": [ "gh-4385" ] + "issues": [ "gh-4385" ], + "checksums": [ "817493231b4ce2ebb8aa295303e857df" ] }, "throttle.test.lua": { "issues": [ "gh-4168" ] -- 2.25.1
next reply other threads:[~2020-11-15 18:43 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-15 18:43 Alexander V. Tikhonov [this message] 2020-11-16 21:06 ` Nikita Pettik 2020-12-02 18:57 ` Alexander V. Tikhonov 2020-12-07 12:26 ` Nikita Pettik -- strict thread matches above, loose matches on Subject: below -- 2020-11-09 12:57 Alexander V. Tikhonov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=5b204cb5b32bb96c6a2cec2b06f399b360137250.1605465603.git.avtikhon@tarantool.org \ --to=avtikhon@tarantool.org \ --cc=korablev@tarantool.org \ --cc=kyukhin@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v1] test: fix hang of vinyl/select_consistency.* test' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox