Tarantool development patches archive
From: "Ilya Kosarev" <i.kosarev@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test
Date: Wed, 20 Nov 2019 04:29:08 +0300
Message-ID: <1574213348.532614234@f512.i.mail.ru>
In-Reply-To: <4a5e0fa4-02bd-8022-a5a9-32177392b2e8@tarantool.org>

Hi!

Thanks for your review.

Did you run the tests on exactly this patchset, or on the branch
https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
which also contains "relay: fix join vclock obtainment in
relay_initial_join"?
That patch is not in master yet (and might never get there, since
Georgy has an alternative fix as a part of sync replication), but it
is vital for the stability of the test.
On my machine (Ubuntu 18.04.3 LTS) the quorum test works perfectly on
the mentioned branch. I use the following bash command to run it under 'load':
    l=0
    while ./test-run.py -j20 $(for r in {1..64} ; do echo quorum ; done) 2>/dev/null ; do
        l=$((l+1)) ; echo ======== $l =============
    done
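The loop above runs until the first failure and throws stderr away. A bounded variant that keeps the output of the failing run might look roughly like the sketch below. The names run_until_fail and flaky are made up for illustration and are not part of test-run.py; flaky is only a stand-in for the real test command.

```shell
# Hypothetical helper (not part of test-run.py):
# run CMD up to MAX times and stop at the first failure.
# Prints the number of successful runs; returns 1 if a run failed.
run_until_fail() {
    local max=$1 log=$2
    shift 2
    local passed=0
    while [ "$passed" -lt "$max" ]; do
        # The log is overwritten on every run, so after a failure
        # it holds the output of the failing run only.
        if ! "$@" >"$log" 2>&1; then
            echo "$passed"
            return 1
        fi
        passed=$((passed + 1))
    done
    echo "$passed"
    return 0
}

# Stand-in for the real test command: fails on its third invocation.
calls=0
flaky() { calls=$((calls + 1)); [ "$calls" -lt 3 ]; }

passed=$(run_until_fail 64 /dev/null flaky) || echo "failed after $passed passes"
```

To use it for the quorum test, replace flaky with the ./test-run.py invocation from the loop above and /dev/null with a real log file path.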
Anyway, I guess the problems you reported are not connected with the
join_vclock patch but are macOS-specific, since I can't reproduce them
locally. I think we have some macOS machines; I will ask for access.
It seems to me that the wrong results problem is quite easy to handle.
For now I have no idea how to handle the segfault you reported. However,
I am sure it has nothing to do with the segfault mentioned in the issue,
since that one was caused by unsafe iteration over anon replicas. The
other wrong results problems mentioned in the ticket are also handled in
the patchset.
Therefore I propose to close this ticket with the provided patchset,
although some other problems remain. Then I will open a new issue with
the error info you provided and start working on it as soon as I get
remote access to a macOS machine.

>Wednesday, 20 November 2019, 2:06 +03:00 from Vladislav Shpilevoy <v.shpilevoy@tarantool.org>:
>
>Hi! Thanks for the patch!
>
>The commits LGTM. But looks like there are more problems. I tried on
>the branch:
>
>    python test-run.py replication/quorum. replication/quorum. replication/quorum. replication/quorum. replication/quorum. --conf memtx
>
>Got a crash one time, and wrong results another time. But overall
>the test is more stable now, IMO.
>
>For the crash I have a core file. But it is not for gdb.
>I can send it to you, or extract any info if you need.
>
>In summary, I think we can't write 'Closes' in the
>last commit yet. We need to improve these commits, or
>add more fixes on top.
>
>=====================================================================================
>
>Crash (looks really strange, may be an independent bug):
>
>(lldb) bt
>* thread #1, stop reason = signal SIGSTOP
>  * frame #0: 0x00007fff688be2c6 libsystem_kernel.dylib`__pthread_kill + 10
>    frame #1: 0x00007fff68973bf1 libsystem_pthread.dylib`pthread_kill + 284
>    frame #2: 0x00007fff688286a6 libsystem_c.dylib`abort + 127
>    frame #3: 0x00000001026fd136 tarantool`sig_fatal_cb(signo=11, siginfo=0x000000010557fa38, context=0x000000010557faa0) at main.cc:300:2
>    frame #4: 0x00007fff68968b5d libsystem_platform.dylib`_sigtramp + 29
>    frame #5: 0x00007fff6896f138 libsystem_pthread.dylib`pthread_mutex_lock + 1
>    frame #6: 0x0000000102b0600a tarantool`etp_submit(user=0x00007fc8500072c0, req=0x00007fc851901340) at etp.c:533:7
>    frame #7: 0x0000000102b05eca tarantool`eio_submit(req=0x00007fc851901340) at eio.c:482:3
>    frame #8: 0x000000010288d004 tarantool`coio_task_execute(task=0x00007fc851901340, timeout=15768000000) at coio_task.c:245:2
>    frame #9: 0x000000010288da2b tarantool`coio_getaddrinfo(host="localhost", port="56816", hints=0x000000010557fd98, res=0x000000010557fd90, timeout=15768000000) at coio_task.c:412:6
>    frame #10: 0x000000010285d787 tarantool`lbox_socket_getaddrinfo(L=0x00000001051ae590) at socket.c:814:12
>    frame #11: 0x00000001028b932d tarantool`lj_BC_FUNCC + 68
>    frame #12: 0x00000001028e79ae tarantool`lua_pcall(L=0x00000001051ae590, nargs=3, nresults=-1, errfunc=0) at lj_api.c:1139:12
>    frame #13: 0x000000010285af73 tarantool`luaT_call(L=0x00000001051ae590, nargs=3, nreturns=-1) at utils.c:1036:6
>    frame #14: 0x00000001028532c6 tarantool`lua_fiber_run_f(ap=0x00000001054003e8) at fiber.c:433:11
>    frame #15: 0x00000001026fc91a tarantool`fiber_cxx_invoke(f=(tarantool`lua_fiber_run_f at fiber.c:427), ap=0x00000001054003e8)(__va_list_tag*), __va_list_tag*) at fiber.h:742:10
>    frame #16: 0x000000010287765b tarantool`fiber_loop(data=0x0000000000000000) at fiber.c:737:18
>    frame #17: 0x0000000102b0c787 tarantool`coro_init at coro.c:110:3
>(lldb) 
>
>=====================================================================================
>
>Wrong results:
>
>[005] replication/quorum.test.lua                     memtx           [ pass ]
>[002] replication/quorum.test.lua                     memtx           [ fail ]
>[002] 
>[002] Test failed! Result content mismatch:
>[002] --- replication/quorum.result	Tue Nov 19 23:57:26 2019
>[002] +++ replication/quorum.reject	Wed Nov 20 00:00:20 2019
>[002] @@ -42,7 +42,8 @@
>[002]  ...
>[002]  box.space.test:replace{100} -- error
>[002]  ---
>[002] -- error: Can't modify data because this instance is in read-only mode.
>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>[002] +    field ''test'' (a nil value)'
>[002]  ...
>[002]  box.cfg{replication={}}
>[002]  ---
>[002] @@ -66,7 +67,8 @@
>[002]  ...
>[002]  box.space.test:replace{100} -- error
>[002]  ---
>[002] -- error: Can't modify data because this instance is in read-only mode.
>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>[002] +    field ''test'' (a nil value)'
>[002]  ...
>[002]  box.cfg{replication_connect_quorum = 2}
>[002]  ---
>[002] @@ -97,7 +99,8 @@
>[002]  ...
>[002]  box.space.test:replace{100} -- error
>[002]  ---
>[002] -- error: Can't modify data because this instance is in read-only mode.
>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>[002] +    field ''test'' (a nil value)'
>[002]  ...
>[002]  test_run:cmd('start server quorum1 with args="0.1 0.5"')
>[002]  ---
>[002] @@ -151,10 +154,13 @@
>[002]  ...
>[002]  test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
>[002]  ---
>[002] -- true
>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>[002] +    index field ''test'' (a nil value)'
>[002]  ...
>[002]  for i = 1, 100 do box.space.test:insert{i} end
>[002]  ---
>[002] +- error: '[string "for i = 1, 100 do box.space.test:insert{i} end "]:1: attempt to
>[002] +    index field ''test'' (a nil value)'
>[002]  ...
>[002]  fiber = require('fiber')
>[002]  ---
>[002] @@ -172,7 +178,8 @@
>[002]  ...
>[002]  test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>[002]  ---
>[002] -- true
>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>[002] +    index field ''test'' (a nil value)'
>[002]  ...
>[002]  -- Rebootstrap one node of the cluster and check that others follow.
>[002]  -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
>[002] @@ -203,7 +210,8 @@
>[002]  test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
>[002]  test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>[002]  ---
>[002] -- true
>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>[002] +    index field ''test'' (a nil value)'
>[002]  ...
>[002]  -- The rebootstrapped replica will be assigned id = 4,
>[002]  -- because ids 1..3 are busy.
>[002] 


-- 
Ilya Kosarev


