Tarantool development patches archive
From: "Ilya Kosarev" <i.kosarev@tarantool.org>
To: "Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test
Date: Wed, 20 Nov 2019 04:41:11 +0300
Message-ID: <1574214071.190794377@f120.i.mail.ru>
In-Reply-To: <1574213348.532614234@f512.i.mail.ru>


Sorry, initially I didn't notice that you ran the test on the branch.
My answer is still valid, though, except for the first paragraph.


>Wednesday, 20 November 2019, 4:29 +03:00 from Ilya Kosarev <i.kosarev@tarantool.org>:
>
>Hi!
>
>Thanks for your review.
>
>Did you run the tests on exactly this patchset or on the branch,
>https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4586-fix-quorum-test
>which also contains "relay: fix join vclock obtainment in
>relay_initial_join"?
>That commit is not yet in master (and might not get there, since
>Georgy has an alternative fix as part of sync replication), but
>it is vital for the stability of the test.
>On my machine (Ubuntu 18.04.3 LTS) the quorum test works perfectly on
>the mentioned branch. I use the following bash command to run it under 'load':
>l=0 ; while ./test-run.py -j20 `for r in {1..64} ; do echo quorum ; done` 2>/dev/null ; do l=$(($l+1)) ; echo ======== $l ============= ; done
>Anyway, I guess the problems you reported are not connected with the
>join_vclock patch but are macOS-specific, since I can't reproduce them
>locally. I guess we have some macOS machines; I will ask for access.
>It seems to me that the wrong results problem is quite easy to handle.
>I have no idea for now how to handle the segfault you reported; however,
>I am sure it has nothing to do with the segfault mentioned in the issue,
>since that one was caused by unsafe iteration of anon replicas. The other
>wrong results problems mentioned in the ticket are also handled in the
>patchset.
>Therefore I propose to close this ticket with the provided patchset,
>although there are some other problems. Then I will open a new issue with
>the error info you provided and start to work on it as soon as I get
>remote access to a macOS machine.
>
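
For readability, the stress loop quoted above can be split into two small
shell functions. This is only a sketch: repeat_name and run_until_failure
are hypothetical helper names, not part of test-run.py, and the round
limit is an addition (0 keeps the original unbounded behaviour).

```shell
# Print NAME repeated COUNT times, space-separated, on a single line.
repeat_name() (
    name=$1 count=$2 out="" i=0
    while [ "$i" -lt "$count" ]; do
        out="${out:+$out }$name"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)

# Run a command over and over until it fails or MAX_ROUNDS passes are done
# (MAX_ROUNDS = 0 means no limit). The command's stderr is discarded, as in
# the one-liner. After every successful pass a separator with the pass
# counter is printed. The exit status is that of the failing run, or 0 if
# the limit was reached without a failure.
run_until_failure() (
    max_rounds=$1
    shift
    rounds=0
    while [ "$max_rounds" -eq 0 ] || [ "$rounds" -lt "$max_rounds" ]; do
        "$@" 2>/dev/null || exit $?
        rounds=$((rounds + 1))
        echo "======== $rounds ============="
    done
    exit 0
)
```

With these helpers, the original command would correspond to
run_until_failure 0 ./test-run.py -j20 $(repeat_name quorum 64)
run from the test directory (the substitution is left unquoted on purpose,
so that each name becomes a separate argument).
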
>>Wednesday, 20 November 2019, 2:06 +03:00 from Vladislav Shpilevoy < v.shpilevoy@tarantool.org >:
>>
>>Hi! Thanks for the patch!
>>
>>The commits LGTM. But it looks like there are more problems. I tried on
>>the branch:
>>
>>    python test-run.py replication/quorum. replication/quorum. replication/quorum. replication/quorum. replication/quorum. --conf memtx
>>
>>I got a crash once and wrong results another time. But overall
>>the test is more stable now, IMO.
>>
>>For the crash I have a core file. But it is not for gdb
>>(the backtrace below is from lldb).
>>I can send it to you, or extract any info if you need.
>>
>>In summary, I think we can't write 'Closes' in the
>>last commit yet. We need to improve these commits, or
>>add more fixes on top.
>>
>>=====================================================================================
>>
>>Crash (looks really strange, may be an independent bug):
>>
>>(lldb) bt
>>* thread #1, stop reason = signal SIGSTOP
>>  * frame #0: 0x00007fff688be2c6 libsystem_kernel.dylib`__pthread_kill + 10
>>    frame #1: 0x00007fff68973bf1 libsystem_pthread.dylib`pthread_kill + 284
>>    frame #2: 0x00007fff688286a6 libsystem_c.dylib`abort + 127
>>    frame #3: 0x00000001026fd136 tarantool`sig_fatal_cb(signo=11, siginfo=0x000000010557fa38, context=0x000000010557faa0) at main.cc:300:2
>>    frame #4: 0x00007fff68968b5d libsystem_platform.dylib`_sigtramp + 29
>>    frame #5: 0x00007fff6896f138 libsystem_pthread.dylib`pthread_mutex_lock + 1
>>    frame #6: 0x0000000102b0600a tarantool`etp_submit(user=0x00007fc8500072c0, req=0x00007fc851901340) at etp.c:533:7
>>    frame #7: 0x0000000102b05eca tarantool`eio_submit(req=0x00007fc851901340) at eio.c:482:3
>>    frame #8: 0x000000010288d004 tarantool`coio_task_execute(task=0x00007fc851901340, timeout=15768000000) at coio_task.c:245:2
>>    frame #9: 0x000000010288da2b tarantool`coio_getaddrinfo(host="localhost", port="56816", hints=0x000000010557fd98, res=0x000000010557fd90, timeout=15768000000) at coio_task.c:412:6
>>    frame #10: 0x000000010285d787 tarantool`lbox_socket_getaddrinfo(L=0x00000001051ae590) at socket.c:814:12
>>    frame #11: 0x00000001028b932d tarantool`lj_BC_FUNCC + 68
>>    frame #12: 0x00000001028e79ae tarantool`lua_pcall(L=0x00000001051ae590, nargs=3, nresults=-1, errfunc=0) at lj_api.c:1139:12
>>    frame #13: 0x000000010285af73 tarantool`luaT_call(L=0x00000001051ae590, nargs=3, nreturns=-1) at utils.c:1036:6
>>    frame #14: 0x00000001028532c6 tarantool`lua_fiber_run_f(ap=0x00000001054003e8) at fiber.c:433:11
>>    frame #15: 0x00000001026fc91a tarantool`fiber_cxx_invoke(f=(tarantool`lua_fiber_run_f at fiber.c:427), ap=0x00000001054003e8)(__va_list_tag*), __va_list_tag*) at fiber.h:742:10
>>    frame #16: 0x000000010287765b tarantool`fiber_loop(data=0x0000000000000000) at fiber.c:737:18
>>    frame #17: 0x0000000102b0c787 tarantool`coro_init at coro.c:110:3
>>(lldb) 
>>
>>=====================================================================================
>>
>>Wrong results:
>>
>>[005] replication/quorum.test.lua                     memtx           [ pass ]
>>[002] replication/quorum.test.lua                     memtx           [ fail ]
>>[002] 
>>[002] Test failed! Result content mismatch:
>>[002] --- replication/quorum.result	Tue Nov 19 23:57:26 2019
>>[002] +++ replication/quorum.reject	Wed Nov 20 00:00:20 2019
>>[002] @@ -42,7 +42,8 @@
>>[002]  ...
>>[002]  box.space.test:replace{100} -- error
>>[002]  ---
>>[002] -- error: Can't modify data because this instance is in read-only mode.
>>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>>[002] +    field ''test'' (a nil value)'
>>[002]  ...
>>[002]  box.cfg{replication={}}
>>[002]  ---
>>[002] @@ -66,7 +67,8 @@
>>[002]  ...
>>[002]  box.space.test:replace{100} -- error
>>[002]  ---
>>[002] -- error: Can't modify data because this instance is in read-only mode.
>>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>>[002] +    field ''test'' (a nil value)'
>>[002]  ...
>>[002]  box.cfg{replication_connect_quorum = 2}
>>[002]  ---
>>[002] @@ -97,7 +99,8 @@
>>[002]  ...
>>[002]  box.space.test:replace{100} -- error
>>[002]  ---
>>[002] -- error: Can't modify data because this instance is in read-only mode.
>>[002] +- error: '[string "return box.space.test:replace{100} -- error "]:1: attempt to index
>>[002] +    field ''test'' (a nil value)'
>>[002]  ...
>>[002]  test_run:cmd('start server quorum1 with args="0.1 0.5"')
>>[002]  ---
>>[002] @@ -151,10 +154,13 @@
>>[002]  ...
>>[002]  test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
>>[002]  ---
>>[002] -- true
>>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>>[002] +    index field ''test'' (a nil value)'
>>[002]  ...
>>[002]  for i = 1, 100 do box.space.test:insert{i} end
>>[002]  ---
>>[002] +- error: '[string "for i = 1, 100 do box.space.test:insert{i} end "]:1: attempt to
>>[002] +    index field ''test'' (a nil value)'
>>[002]  ...
>>[002]  fiber = require('fiber')
>>[002]  ---
>>[002] @@ -172,7 +178,8 @@
>>[002]  ...
>>[002]  test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>>[002]  ---
>>[002] -- true
>>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>>[002] +    index field ''test'' (a nil value)'
>>[002]  ...
>>[002]  -- Rebootstrap one node of the cluster and check that others follow.
>>[002]  -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
>>[002] @@ -203,7 +210,8 @@
>>[002]  test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
>>[002]  test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>>[002]  ---
>>[002] -- true
>>[002] +- error: '[string "return test_run:wait_cond(function() return b..."]:1: attempt to
>>[002] +    index field ''test'' (a nil value)'
>>[002]  ...
>>[002]  -- The rebootstrapped replica will be assigned id = 4,
>>[002]  -- because ids 1..3 are busy.
>>[002] 
>
>
>-- 
>Ilya Kosarev


-- 
Ilya Kosarev


Thread overview: 8+ messages
2019-11-18 13:19 Ilya Kosarev
2019-11-18 13:19 ` [Tarantool-patches] [PATCH 1/2] replication: make anon replicas iteration safe Ilya Kosarev
2019-11-18 13:19 ` [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions Ilya Kosarev
2019-11-19 23:13 ` [Tarantool-patches] [PATCH 0/2] fix replica iteration issue & stabilize quorum test Vladislav Shpilevoy
2019-11-20  1:29   ` Ilya Kosarev
2019-11-20  1:41     ` Ilya Kosarev [this message]
2019-11-20 22:18     ` Vladislav Shpilevoy
2019-11-20 22:48       ` Ilya Kosarev
