Tarantool development patches archive
From: Sergey Petrenko <sergepetrenko@tarantool.org>
To: tarantool-patches@freelists.org
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>,
	Kirill Yukhin <kyukhin@tarantool.org>
Subject: Re: [tarantool-patches] [PATCH] replication: fix a failing assert in replica_on_applier_disconnect()
Date: Mon, 6 Aug 2018 17:14:05 +0300
Message-ID: <81999702-C603-423E-92C9-199CE605FED4@tarantool.org>
In-Reply-To: <20180803155745.tmndjr52n6igtdno@tarantool.org>

Hi!

> On 3 Aug 2018, at 18:57, Kirill Yukhin <kyukhin@tarantool.org> wrote:
> 
> Hello Serge,
> On 03 Aug 08:59, Serge Petrenko wrote:
>> The case when two applier errors happen one after another
>> wasn't handled in replica_on_applier_disconnect(), which led to
>> occasional test failures and crashes. Handle this case.
>> 
>> Part of #3510
>> ---
>> This patch fixes an assertion failure reported by @locker in the issue comments.
>> I wasn't able to reproduce the 2 failures reported in the issue itself, so I asked
>> for comments, but got no answer. I also couldn't fix those 2
>> failures just by looking at the code.
>> 
>> https://github.com/tarantool/tarantool/tree/sergepetrenko/gh-3510-replication-asserts-fail
>> https://github.com/tarantool/tarantool/issues/3510
> Could you pls prepare a regression test as well?

Added a regression test. It fails with assertion(0) before my patch and passes with the patch applied.
Here's the new diff:

 src/box/replication.cc         |  4 ++++
 test/replication/er_load.lua   | 23 +++++++++++++++++++++++
 test/replication/er_load1.lua  |  1 +
 test/replication/er_load2.lua  |  1 +
 test/replication/misc.result   | 39 +++++++++++++++++++++++++++++++++++++++
 test/replication/misc.test.lua | 12 ++++++++++++
 6 files changed, 80 insertions(+)
 create mode 100644 test/replication/er_load.lua
 create mode 120000 test/replication/er_load1.lua
 create mode 120000 test/replication/er_load2.lua

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 26bbbe32a..0efbd7c0e 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -350,6 +350,10 @@ replica_on_applier_disconnect(struct replica *replica)
 		assert(replicaset.applier.connected > 0);
 		replicaset.applier.connected--;
 		break;
+	case APPLIER_LOADING:
+		assert(replicaset.applier.loading > 0);
+		replicaset.applier.loading--;
+		break;
 	case APPLIER_DISCONNECTED:
 		break;
 	default:
diff --git a/test/replication/er_load.lua b/test/replication/er_load.lua
new file mode 100644
index 000000000..0db8c9cfa
--- /dev/null
+++ b/test/replication/er_load.lua
@@ -0,0 +1,23 @@
+#!/usr/bin/env tarantool
+
+-- get instance id from filename (er_load1.lua => 1)
+local INSTANCE_ID = string.match(arg[0], '%d')
+
+local SOCKET_DIR =  require('fio').cwd()
+local function instance_uri(instance_id)
+    return SOCKET_DIR..'/er_load'..instance_id..'.sock'
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg{
+    listen = instance_uri(INSTANCE_ID);
+    replication = {
+	instance_uri(INSTANCE_ID),
+	'noone:pass@'..instance_uri(INSTANCE_ID % 2 + 1)
+    }
+}
+
+box.once("leader", function()
+    box.schema.user.grant('guest', 'replication')
+end)
diff --git a/test/replication/er_load1.lua b/test/replication/er_load1.lua
new file mode 120000
index 000000000..18f7ffa5a
--- /dev/null
+++ b/test/replication/er_load1.lua
@@ -0,0 +1 @@
+er_load.lua
\ No newline at end of file
diff --git a/test/replication/er_load2.lua b/test/replication/er_load2.lua
new file mode 120000
index 000000000..18f7ffa5a
--- /dev/null
+++ b/test/replication/er_load2.lua
@@ -0,0 +1 @@
+er_load.lua
\ No newline at end of file
diff --git a/test/replication/misc.result b/test/replication/misc.result
index ff0dbf549..35b51085f 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -208,3 +208,42 @@ test_run:drop_cluster(SERVERS)
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
+-- gh-3510 assertion failure in replica_on_applier_disconnect()
+test_run:cmd('create server er_load1 with script="replication/er_load1.lua"')
+---
+- true
+...
+test_run:cmd('create server er_load2 with script="replication/er_load2.lua"')
+---
+- true
+...
+test_run:cmd('start server er_load1 with wait=False, wait_load=False')
+---
+- true
+...
+test_run:cmd('start server er_load2 with wait=False, wait_load=False')
+---
+- true
+...
+require('fiber').sleep(0.5)
+---
+...
+test_run:cmd('stop server er_load1')
+---
+- true
+...
+require('fiber').sleep(1)
+---
+...
+test_run:cmd('stop server er_load2')
+---
+- true
+...
+test_run:cmd('cleanup server er_load1')
+---
+- true
+...
+test_run:cmd('cleanup server er_load2')
+---
+- true
+...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index c05e52165..27c1a4821 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -81,3 +81,15 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 
 box.schema.user.revoke('guest', 'replication')
+
+-- gh-3510 assertion failure in replica_on_applier_disconnect()
+test_run:cmd('create server er_load1 with script="replication/er_load1.lua"')
+test_run:cmd('create server er_load2 with script="replication/er_load2.lua"')
+test_run:cmd('start server er_load1 with wait=False, wait_load=False')
+test_run:cmd('start server er_load2 with wait=False, wait_load=False')
+require('fiber').sleep(0.5)
+test_run:cmd('stop server er_load1')
+require('fiber').sleep(1)
+test_run:cmd('stop server er_load2')
+test_run:cmd('cleanup server er_load1')
+test_run:cmd('cleanup server er_load2')
-- 
2.15.2 (Apple Git-101.1)
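For readers following the fix, a minimal C sketch of the bookkeeping involved may help. It is a simplified, hypothetical model, not Tarantool code: the enum is only the subset of applier states visible in the diff, and struct applier_counters, counters_on_enter() and counters_on_disconnect() are invented names standing in for the replicaset.applier counters and replica_on_applier_disconnect(). The point it shows is that every state an applier is counted in needs a matching decrement on disconnect; before the patch, a disconnect while the applier was in APPLIER_LOADING had no case of its own and presumably reached the default branch, where the real code asserts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of applier states; names mirror the patch. */
enum applier_state {
	APPLIER_DISCONNECTED,
	APPLIER_CONNECTED,
	APPLIER_LOADING,
};

/* Hypothetical model of the replicaset-wide applier counters. */
struct applier_counters {
	int connected;
	int loading;
};

/* Count an applier that has just entered `state`. */
static void
counters_on_enter(struct applier_counters *c, enum applier_state state)
{
	switch (state) {
	case APPLIER_CONNECTED:
		c->connected++;
		break;
	case APPLIER_LOADING:
		c->loading++;
		break;
	case APPLIER_DISCONNECTED:
		break;
	}
}

/*
 * Uncount an applier that disconnected while in `prev`.
 * Returns false where the real code would hit its default-branch
 * assertion. With the APPLIER_LOADING case present, every state in
 * this model is handled, so the default branch is unreachable here.
 */
static bool
counters_on_disconnect(struct applier_counters *c, enum applier_state prev)
{
	switch (prev) {
	case APPLIER_CONNECTED:
		assert(c->connected > 0);
		c->connected--;
		return true;
	case APPLIER_LOADING:
		/* The case added by the patch. */
		assert(c->loading > 0);
		c->loading--;
		return true;
	case APPLIER_DISCONNECTED:
		return true;
	default:
		return false;
	}
}
```

In this model, an applier that enters APPLIER_LOADING and then errors out is uncounted symmetrically, so c.loading returns to 0 instead of the disconnect path falling into default.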

Thread overview: 8+ messages
2018-08-03  5:59 Serge Petrenko
2018-08-03 15:43 ` Vladimir Davydov
2018-08-03 15:57 ` [tarantool-patches] " Kirill Yukhin
2018-08-06 14:14   ` Sergey Petrenko [this message]
2018-08-07 16:50     ` Vladimir Davydov
2018-08-08 10:10       ` Sergey Petrenko
2018-08-08 10:58         ` Vladimir Davydov
2018-08-08 15:19           ` Sergey Petrenko
