Tarantool development patches archive
 help / color / mirror / Atom feed
From: "Serge Petrenko" <sergepetrenko@tarantool.org>
To: "Konstantin Osipov" <kostja.osipov@gmail.com>
Cc: kirichenkoga@gmail.com, tarantool-patches@dev.tarantool.org,
	v.shpilevoy@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it
Date: Wed, 26 Feb 2020 18:58:17 +0300	[thread overview]
Message-ID: <1582732697.613312845@f380.i.mail.ru> (raw)
In-Reply-To: <20200226115815.GE15433@atlas>

[-- Attachment #1: Type: text/plain, Size: 7435 bytes --]


  
>Среда, 26 февраля 2020, 14:58 +03:00 от Konstantin Osipov <kostja.osipov@gmail.com>:
> 
>* Serge Petrenko < sergepetrenko@tarantool.org > [20/02/26 14:22]:
>> I don’t think I can. The test that comes with an issue is a stress test,
>> relying on running it with multiple workers simultaneously.
>> It reproduces the problem when ran with 4 workers on one of my PCs,
>> and with 20 workers on the other.
>> I think we don’t have the appropriate testing infrastructure to run the same
>> test with multiple workers at the same time, and I couldn’t come up with a
>> single test which would reproduce the same problem.
>Is there a place in which you can inject a sleep to make the
>problem much easier to reproduce?
>
>What about injecting a sleep in wal code on replica, the place
>which increments local replicaset vclock ?
 
Thanks for the suggestion! Haven’t thought about it for some reason.
I made a test. The diff’s below.
>
>Then you will be much more likely to receive a record from the
>peer before you incremented the record vclock locally, and the bug
>will be reproducible with a single master.
>
>--
>Konstantin Osipov, Moscow, Russia
 
diff --git a/src/box/wal.c b/src/box/wal.c
index 27bff662a..35ba7b072 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -278,8 +278,13 @@ tx_schedule_commit(struct cmsg *msg)
  /* Closes the input valve. */
  stailq_concat(&writer->rollback, &batch->rollback);
  }
+
+ ERROR_INJECT(ERRINJ_REPLICASET_VCLOCK_UPDATE, { goto skip_update; });
  /* Update the tx vclock to the latest written by wal. */
  vclock_copy(&replicaset.vclock, &batch->vclock);
+#ifndef NDEBUG
+skip_update:
+#endif
  tx_schedule_queue(&batch->commit);
  mempool_free(&writer->msg_pool, container_of(msg, struct wal_msg, base));
}
diff --git a/src/lib/core/errinj.h b/src/lib/core/errinj.h
index ed0cba903..58fe158fd 100644
--- a/src/lib/core/errinj.h
+++ b/src/lib/core/errinj.h
@@ -136,7 +136,8 @@ struct errinj {
  _(ERRINJ_SWIM_FD_ONLY, ERRINJ_BOOL, {.bparam = false}) \
  _(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT, {.iparam = 0}) \
  _(ERRINJ_FIBER_MADVISE, ERRINJ_BOOL, {.bparam = false}) \
- _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1})
+ _(ERRINJ_FIBER_MPROTECT, ERRINJ_INT, {.iparam = -1}) \
+ _(ERRINJ_REPLICASET_VCLOCK_UPDATE, ERRINJ_BOOL, {.bparam = false}) \
 
ENUM0(errinj_id, ERRINJ_LIST);
extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index daa27ed24..eb0905238 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -64,6 +64,7 @@ evals
   - ERRINJ_RELAY_REPORT_INTERVAL: 0
   - ERRINJ_RELAY_SEND_DELAY: false
   - ERRINJ_RELAY_TIMEOUT: 0
+  - ERRINJ_REPLICASET_VCLOCK_UPDATE: false
   - ERRINJ_REPLICA_JOIN_DELAY: false
   - ERRINJ_SIO_READ_MAX: -1
   - ERRINJ_SNAP_COMMIT_DELAY: false
diff --git a/test/replication/gh-4739-vclock-assert.result b/test/replication/gh-4739-vclock-assert.result
new file mode 100644
index 000000000..7dc2f7118
--- /dev/null
+++ b/test/replication/gh-4739-vclock-assert.result
@@ -0,0 +1,82 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+
+SERVERS = {'rebootstrap1', 'rebootstrap2'}
+ | ---
+ | ...
+test_run:create_cluster(SERVERS, "replication")
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+
+test_run:cmd('switch rebootstrap1')
+ | ---
+ | - true
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+-- Stop updating replicaset vclock to simulate a situation, when
+-- a row is already relayed to the remote master, but the local
+-- vclock update hasn't happened yet.
+box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true)
+ | ---
+ | - ok
+ | ...
+lsn = box.info.lsn
+ | ---
+ | ...
+box.space._schema:replace{'something'}
+ | ---
+ | - ['something']
+ | ...
+-- Vclock isn't updated.
+box.info.lsn == lsn
+ | ---
+ | - true
+ | ...
+
+-- Wait until the remote instance gets the row.
+while test_run:get_vclock('rebootstrap2')[box.info.id] == lsn do\
+    fiber.sleep(0.01)\
+end
+ | ---
+ | ...
+
+-- Restart the remote instance. This will make the first instance
+-- resubscribe without entering orphan mode.
+test_run:cmd('restart server rebootstrap2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('switch rebootstrap1')
+ | ---
+ | - true
+ | ...
+-- Wait until resubscribe is sent
+fiber.sleep(2 * box.cfg.replication_timeout)
+ | ---
+ | ...
+box.info.replication[2].upstream.status
+ | ---
+ | - sync
+ | ...
+
+box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false)
+ | ---
+ | - ok
+ | ...
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
diff --git a/test/replication/gh-4739-vclock-assert.test.lua b/test/replication/gh-4739-vclock-assert.test.lua
new file mode 100644
index 000000000..26dc781e2
--- /dev/null
+++ b/test/replication/gh-4739-vclock-assert.test.lua
@@ -0,0 +1,34 @@
+env = require('test_run')
+test_run = env.new()
+
+SERVERS = {'rebootstrap1', 'rebootstrap2'}
+test_run:create_cluster(SERVERS, "replication")
+test_run:wait_fullmesh(SERVERS)
+
+test_run:cmd('switch rebootstrap1')
+fiber = require('fiber')
+-- Stop updating replicaset vclock to simulate a situation, when
+-- a row is already relayed to the remote master, but the local
+-- vclock update hasn't happened yet.
+box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', true)
+lsn = box.info.lsn
+box.space._schema:replace{'something'}
+-- Vclock isn't updated.
+box.info.lsn == lsn
+
+-- Wait until the remote instance gets the row.
+while test_run:get_vclock('rebootstrap2')[box.info.id] == lsn do\
+    fiber.sleep(0.01)\
+end
+
+-- Restart the remote instance. This will make the first instance
+-- resubscribe without entering orphan mode.
+test_run:cmd('restart server rebootstrap2')
+test_run:cmd('switch rebootstrap1')
+-- Wait until resubscribe is sent
+fiber.sleep(2 * box.cfg.replication_timeout)
+box.info.replication[2].upstream.status
+
+box.error.injection.set('ERRINJ_REPLICASET_VCLOCK_UPDATE', false)
+test_run:cmd('switch default')
+test_run:drop_cluster(SERVERS)
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 429c64df3..90fd53ca6 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -15,6 +15,7 @@
     "gh-4402-info-errno.test.lua": {},
     "gh-4605-empty-password.test.lua": {},
     "gh-4606-admin-creds.test.lua": {},
+    "gh-4739-vclock-assert.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ed1de3140..b4e09744a 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
script =  master.lua
description = tarantool/box, replication
disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua
config = suite.cfg
lua_libs = lua/fast_replica.lua lua/rlimit.lua
use_unix_sockets = True
 
--
Serge Petrenko

[-- Attachment #2: Type: text/html, Size: 9577 bytes --]

  reply	other threads:[~2020-02-26 15:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-26 10:00 [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 1/4] box: expose box_is_orphan method sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 2/4] wal: warn when trying to write a record with a broken lsn sergepetrenko
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 3/4] replication: implement an instance id filter for relay sergepetrenko
2020-02-26 10:18   ` Konstantin Osipov
2020-02-26 11:16     ` Serge Petrenko
2020-02-26 23:54   ` Vladislav Shpilevoy
2020-02-27  6:48     ` Konstantin Osipov
2020-02-27 13:15     ` Serge Petrenko
2020-02-27 23:33       ` Vladislav Shpilevoy
2020-02-26 10:00 ` [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it sergepetrenko
2020-02-26 10:23   ` Konstantin Osipov
2020-02-26 11:21     ` Serge Petrenko
2020-02-26 11:58       ` Konstantin Osipov
2020-02-26 15:58         ` Serge Petrenko [this message]
2020-02-26 16:40           ` Konstantin Osipov
2020-02-26 23:54   ` Vladislav Shpilevoy
2020-02-27  6:52     ` Konstantin Osipov
2020-02-27 14:13     ` Serge Petrenko
2020-02-27 21:17       ` Serge Petrenko
2020-02-27 23:22         ` Vladislav Shpilevoy
2020-02-28  8:03           ` Serge Petrenko
2020-02-26 23:54 ` [Tarantool-patches] [PATCH v4 0/4] replication: fix applying of rows originating from local instance Vladislav Shpilevoy
2020-02-27 21:24   ` Serge Petrenko
2020-02-27 23:24     ` Vladislav Shpilevoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1582732697.613312845@f380.i.mail.ru \
    --to=sergepetrenko@tarantool.org \
    --cc=kirichenkoga@gmail.com \
    --cc=kostja.osipov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v4 4/4] replication: do not relay rows coming from a remote instance back to it' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox