* [Tarantool-patches] [PATCH v27 0/3] qsync: implement packet filtering (part 1)
@ 2021-12-30 20:23 Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 1/3] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-12-30 20:23 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy
Guys, please take a look once time permit, any comments are highly appreciated!
v22 (by SergeP):
 - use limbo emptiness test _after_ owner_id test
 - drop redundant assert in limbo commit/rollback
   since we're unlocking a latch anyway where own
   assertion present
 - in test: drop excessive wait_cond and setup wal
   delay earlier
v23 (by SergeP):
 - fix problem with owner test in case of recovery journal
 - update test
v27:
 - simplify the code and cover the case targeting journal
   write plus terms access race only
 - i've spent almost a few months trying to use fine grained
   locking scheme for limbo but it end up in very complex code
   so eventually I dropped this idea (here why we've v27 after v23)
branch gorcunov/gh-6036-rollback-confirm-27-notest
issue https://github.com/tarantool/tarantool/issues/6036
previous series https://lists.tarantool.org/tarantool-patches/20211014215622.49732-1-gorcunov@gmail.com/#r
Cyrill Gorcunov (3):
  latch: add latch_is_locked helper
  qsync: order access to the limbo terms
  test: add gh-6036-qsync-order test
 src/box/applier.cc                            |  12 +-
 src/box/lua/info.c                            |   4 +-
 src/box/txn_limbo.c                           |  18 +-
 src/box/txn_limbo.h                           |  52 ++++-
 src/lib/core/latch.h                          |  11 +
 test/replication/gh-6036-qsync-order.result   | 189 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua |  94 +++++++++
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 9 files changed, 371 insertions(+), 12 deletions(-)
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua
base-commit: e2d6c260516c30d33c816b7a6c6c1696e55a39b2
-- 
2.31.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH v27 1/3] latch: add latch_is_locked helper
  2021-12-30 20:23 [Tarantool-patches] [PATCH v27 0/3] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
@ 2021-12-30 20:23 ` Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test Cyrill Gorcunov via Tarantool-patches
  2 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-12-30 20:23 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy
To test if latch is locked.
Part-of #6036
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/lib/core/latch.h | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/src/lib/core/latch.h b/src/lib/core/latch.h
index 49c59cf63..0aaa8b634 100644
--- a/src/lib/core/latch.h
+++ b/src/lib/core/latch.h
@@ -95,6 +95,17 @@ latch_owner(struct latch *l)
 	return l->owner;
 }
 
+/**
+ * Return true if the latch is locked.
+ *
+ * @param l - latch to be tested.
+ */
+static inline bool
+latch_is_locked(const struct latch *l)
+{
+	return l->owner != NULL;
+}
+
 /**
  * Lock a latch. If the latch is already locked by another fiber,
  * waits for timeout.
-- 
2.31.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2021-12-30 20:23 [Tarantool-patches] [PATCH v27 0/3] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 1/3] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
@ 2021-12-30 20:23 ` Cyrill Gorcunov via Tarantool-patches
  2022-01-10 14:28   ` Serge Petrenko via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test Cyrill Gorcunov via Tarantool-patches
  2 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-12-30 20:23 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy
Limbo terms tracking is shared between appliers and when
one of appliers is waiting for write to complete inside
journal_write() routine, an other may need to access read
term value to figure out if promote request is valid to
apply. Due to cooperative multitasking access to the terms
is not consistent so we need to be sure that other fibers
read up to date terms (ie written to the WAL).
For this sake we use a latching mechanism, when one fiber
takes a lock for updating other readers are waiting until
the operation is complete.
For example here is a call graph of two appliers
applier 1
---------
applier_apply_tx
  (promote term = 3
   current max term = 2)
  applier_synchro_filter_tx
  apply_synchro_row
    journal_write
      (sleeping)
at this moment another applier comes in with obsolete
data and term 2
                              applier 2
                              ---------
                              applier_apply_tx
                                (term 2)
                                applier_synchro_filter_tx
                                  txn_limbo_is_replica_outdated -> false
                                journal_write (sleep)
applier 1
---------
journal wakes up
  apply_synchro_row_cb
    set max term to 3
So the applier 2 didn't notice that term 3 is already seen
and wrote obsolete data. With locking the applier 2 will
wait until applier 1 has finished its write.
We introduce the following helpers:
1) txn_limbo_begin: which takes a lock
2) txn_limbo_commit and txn_limbo_rollback which simply release
   the lock but have different names for better semantics
3) txn_limbo_process is a general function which uses x_begin
   and x_commit helper internally
4) txn_limbo_apply to do a real job over processing the
   request, it implies that txn_limbo_begin been called
Testing such in-flight condition won't be easy so we introduce
"box.info.synchro.queue.waiters" field which represent current
number of fibers waiting for limbo to finish request processing.
@TarantoolBot document
Title: synchronous replication changes
`box.info.synchro.queue` gets a new field: `waiters`. It represents
current number of fibers waiting the synchronous transaction processing
to complete.
Part-of #6036
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/applier.cc  | 12 ++++++++---
 src/box/lua/info.c  |  4 +++-
 src/box/txn_limbo.c | 18 ++++++++++++++--
 src/box/txn_limbo.h | 52 ++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 75 insertions(+), 11 deletions(-)
diff --git a/src/box/applier.cc b/src/box/applier.cc
index dd66ef3fa..f09b346a0 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -866,7 +866,7 @@ apply_synchro_row_cb(struct journal_entry *entry)
 		applier_rollback_by_wal_io(entry->res);
 	} else {
 		replica_txn_wal_write_cb(synchro_entry->rcb);
-		txn_limbo_process(&txn_limbo, synchro_entry->req);
+		txn_limbo_apply(&txn_limbo, synchro_entry->req);
 		trigger_run(&replicaset.applier.on_wal_write, NULL);
 	}
 	fiber_wakeup(synchro_entry->owner);
@@ -882,6 +882,8 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	if (xrow_decode_synchro(row, &req) != 0)
 		goto err;
 
+	txn_limbo_begin(&txn_limbo);
+
 	struct replica_cb_data rcb_data;
 	struct synchro_entry entry;
 	/*
@@ -919,12 +921,16 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
 	 * transactions side, including the async ones.
 	 */
 	if (journal_write(&entry.base) != 0)
-		goto err;
+		goto err_rollback;
 	if (entry.base.res < 0) {
 		diag_set_journal_res(entry.base.res);
-		goto err;
+		goto err_rollback;
 	}
+	txn_limbo_commit(&txn_limbo);
 	return 0;
+
+err_rollback:
+	txn_limbo_rollback(&txn_limbo);
 err:
 	diag_log();
 	return -1;
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index 8e02f6594..274b0b047 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -637,11 +637,13 @@ lbox_info_synchro(struct lua_State *L)
 
 	/* Queue information. */
 	struct txn_limbo *queue = &txn_limbo;
-	lua_createtable(L, 0, 2);
+	lua_createtable(L, 0, 3);
 	lua_pushnumber(L, queue->len);
 	lua_setfield(L, -2, "len");
 	lua_pushnumber(L, queue->owner_id);
 	lua_setfield(L, -2, "owner");
+	lua_pushnumber(L, queue->promote_latch_cnt);
+	lua_setfield(L, -2, "waiters");
 	lua_setfield(L, -2, "queue");
 
 	return 1;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 70447caaf..5eb6ee61d 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -47,6 +47,8 @@ txn_limbo_create(struct txn_limbo *limbo)
 	vclock_create(&limbo->vclock);
 	vclock_create(&limbo->promote_term_map);
 	limbo->promote_greatest_term = 0;
+	latch_create(&limbo->promote_latch);
+	limbo->promote_latch_cnt = 0;
 	limbo->confirmed_lsn = 0;
 	limbo->rollback_count = 0;
 	limbo->is_in_rollback = false;
@@ -724,11 +726,14 @@ txn_limbo_wait_empty(struct txn_limbo *limbo, double timeout)
 }
 
 void
-txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
+txn_limbo_apply(struct txn_limbo *limbo,
+		const struct synchro_request *req)
 {
+	assert(latch_is_locked(&limbo->promote_latch));
+
 	uint64_t term = req->term;
 	uint32_t origin = req->origin_id;
-	if (txn_limbo_replica_term(limbo, origin) < term) {
+	if (vclock_get(&limbo->promote_term_map, origin) < (int64_t)term) {
 		vclock_follow(&limbo->promote_term_map, origin, term);
 		if (term > limbo->promote_greatest_term)
 			limbo->promote_greatest_term = term;
@@ -786,6 +791,15 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	return;
 }
 
+void
+txn_limbo_process(struct txn_limbo *limbo,
+		  const struct synchro_request *req)
+{
+	txn_limbo_begin(limbo);
+	txn_limbo_apply(limbo, req);
+	txn_limbo_commit(limbo);
+}
+
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 53e52f676..42d572595 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -31,6 +31,7 @@
  */
 #include "small/rlist.h"
 #include "vclock/vclock.h"
+#include "latch.h"
 
 #include <stdint.h>
 
@@ -147,6 +148,14 @@ struct txn_limbo {
 	 * limbo and raft are in sync and the terms are the same.
 	 */
 	uint64_t promote_greatest_term;
+	/**
+	 * To order access to the promote data.
+	 */
+	struct latch promote_latch;
+	/**
+	 * Latch owners/waiters stat.
+	 */
+	uint64_t promote_latch_cnt;
 	/**
 	 * Maximal LSN gathered quorum and either already confirmed in WAL, or
 	 * whose confirmation is in progress right now. Any attempt to confirm
@@ -216,7 +225,7 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
  * @a replica_id.
  */
 static inline uint64_t
-txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
 {
 	return vclock_get(&limbo->promote_term_map, replica_id);
 }
@@ -226,11 +235,14 @@ txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
  * data from it. The check is only valid when elections are enabled.
  */
 static inline bool
-txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
+txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
 			      uint32_t replica_id)
 {
-	return txn_limbo_replica_term(limbo, replica_id) <
-	       limbo->promote_greatest_term;
+	latch_lock(&limbo->promote_latch);
+	uint64_t v = vclock_get(&limbo->promote_term_map, replica_id);
+	bool res = v < limbo->promote_greatest_term;
+	latch_unlock(&limbo->promote_latch);
+	return res;
 }
 
 /**
@@ -300,7 +312,37 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
-/** Execute a synchronous replication request. */
+/**
+ * Initiate execution of a synchronous replication request.
+ */
+static inline void
+txn_limbo_begin(struct txn_limbo *limbo)
+{
+	limbo->promote_latch_cnt++;
+	latch_lock(&limbo->promote_latch);
+}
+
+/** Commit a synchronous replication request. */
+static inline void
+txn_limbo_commit(struct txn_limbo *limbo)
+{
+	latch_unlock(&limbo->promote_latch);
+	limbo->promote_latch_cnt--;
+}
+
+/** Rollback a synchronous replication request. */
+static inline void
+txn_limbo_rollback(struct txn_limbo *limbo)
+{
+	latch_unlock(&limbo->promote_latch);
+}
+
+/** Apply a synchronous replication request after processing stage. */
+void
+txn_limbo_apply(struct txn_limbo *limbo,
+		const struct synchro_request *req);
+
+/** Process a synchronous replication request. */
 void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
-- 
2.31.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test
  2021-12-30 20:23 [Tarantool-patches] [PATCH v27 0/3] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 1/3] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2021-12-30 20:23 ` Cyrill Gorcunov via Tarantool-patches
  2022-01-10 14:29   ` Serge Petrenko via Tarantool-patches
  2 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-12-30 20:23 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy
To test that promotion requests are handled only when appropriate
write to WAL completes, because we update memory data before the
write finishes.
Note that without the patch "qsync: order access to the limbo terms"
this test fires the assertion
> tarantool: src/box/txn_limbo.c:481: txn_limbo_read_rollback: Assertion `e->txn->signature >= 0' failed.
Part-of #6036
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/replication/gh-6036-qsync-order.result   | 189 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua |  94 +++++++++
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 4 files changed, 285 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua
diff --git a/test/replication/gh-6036-qsync-order.result b/test/replication/gh-6036-qsync-order.result
new file mode 100644
index 000000000..f62675b9e
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.result
@@ -0,0 +1,189 @@
+-- test-run result file version 2
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+SERVERS={"election_replica1", "election_replica2", "election_replica3"}
+ | ---
+ | ...
+test_run:create_cluster(SERVERS, "replication", {args='1 nil manual 1'})
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("election_replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+s = box.schema.create_space('test', {is_sync = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+s:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+test_run:wait_lsn('election_replica2', 'election_replica1')
+ | ---
+ | ...
+test_run:wait_lsn('election_replica3', 'election_replica1')
+ | ---
+ | ...
+
+--
+-- Drop connection between election_replica1 and election_replica2.
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./election_replica1.sock",   \
+        "unix/:./election_replica3.sock",   \
+    },                                      \
+})
+ | ---
+ | ...
+
+--
+-- Drop connection between election_replica2 and election_replica1.
+test_run:switch("election_replica2")
+ | ---
+ | - true
+ | ...
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./election_replica2.sock",   \
+        "unix/:./election_replica3.sock",   \
+    },                                      \
+})
+ | ---
+ | ...
+
+--
+-- Here we have the following scheme
+--
+--              election_replica3 (will be delayed)
+--              /                \
+--    election_replica1    election_replica2
+
+--
+-- Initiate disk delay in a bit tricky way: the next write will
+-- fall into forever sleep.
+test_run:switch("election_replica3")
+ | ---
+ | - true
+ | ...
+box.error.injection.set("ERRINJ_WAL_DELAY", true)
+ | ---
+ | - ok
+ | ...
+--
+-- Make election_replica2 been a leader and start writting data,
+-- the PROMOTE request get queued on election_replica3 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first. Note that we
+-- enter election_replica3 as well just to be sure the PROMOTE
+-- reached it.
+test_run:switch("election_replica2")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+test_run:switch("election_replica3")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function()                   \
+    return box.info.synchro.queue.waiters > 0   \
+end)
+ | ---
+ | - true
+ | ...
+test_run:switch("election_replica2")
+ | ---
+ | - true
+ | ...
+box.space.test:insert{2}
+ | ---
+ | - [2]
+ | ...
+
+--
+-- The election_replica1 node has no clue that there is a new leader
+-- and continue writing data with obsolete term. Since election_replica3
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("election_replica1")
+ | ---
+ | - true
+ | ...
+box.space.test:insert{3}
+ | ---
+ | - [3]
+ | ...
+
+--
+-- Finally enable election_replica3 back. Make sure the data from new
+-- election_replica2 leader get writing while old leader's data ignored.
+test_run:switch("election_replica3")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.space.test:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server election_replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server election_replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server election_replica3')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server election_replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server election_replica2')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server election_replica3')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-qsync-order.test.lua b/test/replication/gh-6036-qsync-order.test.lua
new file mode 100644
index 000000000..826e31301
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.test.lua
@@ -0,0 +1,94 @@
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+
+SERVERS={"election_replica1", "election_replica2", "election_replica3"}
+test_run:create_cluster(SERVERS, "replication", {args='1 nil manual 1'})
+test_run:wait_fullmesh(SERVERS)
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("election_replica1")
+box.ctl.promote()
+s = box.schema.create_space('test', {is_sync = true})
+_ = s:create_index('pk')
+s:insert{1}
+
+test_run:wait_lsn('election_replica2', 'election_replica1')
+test_run:wait_lsn('election_replica3', 'election_replica1')
+
+--
+-- Drop connection between election_replica1 and election_replica2.
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./election_replica1.sock",   \
+        "unix/:./election_replica3.sock",   \
+    },                                      \
+})
+
+--
+-- Drop connection between election_replica2 and election_replica1.
+test_run:switch("election_replica2")
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./election_replica2.sock",   \
+        "unix/:./election_replica3.sock",   \
+    },                                      \
+})
+
+--
+-- Here we have the following scheme
+--
+--              election_replica3 (will be delayed)
+--              /                \
+--    election_replica1    election_replica2
+
+--
+-- Initiate disk delay in a bit tricky way: the next write will
+-- fall into forever sleep.
+test_run:switch("election_replica3")
+box.error.injection.set("ERRINJ_WAL_DELAY", true)
+--
+-- Make election_replica2 been a leader and start writting data,
+-- the PROMOTE request get queued on election_replica3 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first. Note that we
+-- enter election_replica3 as well just to be sure the PROMOTE
+-- reached it.
+test_run:switch("election_replica2")
+box.ctl.promote()
+test_run:switch("election_replica3")
+test_run:wait_cond(function()                   \
+    return box.info.synchro.queue.waiters > 0   \
+end)
+test_run:switch("election_replica2")
+box.space.test:insert{2}
+
+--
+-- The election_replica1 node has no clue that there is a new leader
+-- and continue writing data with obsolete term. Since election_replica3
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("election_replica1")
+box.space.test:insert{3}
+
+--
+-- Finally enable election_replica3 back. Make sure the data from new
+-- election_replica2 leader get writing while old leader's data ignored.
+test_run:switch("election_replica3")
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+box.space.test:select{}
+
+test_run:switch("default")
+test_run:cmd('stop server election_replica1')
+test_run:cmd('stop server election_replica2')
+test_run:cmd('stop server election_replica3')
+
+test_run:cmd('delete server election_replica1')
+test_run:cmd('delete server election_replica2')
+test_run:cmd('delete server election_replica3')
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3eee0803c..ed09b2087 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -59,6 +59,7 @@
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "gh-6036-qsync-order.test.lua": {},
     "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index a7b737548..357681919 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script =  master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-qsync-order.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.31.1
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches]  [PATCH v27 2/3] qsync: order access to the limbo terms
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
@ 2022-01-10 14:28   ` Serge Petrenko via Tarantool-patches
  2022-01-11 20:39     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2022-01-10 14:28 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vladislav Shpilevoy, tml
[-- Attachment #1: Type: text/plain, Size: 6139 bytes --]
Hi! Thanks for the patch!
 
box_issue_promote() and box_issue_demote() need fine-grained locking anyway.
Otherwise it’s possible that promote() is already issued, but not yet written to WAL, and some
outdated request is applied by applier at that exact moment.
 
You should take the lock before the WAL write, and release it only after txn_limbo_apply.
 
No need to guard every limbo function there is, but we have to guard everything that
writes PROMOTE/DEMOTE.
  
>Четверг, 30 декабря 2021, 23:24 +03:00 от Cyrill Gorcunov < gorcunov@gmail.com >:
> 
>Limbo terms tracking is shared between appliers and when
>one of appliers is waiting for write to complete inside
>journal_write() routine, an other may need to access read
>term value to figure out if promote request is valid to
>apply. Due to cooperative multitasking access to the terms
>is not consistent so we need to be sure that other fibers
>read up to date terms (ie written to the WAL).
>
>For this sake we use a latching mechanism, when one fiber
>takes a lock for updating other readers are waiting until
>the operation is complete.
>
>For example here is a call graph of two appliers
>
>applier 1
>---------
>applier_apply_tx
>  (promote term = 3
>   current max term = 2)
>  applier_synchro_filter_tx
>  apply_synchro_row
>    journal_write
>      (sleeping)
>
>at this moment another applier comes in with obsolete
>data and term 2
>
>                              applier 2
>                              ---------
>                              applier_apply_tx
>                                (term 2)
>                                applier_synchro_filter_tx
>                                  txn_limbo_is_replica_outdated -> false
>                                journal_write (sleep)
>
>applier 1
>---------
>journal wakes up
>  apply_synchro_row_cb
>    set max term to 3
>
>So the applier 2 didn't notice that term 3 is already seen
>and wrote obsolete data. With locking the applier 2 will
>wait until applier 1 has finished its write.
>
>We introduce the following helpers:
>
>1) txn_limbo_begin: which takes a lock
>2) txn_limbo_commit and txn_limbo_rollback which simply release
>   the lock but have different names for better semantics
>3) txn_limbo_process is a general function which uses x_begin
>   and x_commit helper internally
>4) txn_limbo_apply to do a real job over processing the
>   request, it implies that txn_limbo_begin been called
>
>Testing such in-flight condition won't be easy so we introduce
>"box.info.synchro.queue.waiters" field which represent current
>number of fibers waiting for limbo to finish request processing.
>
>@TarantoolBot document
>Title: synchronous replication changes
>
>`box.info.synchro.queue` gets a new field: `waiters`. It represents
>current number of fibers waiting the synchronous transaction processing
>to complete.
>
>Part-of #6036
>
>Signed-off-by: Cyrill Gorcunov < gorcunov@gmail.com >
>---
> src/box/applier.cc | 12 ++++++++---
> src/box/lua/info.c | 4 +++-
> src/box/txn_limbo.c | 18 ++++++++++++++--
> src/box/txn_limbo.h | 52 ++++++++++++++++++++++++++++++++++++++++-----
> 4 files changed, 75 insertions(+), 11 deletions(-)
> 
 
…
 
>diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
>index 53e52f676..42d572595 100644
>--- a/src/box/txn_limbo.h
>+++ b/src/box/txn_limbo.h
 
…
 
>@@ -216,7 +225,7 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
>  * @a replica_id.
>  */
> static inline uint64_t
>-txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
>+txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
> {
 
You’ve forgot to lock the latch here, I guess.
 
>  return vclock_get(&limbo->promote_term_map, replica_id);
> }
>@@ -226,11 +235,14 @@ txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t replica_id)
>  * data from it. The check is only valid when elections are enabled.
>  */
> static inline bool
>-txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
>+txn_limbo_is_replica_outdated(struct txn_limbo *limbo,
>  uint32_t replica_id)
> {
>- return txn_limbo_replica_term(limbo, replica_id) <
>- limbo->promote_greatest_term;
>+ latch_lock(&limbo->promote_latch);
>+ uint64_t v = vclock_get(&limbo->promote_term_map, replica_id);
>+ bool res = v < limbo->promote_greatest_term;
>+ latch_unlock(&limbo->promote_latch);
>+ return res;
> }
> 
> /**
>@@ -300,7 +312,37 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
> int
> txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
> 
>-/** Execute a synchronous replication request. */
>+/**
>+ * Initiate execution of a synchronous replication request.
>+ */
>+static inline void
>+txn_limbo_begin(struct txn_limbo *limbo)
>+{
>+ limbo->promote_latch_cnt++;
>+ latch_lock(&limbo->promote_latch);
 
I suppose you should decrease the latch_cnt right after acquiring the lock.
 
Otherwise you count the sole «limbo user» together with «limbo waiters».
 
>+}
>+
>+/** Commit a synchronous replication request. */
>+static inline void
>+txn_limbo_commit(struct txn_limbo *limbo)
>+{
>+ latch_unlock(&limbo->promote_latch);
>+ limbo->promote_latch_cnt--;
>+}
>+
>+/** Rollback a synchronous replication request. */
>+static inline void
>+txn_limbo_rollback(struct txn_limbo *limbo)
>+{
>+ latch_unlock(&limbo->promote_latch);
 
If you don’t want to decrease the counter right after latch_lock(), you should decrease it
here, as well as in txn_limbo_commit().
 
>+}
>+
>+/** Apply a synchronous replication request after processing stage. */
>+void
>+txn_limbo_apply(struct txn_limbo *limbo,
>+ const struct synchro_request *req);
>+
>+/** Process a synchronous replication request. */
> void
> txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
> 
>--
>2.31.1
 
[-- Attachment #2: Type: text/html, Size: 9475 bytes --]
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches]  [PATCH v27 3/3] test: add gh-6036-qsync-order test
  2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test Cyrill Gorcunov via Tarantool-patches
@ 2022-01-10 14:29   ` Serge Petrenko via Tarantool-patches
  2022-01-11 20:41     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2022-01-10 14:29 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vladislav Shpilevoy, tml
[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]
  
>Четверг, 30 декабря 2021, 23:24 +03:00 от Cyrill Gorcunov < gorcunov@gmail.com >:
> 
>To test that promotion requests are handled only when appropriate
>write to WAL completes, because we update memory data before the
>write finishes.
>
>Note that without the patch "qsync: order access to the limbo terms"
>this test fires the assertion
> 
>> tarantool: src/box/txn_limbo.c:481: txn_limbo_read_rollback: Assertion `e->txn->signature >= 0' failed.
>Part-of #6036
>
>Signed-off-by: Cyrill Gorcunov < gorcunov@gmail.com >
 
The test looks good to me. But it flaky fails on my pc like this:
 
No output during 51 seconds. Will abort after 120 seconds without output. List of workers not reporting the status:
- 001_replication [replication/gh-6036-qsync-order.test.lua, None] at var/001_replication/gh-6036-qsync-order.result:0
[001] replication/gh-6036-qsync-order.test.lua                        [ fail ]
[001]
[001] Test failed! Result content mismatch:
[001] --- replication/gh-6036-qsync-order.result Mon Jan 10 14:10:07 2022
[001] +++ var/rejects/replication/gh-6036-qsync-order.reject Mon Jan 10 17:15:49 2022
[001] @@ -31,13 +31,16 @@
[001]   | ...
[001]  s = box.schema.create_space('test', {is_sync = true})
[001]   | ---
[001] + | - error: Can't modify data on a read-only instance - synchro queue with term 2 belongs
[001] + |     to 1 (1aab3988-56f6-434f-bf38-15174b9664a7)
[001]   | …
 
 
 
Besides, you’ll have to rewrite it in luatest, unfortunately.
 
[-- Attachment #2: Type: text/html, Size: 2442 bytes --]
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-10 14:28   ` Serge Petrenko via Tarantool-patches
@ 2022-01-11 20:39     ` Cyrill Gorcunov via Tarantool-patches
  2022-01-12 14:01       ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2022-01-11 20:39 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tml
On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>    Hi! Thanks for the patch!
>     
>    box_issue_promote() and box_issue_demote() need fine-grained locking
>    anyway.
>    Otherwise it’s possible that promote() is already issued, but not yet
>    written to WAL, and some
>    outdated request is applied by applier at that exact moment.
True. And in previous series Vlad has asked to not move in code which is
not covered by tests. So I think this is a task for the next part. Currently
we cover only the race between appliers.
>     
>    You should take the lock before the WAL write, and release it only after
>    txn_limbo_apply.
>     
>    No need to guard every limbo function there is, but we have to guard
>    everything that
>    writes PROMOTE/DEMOTE.
...
>      @@ -216,7 +225,7 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
>        * @a replica_id.
>        */
>       static inline uint64_t
>      -txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t
>      replica_id)
>      +txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
>       {
> 
>     
>    You’ve forgot to lock the latch here, I guess.
I did it on a purpose. As you remember we've faced many problems when tried
to implement fine-grained locking inside limbo code. So I dropped this idea
eventually and I think we could start with explicit locks to cover the applier
race and then walk via small steps trying to cover the rest.
>      +/**
>      + * Initiate execution of a synchronous replication request.
>      + */
>      +static inline void
>      +txn_limbo_begin(struct txn_limbo *limbo)
>      +{
>      + limbo->promote_latch_cnt++;
>      + latch_lock(&limbo->promote_latch);
> 
>     
>    I suppose you should decrease the latch_cnt right after acquiring the
>    lock.
>     
>    Otherwise you count the sole «limbo user» together with «limbo waiters».
Yes, this will represent accumulated value. To be honest I never saw such
approach in any other code (ie increment/lock/decrement) but I think this
is fine for fibres, will do.
	Cyrill
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test
  2022-01-10 14:29   ` Serge Petrenko via Tarantool-patches
@ 2022-01-11 20:41     ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2022-01-11 20:41 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tml
On Mon, Jan 10, 2022 at 05:29:24PM +0300, Serge Petrenko wrote:
>     
>    The test looks good to me. But it flaky fails on my pc like this:     
> 
>    No output during 51 seconds. Will abort after 120 seconds without output.
>    List of workers not reporting the status:
Hmmm, that's weird. Thanks for info, Serge!
>     
>    Besides, you’ll have to rewrite it in luatest, unfortunately.
OK, will do and we will talk to figure out why it is flaky, ok?
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-11 20:39     ` Cyrill Gorcunov via Tarantool-patches
@ 2022-01-12 14:01       ` Serge Petrenko via Tarantool-patches
  2022-01-12 21:30         ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2022-01-12 14:01 UTC (permalink / raw)
  To: Cyrill Gorcunov, Vladislav Shpilevoy; +Cc: tml
11.01.2022 23:39, Cyrill Gorcunov пишет:
> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>     Hi! Thanks for the patch!
>>      
>>     box_issue_promote() and box_issue_demote() need fine-grained locking
>>     anyway.
>>     Otherwise it’s possible that promote() is already issued, but not yet
>>     written to WAL, and some
>>     outdated request is applied by applier at that exact moment.
> True. And in previous series Vlad has asked to not move in code which is
> not covered by tests. So I think this is a task for the next part. Currently
> we cover only the race between appliers.
Let's ask Vlad, then.
I feel like we should fix this now, not waiting for a full fine-grained 
locking
patch.
First of all, this is a known bug (and fine-grained locking was meant to
cover everything we don't know of, just in case).
Besides, simply locking issue_promote/issue_demote should be
much easier than implementing the fine-grained locking patch.
>
>>      
>>     You should take the lock before the WAL write, and release it only after
>>     txn_limbo_apply.
>>      
>>     No need to guard every limbo function there is, but we have to guard
>>     everything that
>>     writes PROMOTE/DEMOTE.
> ...
>>       @@ -216,7 +225,7 @@ txn_limbo_last_entry(struct txn_limbo *limbo)
>>         * @a replica_id.
>>         */
>>        static inline uint64_t
>>       -txn_limbo_replica_term(const struct txn_limbo *limbo, uint32_t
>>       replica_id)
>>       +txn_limbo_replica_term(struct txn_limbo *limbo, uint32_t replica_id)
>>        {
>>
>>      
>>     You’ve forgot to lock the latch here, I guess.
> I did it on a purpose. As you remember we've faced many problems when tried
> to implement fine-grained locking inside limbo code. So I dropped this idea
> eventually and I think we could start with explicit locks to cover the applier
> race and then walk via small steps trying to cover the rest.
Ok, then return `const ` to the function declaration, please.
>
>>       +/**
>>       + * Initiate execution of a synchronous replication request.
>>       + */
>>       +static inline void
>>       +txn_limbo_begin(struct txn_limbo *limbo)
>>       +{
>>       + limbo->promote_latch_cnt++;
>>       + latch_lock(&limbo->promote_latch);
>>
>>      
>>     I suppose you should decrease the latch_cnt right after acquiring the
>>     lock.
>>      
>>     Otherwise you count the sole «limbo user» together with «limbo waiters».
> Yes, this will represent accumulated value. To be honest I never saw such
> approach in any other code (ie increment/lock/decrement) but I think this
> is fine for fibres, will do.
It just looks strange to me that `synchro.queue.waiters` will be 
non-zero when
someone simply uses the limbo.
They are `waiters`, not `users` or something else.
>
> 	Cyrill
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-12 14:01       ` Serge Petrenko via Tarantool-patches
@ 2022-01-12 21:30         ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-13 10:13           ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-12 21:30 UTC (permalink / raw)
  To: Serge Petrenko, Cyrill Gorcunov; +Cc: tml
Hi!
On 12.01.2022 15:01, Serge Petrenko wrote:
> 
> 
> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>     Hi! Thanks for the patch!
>>>          box_issue_promote() and box_issue_demote() need fine-grained locking
>>>     anyway.
>>>     Otherwise it’s possible that promote() is already issued, but not yet
>>>     written to WAL, and some
>>>     outdated request is applied by applier at that exact moment.
>> True. And in previous series Vlad has asked to not move in code which is
>> not covered by tests. So I think this is a task for the next part. Currently
>> we cover only the race between appliers.
> 
> Let's ask Vlad, then.
> 
> I feel like we should fix this now, not waiting for a full fine-grained locking
> patch.
> 
> First of all, this is a known bug (and fine-grained locking was meant to
> cover everything we don't know of, just in case).
I am not sure I understand what you both are talking about here. Sergey, do
you mean 'fine-grained locking' as big critical sections covering a lot of
code at once or as many small critical sections?
I am confused because of this sentence. "Cover everything we don't know" is
rather opposite to fine-grained locking. I voted for big locks because
apparently it was too hard to implement smaller more precise locks.
> Besides, simply locking issue_promote/issue_demote should be
> much easier than implementing the fine-grained locking patch.
Yes. I remember the proposal was to lock entire promote/demote and other
qsync/raft functions from beginning to end. Because it should be relatively
easy. I didn't look at the code in this patch though, can't comment it.
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-12 21:30         ` Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-13 10:13           ` Serge Petrenko via Tarantool-patches
  2022-01-13 23:32             ` Vladislav Shpilevoy via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2022-01-13 10:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Cyrill Gorcunov; +Cc: tml
13.01.2022 00:30, Vladislav Shpilevoy пишет:
> Hi!
>
> On 12.01.2022 15:01, Serge Petrenko wrote:
>>
>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>>      Hi! Thanks for the patch!
>>>>           box_issue_promote() and box_issue_demote() need fine-grained locking
>>>>      anyway.
>>>>      Otherwise it’s possible that promote() is already issued, but not yet
>>>>      written to WAL, and some
>>>>      outdated request is applied by applier at that exact moment.
>>> True. And in previous series Vlad has asked to not move in code which is
>>> not covered by tests. So I think this is a task for the next part. Currently
>>> we cover only the race between appliers.
>> Let's ask Vlad, then.
>>
>> I feel like we should fix this now, not waiting for a full fine-grained locking
>> patch.
>>
>> First of all, this is a known bug (and fine-grained locking was meant to
>> cover everything we don't know of, just in case).
> I am not sure I understand what you both are talking about here. Sergey, do
> you mean 'fine-grained locking' as big critical sections covering a lot of
> code at once or as many small critical sections?
I mean "locking every limbo function", like Cyrill tried to do that in the
previous patch version.
>
> I am confused because of this sentence. "Cover everything we don't know" is
> rather opposite to fine-grained locking. I voted for big locks because
> apparently it was too hard to implement smaller more precise locks.
>
>> Besides, simply locking issue_promote/issue_demote should be
>> much easier than implementing the fine-grained locking patch.
> Yes. I remember the proposal was to lock entire promote/demote and other
> qsync/raft functions from beginning to end. Because it should be relatively
> easy. I didn't look at the code in this patch though, can't comment it.
This particular patch only locks applier_apply_synchro_request(), 
txn_limbo_process()
and txn_limbo_is_replica_outdated(), so that applier cannot apply a 
request from an
already stale term.
My proposal is to lock box_issue_promote() and box_issue_demote()
(not whole promote/demote) to get rid of another race: when promote is 
written
to WAL, but not yed processed.
What you're talking about is what I call "fine grained locking", and it 
turned
out rather hard to implement, so Cyrill abandoned this idea for now.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-13 10:13           ` Serge Petrenko via Tarantool-patches
@ 2022-01-13 23:32             ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-14 10:20               ` Serge Petrenko via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-13 23:32 UTC (permalink / raw)
  To: Serge Petrenko, Cyrill Gorcunov; +Cc: tml
On 13.01.2022 11:13, Serge Petrenko wrote:
> 
> 
> 13.01.2022 00:30, Vladislav Shpilevoy пишет:
>> Hi!
>>
>> On 12.01.2022 15:01, Serge Petrenko wrote:
>>>
>>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>>>      Hi! Thanks for the patch!
>>>>>           box_issue_promote() and box_issue_demote() need fine-grained locking
>>>>>      anyway.
>>>>>      Otherwise it’s possible that promote() is already issued, but not yet
>>>>>      written to WAL, and some
>>>>>      outdated request is applied by applier at that exact moment.
>>>> True. And in previous series Vlad has asked to not move in code which is
>>>> not covered by tests. So I think this is a task for the next part. Currently
>>>> we cover only the race between appliers.
>>> Let's ask Vlad, then.
>>>
>>> I feel like we should fix this now, not waiting for a full fine-grained locking
>>> patch.
>>>
>>> First of all, this is a known bug (and fine-grained locking was meant to
>>> cover everything we don't know of, just in case).
>> I am not sure I understand what you both are talking about here. Sergey, do
>> you mean 'fine-grained locking' as big critical sections covering a lot of
>> code at once or as many small critical sections?
> 
> I mean "locking every limbo function", like Cyrill tried to do that in the
> previous patch version.
Давай по-русски, тут какое-то недопонимание.
В старых версиях Кирилл пытался лочить слишком мелко. Протестировать такое было
тяжеловато. Потому та версия не зашла - тестов было 0.
>> I am confused because of this sentence. "Cover everything we don't know" is
>> rather opposite to fine-grained locking. I voted for big locks because
>> apparently it was too hard to implement smaller more precise locks.
>>
>>> Besides, simply locking issue_promote/issue_demote should be
>>> much easier than implementing the fine-grained locking patch.
>> Yes. I remember the proposal was to lock entire promote/demote and other
>> qsync/raft functions from beginning to end. Because it should be relatively
>> easy. I didn't look at the code in this patch though, can't comment it.
> 
> This particular patch only locks applier_apply_synchro_request(), txn_limbo_process()
> and txn_limbo_is_replica_outdated(), so that applier cannot apply a request from an
> already stale term.
> 
> My proposal is to lock box_issue_promote() and box_issue_demote()
> (not whole promote/demote) to get rid of another race: when promote is written
> to WAL, but not yed processed.
Почему не лочить целиком promote/demote? Может если локи были бы шире, то не
нужно было бы и на триггеры портировать все как в новом тикете?
> What you're talking about is what I call "fine grained locking", and it turned
> out rather hard to implement, so Cyrill abandoned this idea for now.
fine grained значит "мелко-зернистый". То есть локи были бы на мелкие куски кода,
как сначала Кирилл пытался сделать. Я как раз за наоборот топлю - блокировать
сразу большие куски, а не "мелко".
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-13 23:32             ` Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-14 10:20               ` Serge Petrenko via Tarantool-patches
  2022-01-14 10:33                 ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko via Tarantool-patches @ 2022-01-14 10:20 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Cyrill Gorcunov; +Cc: tml
14.01.2022 02:32, Vladislav Shpilevoy пишет:
>
> On 13.01.2022 11:13, Serge Petrenko wrote:
>>
>> 13.01.2022 00:30, Vladislav Shpilevoy пишет:
>>> Hi!
>>>
>>> On 12.01.2022 15:01, Serge Petrenko wrote:
>>>> 11.01.2022 23:39, Cyrill Gorcunov пишет:
>>>>> On Mon, Jan 10, 2022 at 05:28:43PM +0300, Serge Petrenko wrote:
>>>>>>       Hi! Thanks for the patch!
>>>>>>            box_issue_promote() and box_issue_demote() need fine-grained locking
>>>>>>       anyway.
>>>>>>       Otherwise it’s possible that promote() is already issued, but not yet
>>>>>>       written to WAL, and some
>>>>>>       outdated request is applied by applier at that exact moment.
>>>>> True. And in previous series Vlad has asked to not move in code which is
>>>>> not covered by tests. So I think this is a task for the next part. Currently
>>>>> we cover only the race between appliers.
>>>> Let's ask Vlad, then.
>>>>
>>>> I feel like we should fix this now, not waiting for a full fine-grained locking
>>>> patch.
>>>>
>>>> First of all, this is a known bug (and fine-grained locking was meant to
>>>> cover everything we don't know of, just in case).
>>> I am not sure I understand what you both are talking about here. Sergey, do
>>> you mean 'fine-grained locking' as big critical sections covering a lot of
>>> code at once or as many small critical sections?
>> I mean "locking every limbo function", like Cyrill tried to do that in the
>> previous patch version.
> Давай по-русски, тут какое-то недопонимание.
>
> В старых версиях Кирилл пытался лочить слишком мелко. Протестировать такое было
> тяжеловато. Потому та версия не зашла - тестов было 0.
В общем после обсуждения в чате я понял твою идею.
В предпоследней версии (26) уже не было таких мелких локов.
Там лочились функции лимба целиком.
>
>>> I am confused because of this sentence. "Cover everything we don't know" is
>>> rather opposite to fine-grained locking. I voted for big locks because
>>> apparently it was too hard to implement smaller more precise locks.
>>>
>>>> Besides, simply locking issue_promote/issue_demote should be
>>>> much easier than implementing the fine-grained locking patch.
>>> Yes. I remember the proposal was to lock entire promote/demote and other
>>> qsync/raft functions from beginning to end. Because it should be relatively
>>> easy. I didn't look at the code in this patch though, can't comment it.
>> This particular patch only locks applier_apply_synchro_request(), txn_limbo_process()
>> and txn_limbo_is_replica_outdated(), so that applier cannot apply a request from an
>> already stale term.
>>
>> My proposal is to lock box_issue_promote() and box_issue_demote()
>> (not whole promote/demote) to get rid of another race: when promote is written
>> to WAL, but not yed processed.
> Почему не лочить целиком promote/demote? Может если локи были бы шире, то не
> нужно было бы и на триггеры портировать все как в новом тикете?
Как ты и написал в чате, залочить их прямо целиком не получится. Там 
есть место,
где мы ждём применения CONFIRM/ROLLBACK/etc.
Лочить какой-то большой кусок промоута (но не весь промоут) я пока 
смысла не вижу.
Может получиться что там опять какой-то дедлок будет или ещё что-то, и 
мы будем
кучу времени с этим разбираться (непонятно ради чего).
В общем, Кирилл, я предлагаю в этом патче добавить лок на весь
box_issue_promote()/demote(). Выглядит так, что это не добавит проблем,
зато это точно закроет багос с тем, что после отправленного на запись 
PROMOTE апплаер
может применить реквест из уже устаревшего терма.
Нужно будет и тест на это сделать, но он выглядит довольно просто:
сделать на ноде 1 промоут,
на ноде 2 сделать
error.injection.set(ERRINJ_WAL_WRITE_COUNTDOWN, 2)
(или как там надо чтобы терм записался, а промот завис)
box.ctl.promote()
на ноде 1 что-нибудь вставить в синхронный спейс
на ноде 2 убрать ерриндж и проверить что вставка в спейс не прошла.
С тестом на демоут чуть сложнее будет, потому что он только на мастере
может работать.
>
>> What you're talking about is what I call "fine grained locking", and it turned
>> out rather hard to implement, so Cyrill abandoned this idea for now.
> fine grained значит "мелко-зернистый". То есть локи были бы на мелкие куски кода,
> как сначала Кирилл пытался сделать. Я как раз за наоборот топлю - блокировать
> сразу большие куски, а не "мелко".
Ага, теперь понятно.
-- 
Serge Petrenko
^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms
  2022-01-14 10:20               ` Serge Petrenko via Tarantool-patches
@ 2022-01-14 10:33                 ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2022-01-14 10:33 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: Vladislav Shpilevoy, tml
On Fri, Jan 14, 2022 at 01:20:27PM +0300, Serge Petrenko wrote:
...
> > Давай по-русски, тут какое-то недопонимание.
> > 
> > В старых версиях Кирилл пытался лочить слишком мелко. Протестировать такое было
> > тяжеловато. Потому та версия не зашла - тестов было 0.
> 
> В общем после обсуждения в чате я понял твою идею.
> В предпоследней версии (26) уже не было таких мелких локов.
> Там лочились функции лимба целиком.
Да, и там вылезало, что некоторые функции реентарные, некоторые переменные
надо было переводить в режим read-safe (например, тестирование того, что
очередь пуста, хотя мы в этот момент могли лок забрать). там куча проблем
повылезала связанная с общей архитектурой. поэтому идею с локами лимба в
той форме, в которой мне хотелось ее увидеть, пришлось отложить.
> > > > I am confused because of this sentence. "Cover everything we don't know" is
> > > > rather opposite to fine-grained locking. I voted for big locks because
> > > > apparently it was too hard to implement smaller more precise locks.
> > > > 
> > > > > Besides, simply locking issue_promote/issue_demote should be
> > > > > much easier than implementing the fine-grained locking patch.
> > > > Yes. I remember the proposal was to lock entire promote/demote and other
> > > > qsync/raft functions from beginning to end. Because it should be relatively
> > > > easy. I didn't look at the code in this patch though, can't comment it.
> > > This particular patch only locks applier_apply_synchro_request(), txn_limbo_process()
> > > and txn_limbo_is_replica_outdated(), so that applier cannot apply a request from an
> > > already stale term.
> > > 
> > > My proposal is to lock box_issue_promote() and box_issue_demote()
> > > (not whole promote/demote) to get rid of another race: when promote is written
> > > to WAL, but not yed processed.
> > Почему не лочить целиком promote/demote? Может если локи были бы шире, то не
> > нужно было бы и на триггеры портировать все как в новом тикете?
> 
> Как ты и написал в чате, залочить их прямо целиком не получится. Там есть
> место, где мы ждём применения CONFIRM/ROLLBACK/etc.
> 
> Лочить какой-то большой кусок промоута (но не весь промоут) я пока смысла не
> вижу. Может получиться что там опять какой-то дедлок будет или ещё что-то, и мы
> будем кучу времени с этим разбираться (непонятно ради чего).
> 
> В общем, Кирилл, я предлагаю в этом патче добавить лок на весь
> box_issue_promote()/demote(). Выглядит так, что это не добавит проблем,
> зато это точно закроет багос с тем, что после отправленного на запись
> PROMOTE апплаер может применить реквест из уже устаревшего терма.
Добро!
> Нужно будет и тест на это сделать, но он выглядит довольно просто:
> сделать на ноде 1 промоут, на ноде 2 сделать
> error.injection.set(ERRINJ_WAL_WRITE_COUNTDOWN, 2)
> (или как там надо чтобы терм записался, а промот завис) box.ctl.promote()
> на ноде 1 что-нибудь вставить в синхронный спейс
> на ноде 2 убрать ерриндж и проверить что вставка в спейс не прошла.
> 
> С тестом на демоут чуть сложнее будет, потому что он только на мастере
> может работать.
Хорошо, попробую. Я пока начал переводить текущйи тест в формат luatest,
ну когда время позволяет.
> > > What you're talking about is what I call "fine grained locking", and it turned
> > > out rather hard to implement, so Cyrill abandoned this idea for now.
> > fine grained значит "мелко-зернистый". То есть локи были бы на мелкие куски кода,
> > как сначала Кирилл пытался сделать. Я как раз за наоборот топлю - блокировать
> > сразу большие куски, а не "мелко".
> 
> Ага, теперь понятно.
^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-01-14 10:33 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-30 20:23 [Tarantool-patches] [PATCH v27 0/3] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 1/3] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 2/3] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2022-01-10 14:28   ` Serge Petrenko via Tarantool-patches
2022-01-11 20:39     ` Cyrill Gorcunov via Tarantool-patches
2022-01-12 14:01       ` Serge Petrenko via Tarantool-patches
2022-01-12 21:30         ` Vladislav Shpilevoy via Tarantool-patches
2022-01-13 10:13           ` Serge Petrenko via Tarantool-patches
2022-01-13 23:32             ` Vladislav Shpilevoy via Tarantool-patches
2022-01-14 10:20               ` Serge Petrenko via Tarantool-patches
2022-01-14 10:33                 ` Cyrill Gorcunov via Tarantool-patches
2021-12-30 20:23 ` [Tarantool-patches] [PATCH v27 3/3] test: add gh-6036-qsync-order test Cyrill Gorcunov via Tarantool-patches
2022-01-10 14:29   ` Serge Petrenko via Tarantool-patches
2022-01-11 20:41     ` Cyrill Gorcunov via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox