Tarantool development patches archive
* [Tarantool-patches] [PATCH v2 00/19] Sync replication
@ 2020-06-29 23:15 ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option Vladislav Shpilevoy
                     ` (27 more replies)
  0 siblings, 28 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Synchronous replication draft patchset. Almost everything has changed
since the previous version, so there is not much sense in describing
it all here.

Branch: http://github.com/tarantool/tarantool/tree/gh-4842-sync-replication
Issue: https://github.com/tarantool/tarantool/issues/4842

Leonid Vasiliev (1):
  replication: add support of qsync to the snapshot machinery

Serge Petrenko (11):
  xrow: introduce CONFIRM and ROLLBACK entries
  txn: introduce various reasons for txn rollback
  replication: write and read CONFIRM entries
  txn_limbo: add timeout when waiting for acks.
  txn_limbo: add ROLLBACK processing
  box: rework local_recovery to use async txn_commit
  replication: support ROLLBACK and CONFIRM during recovery
  replication: add test for synchro CONFIRM/ROLLBACK
  txn_limbo: add diag_set in txn_limbo_wait_confirm
  replication: delay initial join until confirmation
  replication: only send confirmed data during final join

Vladislav Shpilevoy (7):
  replication: introduce space.is_sync option
  replication: introduce replication_synchro_* cfg options
  txn: add TXN_WAIT_ACK flag
  replication: make sync transactions wait quorum
  applier: remove writer_cond
  applier: send heartbeat not only on commit, but on any write
  replication: block async transactions when not empty limbo

 src/box/CMakeLists.txt                        |   1 +
 src/box/alter.cc                              |   5 +
 src/box/applier.cc                            | 125 ++++-
 src/box/applier.h                             |   2 -
 src/box/box.cc                                | 148 +++++-
 src/box/box.h                                 |   2 +
 src/box/errcode.h                             |   4 +
 src/box/gc.c                                  |  12 +
 src/box/iproto_constants.h                    |  12 +
 src/box/lua/cfg.cc                            |  18 +
 src/box/lua/load_cfg.lua                      |  10 +
 src/box/lua/net_box.lua                       |   2 +
 src/box/lua/schema.lua                        |   3 +
 src/box/lua/space.cc                          |   5 +
 src/box/relay.cc                              |  22 +-
 src/box/replication.cc                        |   2 +
 src/box/replication.h                         |  12 +
 src/box/space_def.c                           |   2 +
 src/box/space_def.h                           |   6 +
 src/box/txn.c                                 | 155 +++++-
 src/box/txn.h                                 |  50 ++
 src/box/txn_limbo.c                           | 458 ++++++++++++++++++
 src/box/txn_limbo.h                           | 248 ++++++++++
 src/box/xrow.c                                | 106 ++++
 src/box/xrow.h                                |  46 ++
 test/app-tap/init_script.result               |   2 +
 test/box/admin.result                         |   4 +
 test/box/cfg.result                           |   8 +
 test/box/error.result                         |   4 +
 .../sync_replication_sanity.result            | 331 +++++++++++++
 .../sync_replication_sanity.test.lua          | 133 +++++
 test/unit/CMakeLists.txt                      |   3 +
 test/unit/snap_quorum_delay.cc                | 254 ++++++++++
 test/unit/snap_quorum_delay.result            |  12 +
 34 files changed, 2178 insertions(+), 29 deletions(-)
 create mode 100644 src/box/txn_limbo.c
 create mode 100644 src/box/txn_limbo.h
 create mode 100644 test/replication/sync_replication_sanity.result
 create mode 100644 test/replication/sync_replication_sanity.test.lua
 create mode 100644 test/unit/snap_quorum_delay.cc
 create mode 100644 test/unit/snap_quorum_delay.result

-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-30 23:00     ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing Vladislav Shpilevoy
                     ` (26 subsequent siblings)
  27 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

A synchronous space makes every transaction that affects its data
wait until it is replicated to a quorum of replicas before it is
committed.

Part of #4844
Part of #5073
---
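The validation rule this patch adds to space_def_new_from_tuple() can be
sketched in isolation as follows. This is a minimal illustration, not the
code from the patch: the struct and function names with the _sketch suffix
are hypothetical, and only the fields needed for the check are modelled.
GROUP_LOCAL is taken to be 1 because schema.lua sets group_id to 1 for
is_local spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Replication groups; GROUP_LOCAL marks replica-local spaces. */
enum { GROUP_DEFAULT = 0, GROUP_LOCAL = 1 };

/* Hypothetical, reduced stand-in for struct space_opts. */
struct space_opts_sketch {
	uint32_t group_id;
	bool is_temporary;
	bool is_sync;
};

/*
 * Return NULL when the options are acceptable, or a static error
 * message when they are contradictory. A local space is never
 * replicated, so it can't wait for a quorum of replicas.
 */
static const char *
space_opts_check_sync_sketch(const struct space_opts_sketch *opts)
{
	assert(opts != NULL);
	if (opts->is_sync && opts->group_id == GROUP_LOCAL)
		return "local space can't be synchronous";
	return NULL;
}
```

The check rejects only the combination of the two options; a synchronous
non-local space and an asynchronous local space both pass.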
 src/box/alter.cc                              |  5 ++
 src/box/lua/net_box.lua                       |  2 +
 src/box/lua/schema.lua                        |  3 +
 src/box/lua/space.cc                          |  5 ++
 src/box/space_def.c                           |  2 +
 src/box/space_def.h                           |  6 ++
 .../sync_replication_sanity.result            | 71 +++++++++++++++++++
 .../sync_replication_sanity.test.lua          | 29 ++++++++
 8 files changed, 123 insertions(+)
 create mode 100644 test/replication/sync_replication_sanity.result
 create mode 100644 test/replication/sync_replication_sanity.test.lua

diff --git a/src/box/alter.cc b/src/box/alter.cc
index bb4254878..249cee2da 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -663,6 +663,11 @@ space_def_new_from_tuple(struct tuple *tuple, uint32_t errcode,
 		diag_set(ClientError, ER_VIEW_MISSING_SQL);
 		return NULL;
 	}
+	if (opts.is_sync && opts.group_id == GROUP_LOCAL) {
+		diag_set(ClientError, errcode, tt_cstr(name, name_len),
+			 "local space can't be synchronous");
+		return NULL;
+	}
 	struct space_def *def =
 		space_def_new(id, uid, exact_field_count, name, name_len,
 			      engine_name, engine_name_len, &opts, fields,
diff --git a/src/box/lua/net_box.lua b/src/box/lua/net_box.lua
index 9560bfdd4..34fb25e4a 100644
--- a/src/box/lua/net_box.lua
+++ b/src/box/lua/net_box.lua
@@ -1344,6 +1344,7 @@ function remote_methods:_install_schema(schema_version, spaces, indices,
         s.enabled = true
         s.index = {}
         s.temporary = false
+        s.is_sync = false
         s._format = format
         s._format_cdata = box.internal.new_tuple_format(format)
         s.connection = self
@@ -1352,6 +1353,7 @@ function remote_methods:_install_schema(schema_version, spaces, indices,
             if type(opts) == 'table' then
                 -- Tarantool >= 1.7.0
                 s.temporary = not not opts.temporary
+                s.is_sync = not not opts.is_sync
             elseif type(opts) == 'string' then
                 -- Tarantool < 1.7.0
                 s.temporary = string.match(opts, 'temporary') ~= nil
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index e6844b45f..2edf25fae 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -413,6 +413,7 @@ box.schema.space.create = function(name, options)
         format = 'table',
         is_local = 'boolean',
         temporary = 'boolean',
+        is_sync = 'boolean',
     }
     local options_defaults = {
         engine = 'memtx',
@@ -457,6 +458,7 @@ box.schema.space.create = function(name, options)
     local space_options = setmap({
         group_id = options.is_local and 1 or nil,
         temporary = options.temporary and true or nil,
+        is_sync = options.is_sync
     })
     _space:insert{id, uid, name, options.engine, options.field_count,
         space_options, format}
@@ -2704,6 +2706,7 @@ local function box_space_mt(tab)
                 engine = v.engine,
                 is_local = v.is_local,
                 temporary = v.temporary,
+                is_sync = v.is_sync,
             }
         end
     end
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index d0e44dd41..177c58830 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -253,6 +253,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 	lua_pushstring(L, space->def->engine_name);
 	lua_settable(L, i);
 
+	/* space.is_sync */
+	lua_pushstring(L, "is_sync");
+	lua_pushboolean(L, space->def->opts.is_sync);
+	lua_settable(L, i);
+
 	lua_pushstring(L, "enabled");
 	lua_pushboolean(L, space_index(space, 0) != 0);
 	lua_settable(L, i);
diff --git a/src/box/space_def.c b/src/box/space_def.c
index efb1c8ee9..83566bf02 100644
--- a/src/box/space_def.c
+++ b/src/box/space_def.c
@@ -41,6 +41,7 @@ const struct space_opts space_opts_default = {
 	/* .is_temporary = */ false,
 	/* .is_ephemeral = */ false,
 	/* .view = */ false,
+	/* .is_sync = */ false,
 	/* .sql        = */ NULL,
 };
 
@@ -48,6 +49,7 @@ const struct opt_def space_opts_reg[] = {
 	OPT_DEF("group_id", OPT_UINT32, struct space_opts, group_id),
 	OPT_DEF("temporary", OPT_BOOL, struct space_opts, is_temporary),
 	OPT_DEF("view", OPT_BOOL, struct space_opts, is_view),
+	OPT_DEF("is_sync", OPT_BOOL, struct space_opts, is_sync),
 	OPT_DEF("sql", OPT_STRPTR, struct space_opts, sql),
 	OPT_DEF_LEGACY("checks"),
 	OPT_END,
diff --git a/src/box/space_def.h b/src/box/space_def.h
index 788b601e6..198242d02 100644
--- a/src/box/space_def.h
+++ b/src/box/space_def.h
@@ -67,6 +67,12 @@ struct space_opts {
 	 * this flag can't be changed after space creation.
 	 */
 	bool is_view;
+	/**
+	 * Synchronous space makes all transactions, affecting its
+	 * data, synchronous. That means they are not applied
+	 * until replicated to a quorum of replicas.
+	 */
+	bool is_sync;
 	/** SQL statement that produced this space. */
 	char *sql;
 };
diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
new file mode 100644
index 000000000..551df7daf
--- /dev/null
+++ b/test/replication/sync_replication_sanity.result
@@ -0,0 +1,71 @@
+-- test-run result file version 2
+--
+-- gh-4282: synchronous replication. It allows to make certain
+-- spaces commit only when their changes are replicated to a
+-- quorum of replicas.
+--
+s1 = box.schema.create_space('test1', {is_sync = true})
+ | ---
+ | ...
+s1.is_sync
+ | ---
+ | - true
+ | ...
+pk = s1:create_index('pk')
+ | ---
+ | ...
+box.begin() s1:insert({1}) s1:insert({2}) box.commit()
+ | ---
+ | ...
+s1:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+
+-- Default is async.
+s2 = box.schema.create_space('test2')
+ | ---
+ | ...
+s2.is_sync
+ | ---
+ | - false
+ | ...
+
+-- Net.box takes sync into account.
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+netbox = require('net.box')
+ | ---
+ | ...
+c = netbox.connect(box.cfg.listen)
+ | ---
+ | ...
+c.space.test1.is_sync
+ | ---
+ | - true
+ | ...
+c.space.test2.is_sync
+ | ---
+ | - false
+ | ...
+c:close()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
+
+s1:drop()
+ | ---
+ | ...
+s2:drop()
+ | ---
+ | ...
+
+-- Local space can't be synchronous.
+box.schema.create_space('test', {is_sync = true, is_local = true})
+ | ---
+ | - error: 'Failed to create space ''test'': local space can''t be synchronous'
+ | ...
diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
new file mode 100644
index 000000000..fd7ed537e
--- /dev/null
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -0,0 +1,29 @@
+--
+-- gh-4282: synchronous replication. It allows to make certain
+-- spaces commit only when their changes are replicated to a
+-- quorum of replicas.
+--
+s1 = box.schema.create_space('test1', {is_sync = true})
+s1.is_sync
+pk = s1:create_index('pk')
+box.begin() s1:insert({1}) s1:insert({2}) box.commit()
+s1:select{}
+
+-- Default is async.
+s2 = box.schema.create_space('test2')
+s2.is_sync
+
+-- Net.box takes sync into account.
+box.schema.user.grant('guest', 'super')
+netbox = require('net.box')
+c = netbox.connect(box.cfg.listen)
+c.space.test1.is_sync
+c.space.test2.is_sync
+c:close()
+box.schema.user.revoke('guest', 'super')
+
+s1:drop()
+s2:drop()
+
+-- Local space can't be synchronous.
+box.schema.create_space('test', {is_sync = true, is_local = true})
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-05 15:29     ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 11/19] box: rework local_recovery to use async txn_commit Vladislav Shpilevoy
                     ` (25 subsequent siblings)
  27 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Now txn_limbo writes a ROLLBACK entry to WAL when one of the limbo
entries fails to gather a quorum within txn_limbo_confirm_timeout.
All the limbo entries, starting with the failed one, are rolled back in
reverse order.

Closes #4848
---
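The rollback order described above can be sketched with a plain array
instead of the rlist used by txn_limbo. This is an illustration under
simplifying assumptions: all names with the _sketch suffix are hypothetical,
entries carry only an LSN and a flag, and nothing is written to WAL. It
shows two things from the patch: the ROLLBACK row carries the last "safe"
LSN (failed LSN minus one), and both the writer and the reader pop entries
from the tail of the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LIMBO_SKETCH_MAX = 16 };

struct limbo_entry_sketch {
	int64_t lsn;
	bool is_rollback;
};

/* Queue ordered by ascending LSN; index 0 is the oldest entry. */
struct limbo_sketch {
	struct limbo_entry_sketch queue[LIMBO_SKETCH_MAX];
	size_t size;
};

static void
limbo_sketch_push(struct limbo_sketch *limbo, int64_t lsn)
{
	assert(limbo->size < LIMBO_SKETCH_MAX);
	assert(limbo->size == 0 || limbo->queue[limbo->size - 1].lsn < lsn);
	limbo->queue[limbo->size].lsn = lsn;
	limbo->queue[limbo->size].is_rollback = false;
	limbo->size++;
}

/*
 * Writer side: the entry at index 'failed' timed out. Pop entries
 * from the tail down to and including the failed one, and return
 * the LSN that would go into the ROLLBACK row.
 */
static int64_t
limbo_sketch_rollback_from(struct limbo_sketch *limbo, size_t failed)
{
	assert(failed < limbo->size);
	int64_t rollback_lsn = limbo->queue[failed].lsn - 1;
	while (limbo->size > failed) {
		limbo->size--;
		limbo->queue[limbo->size].is_rollback = true;
	}
	return rollback_lsn;
}

/*
 * Reader side: pop from the tail every entry whose LSN is greater
 * than the one carried by the ROLLBACK row. Returns how many
 * entries were rolled back.
 */
static size_t
limbo_sketch_read_rollback(struct limbo_sketch *limbo, int64_t lsn)
{
	size_t count = 0;
	while (limbo->size > 0 && limbo->queue[limbo->size - 1].lsn > lsn) {
		limbo->size--;
		limbo->queue[limbo->size].is_rollback = true;
		count++;
	}
	return count;
}
```

With entries 10..13 queued on both sides and a timeout on LSN 11, the writer
returns 10, and a reader that receives 10 drops 13, 12 and 11 in that order
while keeping 10.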
 src/box/applier.cc    |  38 +++++++++----
 src/box/box.cc        |   2 +-
 src/box/errcode.h     |   2 +
 src/box/relay.cc      |   2 +-
 src/box/txn.c         |  19 +++++--
 src/box/txn_limbo.c   | 124 ++++++++++++++++++++++++++++++++++++++----
 src/box/txn_limbo.h   |  12 +++-
 test/box/error.result |   2 +
 8 files changed, 173 insertions(+), 28 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 1b9ea2f71..fbb452dc0 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -256,19 +256,25 @@ process_nop(struct request *request)
 }
 
 /*
- * CONFIRM rows aren't dml requests  and require special
+ * CONFIRM/ROLLBACK rows aren't dml requests  and require special
  * handling: instead of performing some operations on spaces,
- * processing these requests required txn_limbo to confirm some
- * of its entries.
+ * processing these requests requires txn_limbo to either confirm
+ * or rollback some of its entries.
  */
 static int
-process_confirm(struct request *request)
+process_confirm_rollback(struct request *request, bool is_confirm)
 {
-	assert(request->header->type == IPROTO_CONFIRM);
+	assert(iproto_type_is_synchro_request(request->header->type));
 	uint32_t replica_id;
 	struct txn *txn = in_txn();
 	int64_t lsn = 0;
-	if (xrow_decode_confirm(request->header, &replica_id, &lsn) != 0)
+
+	int res = 0;
+	if (is_confirm)
+		res = xrow_decode_confirm(request->header, &replica_id, &lsn);
+	else
+		res = xrow_decode_rollback(request->header, &replica_id, &lsn);
+	if (res == -1)
 		return -1;
 
 	if (replica_id != txn_limbo.instance_id) {
@@ -281,7 +287,10 @@ process_confirm(struct request *request)
 		return -1;
 
 	if (txn_commit_stmt(txn, request) == 0) {
-		txn_limbo_read_confirm(&txn_limbo, lsn);
+		if (is_confirm)
+			txn_limbo_read_confirm(&txn_limbo, lsn);
+		else
+			txn_limbo_read_rollback(&txn_limbo, lsn);
 		return 0;
 	} else {
 		return -1;
@@ -292,9 +301,10 @@ static int
 apply_row(struct xrow_header *row)
 {
 	struct request request;
-	if (row->type == IPROTO_CONFIRM) {
+	if (iproto_type_is_synchro_request(row->type)) {
 		request.header = row;
-		return process_confirm(&request);
+		return process_confirm_rollback(&request,
+						row->type == IPROTO_CONFIRM);
 	}
 	if (xrow_decode_dml(row, &request, dml_request_key_map(row->type)) != 0)
 		return -1;
@@ -317,7 +327,7 @@ apply_final_join_row(struct xrow_header *row)
 	 * Confirms are ignored during join. All the data master
 	 * sends us is valid.
 	 */
-	if (row->type == IPROTO_CONFIRM)
+	if (iproto_type_is_synchro_request(row->type))
 		return 0;
 	struct txn *txn = txn_begin();
 	if (txn == NULL)
@@ -746,6 +756,14 @@ static int
 applier_txn_rollback_cb(struct trigger *trigger, void *event)
 {
 	(void) trigger;
+	struct txn *txn = (struct txn *) event;
+	/*
+	 * Synchronous transaction rollback due to receiving a
+	 * ROLLBACK entry is a normal event and requires no
+	 * special handling.
+	 */
+	if (txn->signature == TXN_SIGNATURE_SYNC_ROLLBACK)
+		return 0;
 
 	/*
 	 * Setup shared applier diagnostic area.
diff --git a/src/box/box.cc b/src/box/box.cc
index ba7347367..d6ef6351b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -343,7 +343,7 @@ apply_wal_row(struct xstream *stream, struct xrow_header *row)
 {
 	struct request request;
 	// TODO: process confirmation during recovery.
-	if (row->type == IPROTO_CONFIRM)
+	if (iproto_type_is_synchro_request(row->type))
 		return;
 	xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
 	if (request.type != IPROTO_NOP) {
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 3ba6866e5..ea521aa07 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -268,6 +268,8 @@ struct errcode_record {
 	/*213 */_(ER_NO_SUCH_SESSION_SETTING,	"Session setting %s doesn't exist") \
 	/*214 */_(ER_UNCOMMITTED_FOREIGN_SYNC_TXNS, "Found uncommitted sync transactions from other instance with id %u") \
 	/*215 */_(ER_SYNC_MASTER_MISMATCH,	"CONFIRM message arrived for an unknown master id %d, expected %d") \
+        /*216 */_(ER_SYNC_QUORUM_TIMEOUT,       "Quorum collection for a synchronous transaction is timed out") \
+        /*217 */_(ER_SYNC_ROLLBACK,             "A rollback for a synchronous transaction is received") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 0adc9fc98..29588b6ca 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -772,7 +772,7 @@ relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 	struct relay *relay = container_of(stream, struct relay, stream);
 	assert(iproto_type_is_dml(packet->type) ||
-	       packet->type == IPROTO_CONFIRM);
+	       iproto_type_is_synchro_request(packet->type));
 	if (packet->group_id == GROUP_LOCAL) {
 		/*
 		 * We do not relay replica-local rows to other
diff --git a/src/box/txn.c b/src/box/txn.c
index 612cd19bc..37955752a 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -83,10 +83,10 @@ txn_add_redo(struct txn *txn, struct txn_stmt *stmt, struct request *request)
 	struct space *space = stmt->space;
 	row->group_id = space != NULL ? space_group_id(space) : 0;
 	/*
-	 * IPROTO_CONFIRM entries are supplementary and aren't
-	 * valid dml requests. They're encoded manually.
+	 * Synchronous replication entries are supplementary and
+	 * aren't valid dml requests. They're encoded manually.
 	 */
-	if (likely(row->type != IPROTO_CONFIRM))
+	if (likely(!iproto_type_is_synchro_request(row->type)))
 		row->bodycnt = xrow_encode_dml(request, &txn->region, row->body);
 	if (row->bodycnt < 0)
 		return -1;
@@ -490,6 +490,14 @@ void
 txn_complete_async(struct journal_entry *entry)
 {
 	struct txn *txn = entry->complete_data;
+	/*
+	 * txn_limbo has already rolled the tx back, so we just
+	 * have to free it.
+	 */
+	if (txn->signature < TXN_SIGNATURE_ROLLBACK) {
+		txn_free(txn);
+		return;
+	}
 	txn->signature = entry->res;
 	/*
 	 * Some commit/rollback triggers require for in_txn fiber
@@ -765,7 +773,10 @@ txn_commit(struct txn *txn)
 	if (is_sync) {
 		txn_limbo_assign_lsn(&txn_limbo, limbo_entry,
 				     req->rows[req->n_rows - 1]->lsn);
-		txn_limbo_wait_complete(&txn_limbo, limbo_entry);
+		if (txn_limbo_wait_complete(&txn_limbo, limbo_entry) < 0) {
+			txn_free(txn);
+			return -1;
+		}
 	}
 	if (!txn_has_flag(txn, TXN_IS_DONE)) {
 		txn->signature = req->res;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index ac57fd1bd..680e81d3d 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -84,6 +84,16 @@ txn_limbo_remove(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	rlist_del_entry(entry, in_queue);
 }
 
+static inline void
+txn_limbo_pop(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	assert(!rlist_empty(&entry->in_queue));
+	assert(rlist_last_entry(&limbo->queue, struct txn_limbo_entry,
+				 in_queue) == entry);
+	(void) limbo;
+	rlist_del_entry(entry, in_queue);
+}
+
 void
 txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
@@ -118,7 +128,11 @@ txn_limbo_check_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	return entry->is_commit;
 }
 
-void
+static int
+txn_limbo_write_rollback(struct txn_limbo *limbo,
+			 struct txn_limbo_entry *entry);
+
+int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
 	struct txn *txn = entry->txn;
@@ -127,33 +141,64 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	assert(txn_has_flag(txn, TXN_WAIT_ACK));
 	if (txn_limbo_check_complete(limbo, entry)) {
 		txn_limbo_remove(limbo, entry);
-		return;
+		return 0;
 	}
 	bool cancellable = fiber_set_cancellable(false);
 	bool timed_out = fiber_yield_timeout(txn_limbo_confirm_timeout(limbo));
 	fiber_set_cancellable(cancellable);
 	if (timed_out) {
-		// TODO: implement rollback.
-		entry->is_rollback = true;
+		txn_limbo_write_rollback(limbo, entry);
+		struct txn_limbo_entry *e, *tmp;
+		rlist_foreach_entry_safe_reverse(e, &limbo->queue,
+						 in_queue, tmp) {
+			e->is_rollback = true;
+			e->txn->signature = TXN_SIGNATURE_QUORUM_TIMEOUT;
+			txn_limbo_pop(limbo, e);
+			txn_clear_flag(e->txn, TXN_WAIT_ACK);
+			txn_complete(e->txn);
+			if (e == entry)
+				break;
+			fiber_wakeup(e->txn->fiber);
+		}
+		diag_set(ClientError, ER_SYNC_QUORUM_TIMEOUT);
+		return -1;
 	}
 	assert(txn_limbo_entry_is_complete(entry));
+	/*
+	 * The first tx to be rolled back already performed all
+	 * the necessary cleanups for us.
+	 */
+	if (entry->is_rollback) {
+		diag_set(ClientError, ER_SYNC_ROLLBACK);
+		return -1;
+	}
 	txn_limbo_remove(limbo, entry);
 	txn_clear_flag(txn, TXN_WAIT_ACK);
+	return 0;
 }
 
-/**
- * Write a confirmation entry to WAL. After it's written all the
- * transactions waiting for confirmation may be finished.
- */
 static int
-txn_limbo_write_confirm(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+txn_limbo_write_confirm_rollback(struct txn_limbo *limbo,
+				 struct txn_limbo_entry *entry,
+				 bool is_confirm)
 {
 	struct xrow_header row;
 	struct request request = {
 		.header = &row,
 	};
 
-	if (xrow_encode_confirm(&row, limbo->instance_id, entry->lsn) < 0)
+	int res = 0;
+	if (is_confirm) {
+		res = xrow_encode_confirm(&row, limbo->instance_id, entry->lsn);
+	} else {
+		/*
+		 * This entry is the first to be rolled back, so
+		 * the last "safe" lsn is entry->lsn - 1.
+		 */
+		res = xrow_encode_rollback(&row, limbo->instance_id,
+					   entry->lsn - 1);
+	}
+	if (res == -1)
 		return -1;
 
 	struct txn *txn = txn_begin();
@@ -171,6 +216,17 @@ rollback:
 	return -1;
 }
 
+/**
+ * Write a confirmation entry to WAL. After it's written all the
+ * transactions waiting for confirmation may be finished.
+ */
+static int
+txn_limbo_write_confirm(struct txn_limbo *limbo,
+			struct txn_limbo_entry *entry)
+{
+	return txn_limbo_write_confirm_rollback(limbo, entry, true);
+}
+
 void
 txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 {
@@ -194,6 +250,49 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 	}
 }
 
+/**
+ * Write a rollback message to WAL. After it's written
+ * all the transactions following the current one and waiting
+ * for confirmation must be rolled back.
+ */
+static int
+txn_limbo_write_rollback(struct txn_limbo *limbo,
+			 struct txn_limbo_entry *entry)
+{
+	return txn_limbo_write_confirm_rollback(limbo, entry, false);
+}
+
+void
+txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
+{
+	assert(limbo->instance_id != REPLICA_ID_NIL &&
+	       limbo->instance_id != instance_id);
+	struct txn_limbo_entry *e, *tmp;
+	rlist_foreach_entry_safe_reverse(e, &limbo->queue, in_queue, tmp) {
+		if (e->lsn <= lsn)
+			break;
+		e->is_rollback = true;
+		txn_limbo_pop(limbo, e);
+		txn_clear_flag(e->txn, TXN_WAIT_ACK);
+		if (e->txn->signature >= 0) {
+			/* Rollback the transaction. */
+			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
+			txn_complete(e->txn);
+		} else {
+			/*
+			 * Rollback the transaction, but don't
+			 * free it yet. txn_complete_async() will
+			 * free it.
+			 */
+			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
+			struct fiber *fiber = e->txn->fiber;
+			e->txn->fiber = fiber();
+			txn_complete(e->txn);
+			e->txn->fiber = fiber;
+		}
+	}
+}
+
 void
 txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 {
@@ -217,7 +316,10 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	}
 	if (last_quorum != NULL) {
 		if (txn_limbo_write_confirm(limbo, last_quorum) != 0) {
-			// TODO: rollback.
+			// TODO: what to do here?.
+			// We already failed writing the CONFIRM
+			// message. What are the chances we'll be
+			// able to write ROLLBACK?
 			return;
 		}
 		/*
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 94f224131..138093c7c 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -156,8 +156,12 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 /**
  * Block the current fiber until the transaction in the limbo
  * entry is either committed or rolled back.
+ * If timeout is reached before acks are collected, the tx is
+ * rolled back as well as all the txs in the limbo following it.
+ * Returns -1 when rollback was performed and tx has to be freed.
+ *          0 when tx processing can go on.
  */
-void
+int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
 /**
@@ -166,6 +170,12 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 void
 txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
 
+/**
+ * Rollback all the entries  starting with given master's LSN.
+ */
+void
+txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn);
+
 /**
  * Return TRUE if limbo is empty.
  */
diff --git a/test/box/error.result b/test/box/error.result
index 34ded3930..8241ec1a8 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -434,6 +434,8 @@ t;
  |   213: box.error.NO_SUCH_SESSION_SETTING
  |   214: box.error.UNCOMMITTED_FOREIGN_SYNC_TXNS
  |   215: box.error.SYNC_MASTER_MISMATCH
+ |   216: box.error.SYNC_QUORUM_TIMEOUT
+ |   217: box.error.SYNC_ROLLBACK
  | ...
 
 test_run:cmd("setopt delimiter ''");
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 11/19] box: rework local_recovery to use async txn_commit
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 12/19] replication: support ROLLBACK and CONFIRM during recovery Vladislav Shpilevoy
                     ` (24 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Local recovery should use the asynchronous txn commit procedure so that
it can reach the CONFIRM and ROLLBACK statements for a transaction that
needs confirmation before the confirmation timeout fires.
Using async txn commit doesn't harm other transactions, since the
journal used during local recovery fakes writes and its write_async()
method may reuse plain write().

Follow-up #4847
Follow-up #4848
---
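The idea of a journal that fakes writes and whose write_async() reuses
plain write() can be sketched as below. This is a simplified model, not the
journal API from the tree: every name with the _sketch suffix is
hypothetical, and a counter stands in for whatever result the real recovery
journal stores in the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct journal_entry_sketch {
	int64_t res;	/* Result assigned by the journal. */
	int completed;	/* Bumped by the completion callback. */
};

struct journal_sketch;
typedef int (*journal_write_f_sketch)(struct journal_sketch *,
				      struct journal_entry_sketch *);
typedef void (*journal_complete_f_sketch)(struct journal_entry_sketch *);

struct journal_sketch {
	journal_write_f_sketch write;
	journal_write_f_sketch write_async;
	journal_complete_f_sketch async_complete;
	int64_t next_res;
};

/* Nothing hits the disk during recovery: only record a result. */
static int
recovery_write_sketch(struct journal_sketch *j,
		      struct journal_entry_sketch *e)
{
	e->res = j->next_res++;
	return 0;
}

/*
 * The async variant reuses the plain write and, since there is
 * nothing to wait for, fires the completion callback right away.
 */
static int
recovery_write_async_sketch(struct journal_sketch *j,
			    struct journal_entry_sketch *e)
{
	recovery_write_sketch(j, e);
	assert(j->async_complete != NULL);
	j->async_complete(e);
	return 0;
}

static void
mark_completed_sketch(struct journal_entry_sketch *e)
{
	e->completed++;
}

static void
recovery_journal_create_sketch(struct journal_sketch *j, int64_t start)
{
	j->write = recovery_write_sketch;
	j->write_async = recovery_write_async_sketch;
	j->async_complete = mark_completed_sketch;
	j->next_res = start;
}
```

Because completion runs inside write_async(), the caller never yields, which
is what lets the single recovery fiber move on to the next WAL row.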
 src/box/box.cc | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index d6ef6351b..32c69df76 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -118,6 +118,8 @@ static struct gc_checkpoint_ref backup_gc;
 static bool is_box_configured = false;
 static bool is_ro = true;
 static fiber_cond ro_cond;
+/** Set to true during recovery from local files. */
+static bool is_local_recovery = false;
 
 /**
  * The following flag is set if the instance failed to
@@ -206,7 +208,24 @@ box_process_rw(struct request *request, struct space *space,
 		goto rollback;
 
 	if (is_autocommit) {
-		if (txn_commit(txn) != 0)
+		int res = 0;
+		/*
+		 * During local recovery the commit procedure
+		 * should be async, otherwise the only fiber
+		 * processing recovery will get stuck on the first
+		 * synchronous tx it meets until confirm timeout
+		 * is reached and the tx is rolled back, yielding
+		 * an error.
+		 * Moreover, txn_commit_async() doesn't hurt at
+		 * all during local recovery, since journal_write
+		 * is faked at this stage and returns immediately.
+		 */
+		if (is_local_recovery) {
+			res = txn_commit_async(txn);
+		} else {
+			res = txn_commit(txn);
+		}
+		if (res < 0)
 			goto error;
 	        fiber_gc();
 	}
@@ -327,13 +346,25 @@ recovery_journal_write(struct journal *base,
 	return 0;
 }
 
+static int
+recovery_journal_write_async(struct journal *base,
+			     struct  journal_entry *entry)
+{
+	recovery_journal_write(base, entry);
+	/*
+	 * Since there're no actual writes, fire a
+	 * journal_async_complete callback right away.
+	 */
+	journal_async_complete(base, entry);
+	return 0;
+}
+
 static void
 recovery_journal_create(struct vclock *v)
 {
 	static struct recovery_journal journal;
-	journal_create(&journal.base, journal_no_write_async,
-		       journal_no_write_async_cb,
-		       recovery_journal_write);
+	journal_create(&journal.base, recovery_journal_write_async,
+		       txn_complete_async, recovery_journal_write);
 	journal.vclock = v;
 	journal_set(&journal.base);
 }
@@ -2315,6 +2346,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	memtx = (struct memtx_engine *)engine_by_name("memtx");
 	assert(memtx != NULL);
 
+	is_local_recovery = true;
 	recovery_journal_create(&recovery->vclock);
 
 	/*
@@ -2356,6 +2388,7 @@ local_recovery(const struct tt_uuid *instance_uuid,
 		box_sync_replication(false);
 	}
 	recovery_finalize(recovery);
+	is_local_recovery = false;
 
 	/*
 	 * We must enable WAL before finalizing engine recovery,
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 12/19] replication: support ROLLBACK and CONFIRM during recovery
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (2 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 11/19] box: rework local_recovery to use async txn_commit Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 13/19] replication: add test for synchro CONFIRM/ROLLBACK Vladislav Shpilevoy
                     ` (23 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Follow-up #4847
Follow-up #4848
---
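How CONFIRM and ROLLBACK rows are expected to act on pending synchronous
transactions while a WAL is replayed can be sketched as follows. This is an
illustrative model only: the row type enum, the pending_sketch structure and
the function names are hypothetical, and the real code decodes xrow headers
and calls txn_limbo_read_confirm() / txn_limbo_read_rollback() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row kinds; the real code checks row->type. */
enum row_type_sketch {
	ROW_DML_SKETCH,
	ROW_CONFIRM_SKETCH,
	ROW_ROLLBACK_SKETCH,
};

enum { PENDING_SKETCH_MAX = 16 };

/* Pending synchronous transactions, ordered by ascending LSN. */
struct pending_sketch {
	int64_t lsn[PENDING_SKETCH_MAX];
	size_t head;		/* Index of the oldest pending entry. */
	size_t tail;		/* One past the newest pending entry. */
	size_t committed;
	size_t rolled_back;
};

static void
recover_row_sketch(struct pending_sketch *p, enum row_type_sketch type,
		   int64_t lsn)
{
	switch (type) {
	case ROW_DML_SKETCH:
		/* A synchronous transaction waits for its verdict. */
		assert(p->tail < PENDING_SKETCH_MAX);
		p->lsn[p->tail++] = lsn;
		break;
	case ROW_CONFIRM_SKETCH:
		/* Commit everything up to and including 'lsn'. */
		while (p->head < p->tail && p->lsn[p->head] <= lsn) {
			p->head++;
			p->committed++;
		}
		break;
	case ROW_ROLLBACK_SKETCH:
		/* Undo, newest first, everything after 'lsn'. */
		while (p->head < p->tail && p->lsn[p->tail - 1] > lsn) {
			p->tail--;
			p->rolled_back++;
		}
		break;
	}
}
```

Replaying four transactions with LSNs 1..4, then CONFIRM(2) and ROLLBACK(3),
commits 1 and 2, rolls back 4, and leaves 3 pending.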
 src/box/box.cc      | 20 ++++++++++++++++++--
 src/box/txn_limbo.c |  6 ++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 32c69df76..ad76f4f00 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -373,9 +373,25 @@ static void
 apply_wal_row(struct xstream *stream, struct xrow_header *row)
 {
 	struct request request;
-	// TODO: process confirmation during recovery.
-	if (iproto_type_is_synchro_request(row->type))
+	if (iproto_type_is_synchro_request(row->type)) {
+		uint32_t replica_id;
+		int64_t lsn;
+		switch(row->type) {
+		case IPROTO_CONFIRM:
+			if (xrow_decode_confirm(row, &replica_id, &lsn) < 0)
+				diag_raise();
+			assert(txn_limbo.instance_id == replica_id);
+			txn_limbo_read_confirm(&txn_limbo, lsn);
+			break;
+		case IPROTO_ROLLBACK:
+			if (xrow_decode_rollback(row, &replica_id, &lsn) < 0)
+				diag_raise();
+			assert(txn_limbo.instance_id == replica_id);
+			txn_limbo_read_rollback(&txn_limbo, lsn);
+			break;
+		}
 		return;
+	}
 	xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
 	if (request.type != IPROTO_NOP) {
 		struct space *space = space_cache_find_xc(request.space_id);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 680e81d3d..d3751a28b 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -230,8 +230,7 @@ txn_limbo_write_confirm(struct txn_limbo *limbo,
 void
 txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 {
-	assert(limbo->instance_id != REPLICA_ID_NIL &&
-	       limbo->instance_id != instance_id);
+	assert(limbo->instance_id != REPLICA_ID_NIL);
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
 		if (e->lsn > lsn)
@@ -265,8 +264,7 @@ txn_limbo_write_rollback(struct txn_limbo *limbo,
 void
 txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 {
-	assert(limbo->instance_id != REPLICA_ID_NIL &&
-	       limbo->instance_id != instance_id);
+	assert(limbo->instance_id != REPLICA_ID_NIL);
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe_reverse(e, &limbo->queue, in_queue, tmp) {
 		if (e->lsn <= lsn)
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 13/19] replication: add test for synchro CONFIRM/ROLLBACK
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (3 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 12/19] replication: support ROLLBACK and CONFIRM during recovery Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond Vladislav Shpilevoy
                     ` (22 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Follow-up #4847
Follow-up #4848
---
 .../sync_replication_sanity.result            | 132 ++++++++++++++++++
 .../sync_replication_sanity.test.lua          |  52 +++++++
 2 files changed, 184 insertions(+)

diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
index 551df7daf..4b9823d77 100644
--- a/test/replication/sync_replication_sanity.result
+++ b/test/replication/sync_replication_sanity.result
@@ -69,3 +69,135 @@ box.schema.create_space('test', {is_sync = true, is_local = true})
  | ---
  | - error: 'Failed to create space ''test'': local space can''t be synchronous'
  | ...
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+lsn = box.info.lsn
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+ | ---
+ | - 2
+ | ...
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+ | ---
+ | ...
+t = fiber.time()
+ | ---
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=2}
+ | ---
+ | ...
+box.space.sync:insert{3}
+ | ---
+ | - [3]
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+ | 
+box.space.sync:select{}
+ | ---
+ | - - [1]
+ |   - [3]
+ | ...
+
+-- Cleanup.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
index fd7ed537e..8715a4600 100644
--- a/test/replication/sync_replication_sanity.test.lua
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -27,3 +27,55 @@ s2:drop()
 
 -- Local space can't be synchronous.
 box.schema.create_space('test', {is_sync = true, is_local = true})
+
+--
+-- gh-4847, gh-4848: CONFIRM and ROLLBACK entries in WAL.
+--
+env = require('test_run')
+test_run = env.new()
+fiber = require('fiber')
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+-- Set up synchronous replication options.
+quorum = box.cfg.replication_synchro_quorum
+timeout = box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+
+lsn = box.info.lsn
+box.space.sync:insert{1}
+-- 1 for insertion, 1 for CONFIRM message.
+box.info.lsn - lsn
+-- Raise quorum so that master has to issue a ROLLBACK.
+box.cfg{replication_synchro_quorum=3}
+t = fiber.time()
+box.space.sync:insert{2}
+-- Check that master waited for acks.
+fiber.time() - t > box.cfg.replication_synchro_timeout
+box.cfg{replication_synchro_quorum=2}
+box.space.sync:insert{3}
+box.space.sync:select{}
+
+-- Check consistency on replica.
+test_run:cmd('switch replica')
+box.space.sync:select{}
+
+-- Check consistency in recovered data.
+test_run:cmd('restart server replica')
+box.space.sync:select{}
+
+-- Cleanup.
+test_run:cmd('switch default')
+
+box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.space.sync:drop()
+box.schema.user.revoke('guest', 'replication')
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (4 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 13/19] replication: add test for synchro CONFIRM/ROLLBACK Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-02  9:13     ` Serge Petrenko
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write Vladislav Shpilevoy
                     ` (21 subsequent siblings)
  27 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

The writer condition variable was used to wake up the writer
fiber when it is time to send a heartbeat or an ACK.

However, it is not really needed, because the writer fiber
pointer is always available in the same structure as
writer_cond, and can be used to call fiber_wakeup() directly.

Note that fiber_cond_signal() is basically the same as
fiber_wakeup().

The patch is not just refactoring for its own sake. It is a
prerequisite for #5100. That issue will require waking up the
applier's writer fiber directly on each WAL write from the txn.c
module, where writer_cond won't be available. The only usable
thing will be txn->fiber, which will be set to the applier's
writer.

Part of #5100
---
 src/box/applier.cc | 11 ++++-------
 src/box/applier.h  |  2 --
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index fbb452dc0..635a9849c 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -155,11 +155,9 @@ applier_writer_f(va_list ap)
 		 * replication_timeout seconds any more.
 		 */
 		if (applier->version_id >= version_id(1, 7, 7))
-			fiber_cond_wait_timeout(&applier->writer_cond,
-						TIMEOUT_INFINITY);
+			fiber_yield_timeout(TIMEOUT_INFINITY);
 		else
-			fiber_cond_wait_timeout(&applier->writer_cond,
-						replication_timeout);
+			fiber_yield_timeout(replication_timeout);
 		/*
 		 * A writer fiber is going to be awaken after a commit or
 		 * a heartbeat message. So this is an appropriate place to
@@ -928,7 +926,7 @@ applier_on_commit(struct trigger *trigger, void *event)
 {
 	(void) event;
 	struct applier *applier = (struct applier *)trigger->data;
-	fiber_cond_signal(&applier->writer_cond);
+	fiber_wakeup(applier->writer);
 	return 0;
 }
 
@@ -1093,7 +1091,7 @@ applier_subscribe(struct applier *applier)
 		 */
 		if (stailq_first_entry(&rows, struct applier_tx_row,
 				       next)->row.lsn == 0)
-			fiber_cond_signal(&applier->writer_cond);
+			fiber_wakeup(applier->writer);
 		else if (applier_apply_tx(&rows) != 0)
 			diag_raise();
 
@@ -1289,7 +1287,6 @@ applier_new(const char *uri)
 	applier->last_row_time = ev_monotonic_now(loop());
 	rlist_create(&applier->on_state);
 	fiber_cond_create(&applier->resume_cond);
-	fiber_cond_create(&applier->writer_cond);
 	diag_create(&applier->diag);
 
 	return applier;
diff --git a/src/box/applier.h b/src/box/applier.h
index c9fdc2955..4f8bee84e 100644
--- a/src/box/applier.h
+++ b/src/box/applier.h
@@ -78,8 +78,6 @@ struct applier {
 	struct fiber *reader;
 	/** Background fiber to reply with vclock */
 	struct fiber *writer;
-	/** Writer cond. */
-	struct fiber_cond writer_cond;
 	/** Finite-state machine */
 	enum applier_state state;
 	/** Local time of this replica when the last row has been received */
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (5 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-01 23:55     ` Vladislav Shpilevoy
  2020-07-03 12:23     ` Serge Petrenko
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 16/19] txn_limbo: add diag_set in txn_limbo_wait_confirm Vladislav Shpilevoy
                     ` (20 subsequent siblings)
  27 siblings, 2 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

With synchronous replication, the concept of 'commit' no longer
matches the WAL write event exactly.

Yet the applier relied on the commit event to send the periodic
heartbeats that tell the master the replica's new vclock.

The patch makes the applier send heartbeats on any write event,
even if it was not a commit. For example, when a sync
transaction's data is written, the replica needs to send an ACK
to the master using the heartbeat.

Closes #5100
---
 src/box/applier.cc                            | 25 +++++++-
 .../sync_replication_sanity.result            | 59 ++++++++++++++++++-
 .../sync_replication_sanity.test.lua          | 32 +++++++++-
 3 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 635a9849c..a9baf0d69 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -755,6 +755,11 @@ applier_txn_rollback_cb(struct trigger *trigger, void *event)
 {
 	(void) trigger;
 	struct txn *txn = (struct txn *) event;
+	/*
+	 * Let the txn module free the transaction object. It is
+	 * not needed for anything else.
+	 */
+	txn->fiber = NULL;
 	/*
 	 * Synchronous transaction rollback due to receiving a
 	 * ROLLBACK entry is a normal event and requires no
@@ -791,6 +796,14 @@ static int
 applier_txn_commit_cb(struct trigger *trigger, void *event)
 {
 	(void) trigger;
+	struct txn *txn = (struct txn *)event;
+	assert(txn->fiber != NULL);
+	assert(strncmp(txn->fiber->name, "applierw", 8) == 0);
+	/*
+	 * Let the txn module free the transaction object. It is
+	 * not needed for anything else.
+	 */
+	txn->fiber = NULL;
 	/* Broadcast the commit event across all appliers. */
 	trigger_run(&replicaset.applier.on_commit, event);
 	return 0;
@@ -802,7 +815,7 @@ applier_txn_commit_cb(struct trigger *trigger, void *event)
  * Return 0 for success or -1 in case of an error.
  */
 static int
-applier_apply_tx(struct stailq *rows)
+applier_apply_tx(struct stailq *rows, struct fiber *writer)
 {
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
@@ -894,7 +907,13 @@ applier_apply_tx(struct stailq *rows)
 
 	trigger_create(on_commit, applier_txn_commit_cb, NULL, NULL);
 	txn_on_commit(txn, on_commit);
-
+	/*
+	 * Wakeup the writer fiber after the transaction is
+	 * completed. To send ACK to the master. In case of async
+	 * transaction it is the same as commit event. In case of
+	 * sync it happens after the data is written to WAL.
+	 */
+	txn->fiber = writer;
 	if (txn_commit_async(txn) < 0)
 		goto fail;
 
@@ -1092,7 +1111,7 @@ applier_subscribe(struct applier *applier)
 		if (stailq_first_entry(&rows, struct applier_tx_row,
 				       next)->row.lsn == 0)
 			fiber_wakeup(applier->writer);
-		else if (applier_apply_tx(&rows) != 0)
+		else if (applier_apply_tx(&rows, applier->writer) != 0)
 			diag_raise();
 
 		if (ibuf_used(ibuf) == 0)
diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
index 4b9823d77..8b37ba6f5 100644
--- a/test/replication/sync_replication_sanity.result
+++ b/test/replication/sync_replication_sanity.result
@@ -90,10 +90,10 @@ box.schema.user.grant('guest', 'replication')
  | ---
  | ...
 -- Set up synchronous replication options.
-quorum = box.cfg.replication_synchro_quorum
+old_synchro_quorum = box.cfg.replication_synchro_quorum
  | ---
  | ...
-timeout = box.cfg.replication_synchro_timeout
+old_synchro_timeout = box.cfg.replication_synchro_timeout
  | ---
  | ...
 box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
@@ -178,13 +178,63 @@ box.space.sync:select{}
  |   - [3]
  | ...
 
+--
+-- gh-5100: replica should send ACKs for sync transactions after
+-- WAL write immediately, not waiting for replication timeout or
+-- a CONFIRM.
+--
+box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+old_timeout = box.cfg.replication_timeout
+ | ---
+ | ...
+box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+-- Commit something non-sync. So as applier writer fiber would
+-- flush the pending heartbeat and go to sleep with the new huge
+-- replication timeout.
+s = box.schema.create_space('test')
+ | ---
+ | ...
+pk = s:create_index('pk')
+ | ---
+ | ...
+s:replace{1}
+ | ---
+ | - [1]
+ | ...
+-- Now commit something sync. It should return immediately even
+-- though the replication timeout is huge.
+box.space.sync:replace{4}
+ | ---
+ | - [4]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{4}
+ | ---
+ | - - [4]
+ | ...
+
 -- Cleanup.
 test_run:cmd('switch default')
  | ---
  | - true
  | ...
 
-box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+box.cfg{                                                                        \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+    replication_timeout = old_timeout,                                          \
+}
  | ---
  | ...
 test_run:cmd('stop server replica')
@@ -195,6 +245,9 @@ test_run:cmd('delete server replica')
  | ---
  | - true
  | ...
+box.space.test:drop()
+ | ---
+ | ...
 box.space.sync:drop()
  | ---
  | ...
diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
index 8715a4600..b0326fd4b 100644
--- a/test/replication/sync_replication_sanity.test.lua
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -38,8 +38,8 @@ engine = test_run:get_cfg('engine')
 
 box.schema.user.grant('guest', 'replication')
 -- Set up synchronous replication options.
-quorum = box.cfg.replication_synchro_quorum
-timeout = box.cfg.replication_synchro_timeout
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+old_synchro_timeout = box.cfg.replication_synchro_timeout
 box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
 
 test_run:cmd('create server replica with rpl_master=default,\
@@ -71,11 +71,37 @@ box.space.sync:select{}
 test_run:cmd('restart server replica')
 box.space.sync:select{}
 
+--
+-- gh-5100: replica should send ACKs for sync transactions after
+-- WAL write immediately, not waiting for replication timeout or
+-- a CONFIRM.
+--
+box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
+test_run:switch('default')
+old_timeout = box.cfg.replication_timeout
+box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
+-- Commit something non-sync. So as applier writer fiber would
+-- flush the pending heartbeat and go to sleep with the new huge
+-- replication timeout.
+s = box.schema.create_space('test')
+pk = s:create_index('pk')
+s:replace{1}
+-- Now commit something sync. It should return immediately even
+-- though the replication timeout is huge.
+box.space.sync:replace{4}
+test_run:switch('replica')
+box.space.sync:select{4}
+
 -- Cleanup.
 test_run:cmd('switch default')
 
-box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
+box.cfg{                                                                        \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+    replication_timeout = old_timeout,                                          \
+}
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')
+box.space.test:drop()
 box.space.sync:drop()
 box.schema.user.revoke('guest', 'replication')
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 16/19] txn_limbo: add diag_set in txn_limbo_wait_confirm
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (6 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 17/19] replication: delay initial join until confirmation Vladislav Shpilevoy
                     ` (19 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Add failure reason to txn_limbo_wait_confirm

Prerequisite #5097
---
 src/box/txn_limbo.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d3751a28b..abea26731 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -404,10 +404,12 @@ txn_limbo_wait_confirm(struct txn_limbo *limbo)
 		/* Clear the triggers if the timeout has been reached. */
 		trigger_clear(&on_complete);
 		trigger_clear(&on_rollback);
+		diag_set(ClientError, ER_SYNC_QUORUM_TIMEOUT);
 		return -1;
 	}
 	if (!cwp.is_confirm) {
 		/* The transaction has been rolled back. */
+		diag_set(ClientError, ER_SYNC_ROLLBACK);
 		return -1;
 	}
 	return 0;
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 17/19] replication: delay initial join until confirmation
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (7 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 16/19] txn_limbo: add diag_set in txn_limbo_wait_confirm Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 18/19] replication: only send confirmed data during final join Vladislav Shpilevoy
                     ` (18 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

All the data that master sends during the join stage (both initial and
final) is embedded into the first snapshot created on replica, so this
data mustn't contain any unconfirmed or rolled back synchronous
transactions.

Make sure that master starts sending the initial data, which contains a
snapshot-like dump of all the spaces, only after the latest synchronous
tx it has is confirmed. In case of rollback, the replica may retry
joining.
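
A minimal sketch of the retry decision described above, not the actual
applier code. The enum values and the function name are hypothetical
stand-ins. Only the two SYNC error cases mirror what this patch adds to
applier_f().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real ER_* error codes. */
enum sketch_errcode {
	SKETCH_ER_SYNC_QUORUM_TIMEOUT,
	SKETCH_ER_SYNC_ROLLBACK,
	SKETCH_ER_OTHER,
};

/*
 * Return true when a join that failed with this error should be
 * retried because the master aborted it due to an unconfirmed or
 * rolled back synchronous transaction. Every other error is
 * lumped into "not retried" only to keep the sketch short; the
 * real applier_f() has more branches.
 */
bool
sketch_join_is_retriable(enum sketch_errcode code)
{
	return code == SKETCH_ER_SYNC_QUORUM_TIMEOUT ||
	       code == SKETCH_ER_SYNC_ROLLBACK;
}
```

In the patch itself both cases log the error, disconnect with
APPLIER_LOADING and jump to the reconnect label.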

Part of #5097
---
 src/box/applier.cc | 9 +++++++++
 src/box/relay.cc   | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index a9baf0d69..7e63dc544 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -1194,6 +1194,15 @@ applier_f(va_list ap)
 				applier_log_error(applier, e);
 				applier_disconnect(applier, APPLIER_LOADING);
 				goto reconnect;
+			} else if (e->errcode() == ER_SYNC_QUORUM_TIMEOUT ||
+				   e->errcode() == ER_SYNC_ROLLBACK) {
+				/*
+				 * Join failure due to synchronous
+				 * transaction rollback.
+				 */
+				applier_log_error(applier, e);
+				applier_disconnect(applier, APPLIER_LOADING);
+				goto reconnect;
 			} else if (e->errcode() == ER_CFG ||
 				   e->errcode() == ER_ACCESS_DENIED ||
 				   e->errcode() == ER_NO_SUCH_USER ||
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 29588b6ca..a7843a8c2 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -321,6 +321,13 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	if (wal_sync(vclock) != 0)
 		diag_raise();
 
+	/*
+	 * Start sending data only when the latest sync
+	 * transaction is confirmed.
+	 */
+	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
+		diag_raise();
+
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
 	xrow_encode_vclock_xc(&row, vclock);
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 18/19] replication: only send confirmed data during final join
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (8 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 17/19] replication: delay initial join until confirmation Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo Vladislav Shpilevoy
                     ` (17 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Final join (or register) stage is needed to deliver the replica its
_cluster registration. Since this stage is followed by a snapshot on
replica, the data received during this stage must be confirmed.

Make master check that there are no rollbacks for the data to be sent
during final join and that all the data is confirmed before final join
starts.
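
The recording window can be sketched as a toy model. This is an
illustration under assumptions, not the real limbo: the sketch_* names
are hypothetical, and the rollbacks are simulated by a counter instead
of real transactions. The two flags and their start/stop semantics
follow txn_limbo.h from this patch.

```c
#include <stdbool.h>

/* Toy model of the two fields this patch adds to struct txn_limbo. */
struct sketch_limbo {
	bool is_recording;
	bool got_rollback;
};

void
sketch_start_recording(struct sketch_limbo *limbo)
{
	limbo->is_recording = true;
}

void
sketch_stop_recording(struct sketch_limbo *limbo)
{
	limbo->is_recording = false;
	limbo->got_rollback = false;
}

/* What the limbo does when an entry is popped due to a rollback. */
void
sketch_on_rollback(struct sketch_limbo *limbo)
{
	if (limbo->is_recording)
		limbo->got_rollback = true;
}

/*
 * Simulate one join attempt during which n_rollbacks rollbacks
 * happen. Return 0 if the join may proceed, -1 if it has to fail
 * because rolled back data might have been sent.
 */
int
sketch_join(struct sketch_limbo *limbo, int n_rollbacks)
{
	sketch_start_recording(limbo);
	for (int i = 0; i < n_rollbacks; i++)
		sketch_on_rollback(limbo);
	bool failed = limbo->got_rollback;
	/* The real code stops recording via a scoped guard on failure. */
	sketch_stop_recording(limbo);
	return failed ? -1 : 0;
}
```

A rollback outside the window leaves got_rollback untouched, so only
rollbacks between start_vclock and stop_vclock can fail a join.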

Closes #5097
---
 src/box/box.cc      | 33 +++++++++++++++++++++++++++++++++
 src/box/txn_limbo.c |  5 +++++
 src/box/txn_limbo.h | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/src/box/box.cc b/src/box/box.cc
index ad76f4f00..37bf4ab3b 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1711,6 +1711,11 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
 	say_info("registering replica %s at %s",
 		 tt_uuid_str(&instance_uuid), sio_socketname(io->fd));
 
+	/* See box_process_join() */
+	txn_limbo_start_recording(&txn_limbo);
+	auto limbo_guard = make_scoped_guard([&] {
+		txn_limbo_stop_recording(&txn_limbo);
+	});
 	struct vclock start_vclock;
 	vclock_copy(&start_vclock, &replicaset.vclock);
 
@@ -1726,6 +1731,14 @@ box_process_register(struct ev_io *io, struct xrow_header *header)
 	struct vclock stop_vclock;
 	vclock_copy(&stop_vclock, &replicaset.vclock);
 
+	if (txn_limbo_got_rollback(&txn_limbo))
+		tnt_raise(ClientError, ER_SYNC_ROLLBACK);
+	txn_limbo_stop_recording(&txn_limbo);
+	limbo_guard.is_active  = false;
+
+	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
+		diag_raise();
+
 	/*
 	 * Feed replica with WALs in range
 	 * (start_vclock, stop_vclock) so that it gets its
@@ -1847,6 +1860,18 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	say_info("joining replica %s at %s",
 		 tt_uuid_str(&instance_uuid), sio_socketname(io->fd));
 
+	/*
+	 * In order to join a replica, master has to make sure it
+	 * doesn't send unconfirmed data. We have to check that
+	 * there are no rolled back transactions between
+	 * start_vclock and stop_vclock, and that the data right
+	 * before stop_vclock is confirmed, before we can proceed
+	 * to final join.
+	 */
+	txn_limbo_start_recording(&txn_limbo);
+	auto limbo_guard = make_scoped_guard([&] {
+		txn_limbo_stop_recording(&txn_limbo);
+	});
 	/*
 	 * Initial stream: feed replica with dirty data from engines.
 	 */
@@ -1868,6 +1893,14 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	struct vclock stop_vclock;
 	vclock_copy(&stop_vclock, &replicaset.vclock);
 
+	if (txn_limbo_got_rollback(&txn_limbo))
+		tnt_raise(ClientError, ER_SYNC_ROLLBACK);
+	txn_limbo_stop_recording(&txn_limbo);
+	limbo_guard.is_active  = false;
+
+	if (txn_limbo_wait_confirm(&txn_limbo) != 0)
+		diag_raise();
+
 	/* Send end of initial stage data marker */
 	struct xrow_header row;
 	xrow_encode_vclock_xc(&row, &stop_vclock);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index abea26731..fbe4dcecf 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -40,6 +40,8 @@ txn_limbo_create(struct txn_limbo *limbo)
 	rlist_create(&limbo->queue);
 	limbo->instance_id = REPLICA_ID_NIL;
 	vclock_create(&limbo->vclock);
+	limbo->is_recording = false;
+	limbo->got_rollback = false;
 }
 
 struct txn_limbo_entry *
@@ -90,8 +92,11 @@ txn_limbo_pop(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	assert(!rlist_empty(&entry->in_queue));
 	assert(rlist_last_entry(&limbo->queue, struct txn_limbo_entry,
 				 in_queue) == entry);
+	assert(entry->is_rollback);
 	(void) limbo;
 	rlist_del_entry(entry, in_queue);
+	if (limbo->is_recording)
+		limbo->got_rollback = true;
 }
 
 void
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 138093c7c..a9fc83b0c 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -117,6 +117,13 @@ struct txn_limbo {
 	 * transactions, created on the limbo's owner node.
 	 */
 	struct vclock vclock;
+	/** Set to true when limbo records rollback occurrence. */
+	bool is_recording;
+	/**
+	 * Whether any rollbacks happened during the recording
+	 * period.
+	 */
+	bool got_rollback;
 };
 
 /**
@@ -126,6 +133,34 @@ struct txn_limbo {
  */
 extern struct txn_limbo txn_limbo;
 
+/**
+ * Make limbo remember the occurrence of rollbacks due to failed
+ * quorum collection.
+ */
+static inline void
+txn_limbo_start_recording(struct txn_limbo *limbo)
+{
+	limbo->is_recording = true;
+}
+
+/** Stop the recording of failed quorum collection events. */
+static inline void
+txn_limbo_stop_recording(struct txn_limbo *limbo)
+{
+	limbo->is_recording = false;
+	limbo->got_rollback = false;
+}
+
+/**
+ * Returns true in case the limbo rolled back any tx since the
+ * moment txn_limbo_start_recording() was called.
+ */
+static inline bool
+txn_limbo_got_rollback(struct txn_limbo *limbo)
+{
+	return limbo->got_rollback;
+}
+
 /**
  * Allocate, create, and append a new transaction to the limbo.
  * The limbo entry is allocated on the transaction's region.
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (9 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 18/19] replication: only send confirmed data during final join Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-01 17:12     ` Sergey Ostanevich
  2020-07-03 12:28     ` Serge Petrenko
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options Vladislav Shpilevoy
                     ` (16 subsequent siblings)
  27 siblings, 2 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

When there is an uncommitted synchronous transaction, any attempt
to commit the next transaction should be suspended, even if it is
an async transaction.

This restriction comes from the theoretically possible dependency
of what is written in the async transactions on what was written
in the previous sync transactions.

For that there is a new txn flag - TXN_WAIT_SYNC. Previously the
only synchro replication flag was TXN_WAIT_ACK. And now a
transaction can be sync, but not wait for ACKs.

In particular, if a transaction:

- Is synchronous, then it has TXN_WAIT_SYNC (it is sync), and
  TXN_WAIT_ACK (need to collect ACKs, or get a CONFIRM);

- Is asynchronous, and the limbo was empty at the moment of
  commit, then it does not have any of these flags and is
  committed like earlier;

- Is asynchronous, and the limbo was not empty at the moment of
  commit. Then it will have only TXN_WAIT_SYNC. So it will be
  finished right after all the previous sync transactions are
  done. Note: *without waiting for ACKs* - the transaction is
  still asynchronous in the sense that it does not need to wait
  for quorum replication.
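
The three cases above can be sketched as a small standalone
function. This is only an illustration: the sketch_* names and the
flag values are hypothetical and are not part of the patch. The
real logic lives in txn_journal_entry_new() in the diff below. The
force_async argument models TXN_FORCE_ASYNC, which the patch sets
for CONFIRM/ROLLBACK writes so that they never enter the limbo.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real txn flags; values are illustrative. */
enum sketch_flag {
	SKETCH_WAIT_SYNC = 1 << 0,
	SKETCH_WAIT_ACK = 1 << 1,
};

/*
 * Return the set of flags a transaction would get at commit time.
 * is_sync: the transaction touched at least one synchronous space.
 * limbo_is_empty: no unfinished transactions are queued in the limbo.
 * force_async: the entry must bypass the limbo entirely.
 */
static unsigned
sketch_commit_flags(bool is_sync, bool limbo_is_empty, bool force_async)
{
	if (force_async)
		return 0;
	/* A sync transaction waits in the limbo and collects ACKs. */
	if (is_sync)
		return SKETCH_WAIT_SYNC | SKETCH_WAIT_ACK;
	/* An async transaction behind pending ones waits, but needs no ACKs. */
	if (!limbo_is_empty)
		return SKETCH_WAIT_SYNC;
	/* An async transaction with an empty limbo commits as before. */
	return 0;
}
```

Under these assumptions an async transaction gets SKETCH_WAIT_SYNC
only when the limbo is not empty, and never gets SKETCH_WAIT_ACK.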

Follow-up #4845
---
 src/box/applier.cc                            |  8 ++
 src/box/txn.c                                 | 16 ++--
 src/box/txn.h                                 |  7 ++
 src/box/txn_limbo.c                           | 49 +++++++++---
 .../sync_replication_sanity.result            | 75 +++++++++++++++++++
 .../sync_replication_sanity.test.lua          | 26 +++++++
 test/unit/snap_quorum_delay.cc                |  6 +-
 7 files changed, 172 insertions(+), 15 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 7e63dc544..7e70211b7 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -280,6 +280,14 @@ process_confirm_rollback(struct request *request, bool is_confirm)
 			 txn_limbo.instance_id);
 		return -1;
 	}
+	assert(txn->n_applier_rows == 0);
+	/*
+	 * This is not really a transaction. It just uses txn API
+	 * to put the data into WAL. And obviously it should not
+	 * go to the limbo and block on the very same sync
+	 * transaction which it tries to confirm now.
+	 */
+	txn_set_flag(txn, TXN_FORCE_ASYNC);
 
 	if (txn_begin_stmt(txn, NULL) != 0)
 		return -1;
diff --git a/src/box/txn.c b/src/box/txn.c
index 37955752a..bc2bb8e11 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -442,7 +442,7 @@ txn_complete(struct txn *txn)
 			engine_rollback(txn->engine, txn);
 		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
 			txn_run_rollback_triggers(txn, &txn->on_rollback);
-	} else if (!txn_has_flag(txn, TXN_WAIT_ACK)) {
+	} else if (!txn_has_flag(txn, TXN_WAIT_SYNC)) {
 		/* Commit the transaction. */
 		if (txn->engine != NULL)
 			engine_commit(txn->engine, txn);
@@ -552,8 +552,14 @@ txn_journal_entry_new(struct txn *txn)
 	 * space can't be synchronous. So if there is at least one
 	 * synchronous space, the transaction is not local.
 	 */
-	if (is_sync && !txn_has_flag(txn, TXN_FORCE_ASYNC))
-		txn_set_flag(txn, TXN_WAIT_ACK);
+	if (!txn_has_flag(txn, TXN_FORCE_ASYNC)) {
+		if (is_sync) {
+			txn_set_flag(txn, TXN_WAIT_SYNC);
+			txn_set_flag(txn, TXN_WAIT_ACK);
+		} else if (!txn_limbo_is_empty(&txn_limbo)) {
+			txn_set_flag(txn, TXN_WAIT_SYNC);
+		}
+	}
 
 	assert(remote_row == req->rows + txn->n_applier_rows);
 	assert(local_row == remote_row + txn->n_new_rows);
@@ -662,7 +668,7 @@ txn_commit_async(struct txn *txn)
 		return -1;
 	}
 
-	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
+	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
 	struct txn_limbo_entry *limbo_entry;
 	if (is_sync) {
 		/*
@@ -737,7 +743,7 @@ txn_commit(struct txn *txn)
 		return -1;
 	}
 
-	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
+	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
 	if (is_sync) {
 		/*
 		 * Remote rows, if any, come before local rows, so
diff --git a/src/box/txn.h b/src/box/txn.h
index c631d7033..c484fcb56 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -66,11 +66,18 @@ enum txn_flag {
 	TXN_CAN_YIELD,
 	/** on_commit and/or on_rollback list is not empty. */
 	TXN_HAS_TRIGGERS,
+	/**
+	 * A transaction is either synchronous itself and needs to
+	 * be synced with replicas, or it is async, but is blocked
+	 * by a not yet finished synchronous transaction.
+	 */
+	TXN_WAIT_SYNC,
 	/**
 	 * Transaction, touched sync spaces, enters 'waiting for
 	 * acks' state before commit. In this state it waits until
 	 * it is replicated onto a quorum of replicas, and only
 	 * then finishes commit and returns success to a user.
+	 * TXN_WAIT_SYNC is always set, if TXN_WAIT_ACK is set.
 	 */
 	TXN_WAIT_ACK,
 	/**
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index fbe4dcecf..bfb404e8e 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -47,7 +47,7 @@ txn_limbo_create(struct txn_limbo *limbo)
 struct txn_limbo_entry *
 txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
 {
-	assert(txn_has_flag(txn, TXN_WAIT_ACK));
+	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
 	if (id == 0)
 		id = instance_id;
 	if (limbo->instance_id != id) {
@@ -143,7 +143,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	struct txn *txn = entry->txn;
 	assert(entry->lsn > 0);
 	assert(!txn_has_flag(txn, TXN_IS_DONE));
-	assert(txn_has_flag(txn, TXN_WAIT_ACK));
+	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
 	if (txn_limbo_check_complete(limbo, entry)) {
 		txn_limbo_remove(limbo, entry);
 		return 0;
@@ -160,6 +160,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 			e->txn->signature = TXN_SIGNATURE_QUORUM_TIMEOUT;
 			txn_limbo_pop(limbo, e);
 			txn_clear_flag(e->txn, TXN_WAIT_ACK);
+			txn_clear_flag(e->txn, TXN_WAIT_SYNC);
 			txn_complete(e->txn);
 			if (e == entry)
 				break;
@@ -179,6 +180,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	}
 	txn_limbo_remove(limbo, entry);
 	txn_clear_flag(txn, TXN_WAIT_ACK);
+	txn_clear_flag(txn, TXN_WAIT_SYNC);
 	return 0;
 }
 
@@ -209,6 +211,13 @@ txn_limbo_write_confirm_rollback(struct txn_limbo *limbo,
 	struct txn *txn = txn_begin();
 	if (txn == NULL)
 		return -1;
+	/*
+	 * This is not really a transaction. It just uses txn API
+	 * to put the data into WAL. And obviously it should not
+	 * go to the limbo and block on the very same sync
+	 * transaction which it tries to confirm now.
+	 */
+	txn_set_flag(txn, TXN_FORCE_ASYNC);
 
 	if (txn_begin_stmt(txn, NULL) != 0)
 		goto rollback;
@@ -238,11 +247,21 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 	assert(limbo->instance_id != REPLICA_ID_NIL);
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
-		if (e->lsn > lsn)
+		/*
+		 * Confirm a transaction if
+		 * - it is a sync transaction covered by the
+		 *   confirmation LSN;
+		 * - it is an async transaction, and it is the
+		 *   last in the queue. So it does not depend on
+		 *   a not finished sync transaction anymore and
+		 *   can be confirmed too.
+		 */
+		if (e->lsn > lsn && txn_has_flag(e->txn, TXN_WAIT_ACK))
 			break;
 		e->is_commit = true;
 		txn_limbo_remove(limbo, e);
 		txn_clear_flag(e->txn, TXN_WAIT_ACK);
+		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
 		/*
 		 * If  txn_complete_async() was already called,
 		 * finish tx processing. Otherwise just clear the
@@ -277,6 +296,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 		e->is_rollback = true;
 		txn_limbo_pop(limbo, e);
 		txn_clear_flag(e->txn, TXN_WAIT_ACK);
+		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
 		if (e->txn->signature >= 0) {
 			/* Rollback the transaction. */
 			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
@@ -307,15 +327,26 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	struct txn_limbo_entry *e;
 	struct txn_limbo_entry *last_quorum = NULL;
 	rlist_foreach_entry(e, &limbo->queue, in_queue) {
-		if (e->lsn <= prev_lsn)
-			continue;
 		if (e->lsn > lsn)
 			break;
-		if (++e->ack_count >= replication_synchro_quorum) {
-			e->is_commit = true;
-			last_quorum = e;
-		}
+		if (e->lsn <= prev_lsn)
+			continue;
 		assert(e->ack_count <= VCLOCK_MAX);
+		/*
+		 * Sync transactions need to collect acks. Async
+		 * transactions are automatically committed right
+		 * after all the previous sync transactions are.
+		 */
+		if (txn_has_flag(e->txn, TXN_WAIT_ACK)) {
+			if (++e->ack_count < replication_synchro_quorum)
+				continue;
+		} else {
+			assert(txn_has_flag(e->txn, TXN_WAIT_SYNC));
+			if (last_quorum == NULL)
+				continue;
+		}
+		e->is_commit = true;
+		last_quorum = e;
 	}
 	if (last_quorum != NULL) {
 		if (txn_limbo_write_confirm(limbo, last_quorum) != 0) {
diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
index 8b37ba6f5..f713d4b08 100644
--- a/test/replication/sync_replication_sanity.result
+++ b/test/replication/sync_replication_sanity.result
@@ -224,6 +224,81 @@ box.space.sync:select{4}
  | - - [4]
  | ...
 
+--
+-- Async transactions should wait for existing sync transactions
+-- finish.
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Start 2 fibers, which will execute one right after the other
+-- in the same event loop iteration.
+f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
+ | ---
+ | ...
+f:status()
+ | ---
+ | - dead
+ | ...
+s:select{5}
+ | ---
+ | - - [5]
+ | ...
+box.space.sync:select{5}
+ | ---
+ | - - [5]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.test:select{5}
+ | ---
+ | - - [5]
+ | ...
+box.space.sync:select{5}
+ | ---
+ | - - [5]
+ | ...
+-- Ensure sync rollback will affect all pending async transactions
+-- too.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
+ | ---
+ | ...
+f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+f:status()
+ | ---
+ | - dead
+ | ...
+s:select{6}
+ | ---
+ | - []
+ | ...
+box.space.sync:select{6}
+ | ---
+ | - []
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.test:select{6}
+ | ---
+ | - []
+ | ...
+box.space.sync:select{6}
+ | ---
+ | - []
+ | ...
+
 -- Cleanup.
 test_run:cmd('switch default')
  | ---
diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
index b0326fd4b..f84b6ee19 100644
--- a/test/replication/sync_replication_sanity.test.lua
+++ b/test/replication/sync_replication_sanity.test.lua
@@ -92,6 +92,32 @@ box.space.sync:replace{4}
 test_run:switch('replica')
 box.space.sync:select{4}
 
+--
+-- Async transactions should wait for existing sync transactions
+-- finish.
+--
+test_run:switch('default')
+-- Start 2 fibers, which will execute one right after the other
+-- in the same event loop iteration.
+f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
+f:status()
+s:select{5}
+box.space.sync:select{5}
+test_run:switch('replica')
+box.space.test:select{5}
+box.space.sync:select{5}
+-- Ensure sync rollback will affect all pending async transactions
+-- too.
+test_run:switch('default')
+box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
+f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
+f:status()
+s:select{6}
+box.space.sync:select{6}
+test_run:switch('replica')
+box.space.test:select{6}
+box.space.sync:select{6}
+
 -- Cleanup.
 test_run:cmd('switch default')
 
diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
index 7a200673a..e6cf381bf 100644
--- a/test/unit/snap_quorum_delay.cc
+++ b/test/unit/snap_quorum_delay.cc
@@ -97,8 +97,12 @@ txn_process_func(va_list ap)
 	enum process_type process_type = (enum process_type)va_arg(ap, int);
 	struct txn *txn = txn_begin();
 	txn->fiber = fiber();
-	/* Set the TXN_WAIT_ACK flag to simulate a sync transaction.*/
+	/*
+	 * Set the TXN_WAIT_ACK + SYNC flags to simulate a sync
+	 * transaction.
+	 */
 	txn_set_flag(txn, TXN_WAIT_ACK);
+	txn_set_flag(txn, TXN_WAIT_SYNC);
 	/*
 	 * The true way to push the transaction to limbo is to call
 	 * txn_commit() for sync transaction. But, if txn_commit()
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (10 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-01 16:05     ` Sergey Ostanevich
  2020-07-02  8:29     ` Serge Petrenko
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
                     ` (15 subsequent siblings)
  27 siblings, 2 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Synchronous transactions are supposed to be replicated on a
specified number of replicas before being committed on master. The
number of replicas can be specified using the
replication_synchro_quorum option. It is 1 by default, so sync
transactions work like asynchronous ones unless configured
otherwise. 1 means a successful WAL write on master is enough for
commit.

When replication_synchro_quorum is greater than 1, an instance has to
wait for the specified number of replicas to reply with success. If
enough replies aren't collected during replication_synchro_timeout,
the instance rolls back the tx in question.
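
The interplay of the two options can be sketched as a pure
function. This is a simplified model, not code from the patch: the
sketch_* names are hypothetical, and ack_count is assumed to
include the master's own WAL write, which matches the statement
above that a quorum of 1 is satisfied by the master alone.

```c
/* Hypothetical outcome of a pending synchronous transaction. */
enum sketch_outcome {
	SKETCH_WAITING,
	SKETCH_COMMIT,
	SKETCH_ROLLBACK,
};

/*
 * Decide what happens to a sync transaction.
 * ack_count: confirmations collected so far (master's WAL write included).
 * quorum: value of replication_synchro_quorum.
 * elapsed, timeout: seconds waited and replication_synchro_timeout.
 */
static enum sketch_outcome
sketch_quorum_outcome(int ack_count, int quorum, double elapsed,
		      double timeout)
{
	/* Enough confirmations: the transaction can be committed. */
	if (ack_count >= quorum)
		return SKETCH_COMMIT;
	/* Quorum was not reached in time: roll the transaction back. */
	if (elapsed >= timeout)
		return SKETCH_ROLLBACK;
	return SKETCH_WAITING;
}
```

With the default quorum of 1 the function returns SKETCH_COMMIT as
soon as the master's own write is counted, which is why sync
transactions behave like async ones by default.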

Part of #4844
Part of #5073
---
 src/box/box.cc                  | 53 +++++++++++++++++++++++++++++++++
 src/box/box.h                   |  2 ++
 src/box/lua/cfg.cc              | 18 +++++++++++
 src/box/lua/load_cfg.lua        | 10 +++++++
 src/box/replication.cc          |  2 ++
 src/box/replication.h           | 12 ++++++++
 test/app-tap/init_script.result |  2 ++
 test/box/admin.result           |  4 +++
 test/box/cfg.result             |  8 +++++
 9 files changed, 111 insertions(+)

diff --git a/src/box/box.cc b/src/box/box.cc
index 871b0d976..0821ea0a3 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -476,6 +476,31 @@ box_check_replication_sync_lag(void)
 	return lag;
 }
 
+static int
+box_check_replication_synchro_quorum(void)
+{
+	int quorum = cfg_geti("replication_synchro_quorum");
+	if (quorum <= 0 || quorum > VCLOCK_MAX) {
+		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
+			 "the value must be greater than zero and less than "
+			 "maximal number of replicas");
+		return -1;
+	}
+	return quorum;
+}
+
+static double
+box_check_replication_synchro_timeout(void)
+{
+	double timeout = cfg_getd("replication_synchro_timeout");
+	if (timeout <= 0) {
+		diag_set(ClientError, ER_CFG, "replication_synchro_timeout",
+			 "the value must be greater than zero");
+		return -1;
+	}
+	return timeout;
+}
+
 static double
 box_check_replication_sync_timeout(void)
 {
@@ -658,6 +683,10 @@ box_check_config()
 	box_check_replication_connect_timeout();
 	box_check_replication_connect_quorum();
 	box_check_replication_sync_lag();
+	if (box_check_replication_synchro_quorum() < 0)
+		diag_raise();
+	if (box_check_replication_synchro_timeout() < 0)
+		diag_raise();
 	box_check_replication_sync_timeout();
 	box_check_readahead(cfg_geti("readahead"));
 	box_check_checkpoint_count(cfg_geti("checkpoint_count"));
@@ -777,6 +806,26 @@ box_set_replication_sync_lag(void)
 	replication_sync_lag = box_check_replication_sync_lag();
 }
 
+int
+box_set_replication_synchro_quorum(void)
+{
+	int value = box_check_replication_synchro_quorum();
+	if (value < 0)
+		return -1;
+	replication_synchro_quorum = value;
+	return 0;
+}
+
+int
+box_set_replication_synchro_timeout(void)
+{
+	double value = box_check_replication_synchro_timeout();
+	if (value < 0)
+		return -1;
+	replication_synchro_timeout = value;
+	return 0;
+}
+
 void
 box_set_replication_sync_timeout(void)
 {
@@ -2417,6 +2466,10 @@ box_cfg_xc(void)
 	box_set_replication_connect_timeout();
 	box_set_replication_connect_quorum();
 	box_set_replication_sync_lag();
+	if (box_set_replication_synchro_quorum() != 0)
+		diag_raise();
+	if (box_set_replication_synchro_timeout() != 0)
+		diag_raise();
 	box_set_replication_sync_timeout();
 	box_set_replication_skip_conflict();
 	box_set_replication_anon();
diff --git a/src/box/box.h b/src/box/box.h
index 557542a83..f9789154e 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -243,6 +243,8 @@ void box_set_replication_timeout(void);
 void box_set_replication_connect_timeout(void);
 void box_set_replication_connect_quorum(void);
 void box_set_replication_sync_lag(void);
+int box_set_replication_synchro_quorum(void);
+int box_set_replication_synchro_timeout(void);
 void box_set_replication_sync_timeout(void);
 void box_set_replication_skip_conflict(void);
 void box_set_replication_anon(void);
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index a5b15e527..d481155cd 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -313,6 +313,22 @@ lbox_cfg_set_replication_sync_lag(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_cfg_set_replication_synchro_quorum(struct lua_State *L)
+{
+	if (box_set_replication_synchro_quorum() != 0)
+		luaT_error(L);
+	return 0;
+}
+
+static int
+lbox_cfg_set_replication_synchro_timeout(struct lua_State *L)
+{
+	if (box_set_replication_synchro_timeout() != 0)
+		luaT_error(L);
+	return 0;
+}
+
 static int
 lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
 {
@@ -370,6 +386,8 @@ box_lua_cfg_init(struct lua_State *L)
 		{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
 		{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
 		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
+		{"cfg_set_replication_synchro_quorum", lbox_cfg_set_replication_synchro_quorum},
+		{"cfg_set_replication_synchro_timeout", lbox_cfg_set_replication_synchro_timeout},
 		{"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
 		{"cfg_set_replication_skip_conflict", lbox_cfg_set_replication_skip_conflict},
 		{"cfg_set_replication_anon", lbox_cfg_set_replication_anon},
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index f2f2df6f8..a7f03c7d6 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -89,6 +89,8 @@ local default_cfg = {
     replication_timeout = 1,
     replication_sync_lag = 10,
     replication_sync_timeout = 300,
+    replication_synchro_quorum = 1,
+    replication_synchro_timeout = 5,
     replication_connect_timeout = 30,
     replication_connect_quorum = nil, -- connect all
     replication_skip_conflict = false,
@@ -164,6 +166,8 @@ local template_cfg = {
     replication_timeout = 'number',
     replication_sync_lag = 'number',
     replication_sync_timeout = 'number',
+    replication_synchro_quorum = 'number',
+    replication_synchro_timeout = 'number',
     replication_connect_timeout = 'number',
     replication_connect_quorum = 'number',
     replication_skip_conflict = 'boolean',
@@ -280,6 +284,8 @@ local dynamic_cfg = {
     replication_connect_quorum = private.cfg_set_replication_connect_quorum,
     replication_sync_lag    = private.cfg_set_replication_sync_lag,
     replication_sync_timeout = private.cfg_set_replication_sync_timeout,
+    replication_synchro_quorum = private.cfg_set_replication_synchro_quorum,
+    replication_synchro_timeout = private.cfg_set_replication_synchro_timeout,
     replication_skip_conflict = private.cfg_set_replication_skip_conflict,
     replication_anon        = private.cfg_set_replication_anon,
     instance_uuid           = check_instance_uuid,
@@ -313,6 +319,8 @@ local dynamic_cfg_order = {
     replication_timeout     = 150,
     replication_sync_lag    = 150,
     replication_sync_timeout    = 150,
+    replication_synchro_quorum  = 150,
+    replication_synchro_timeout = 150,
     replication_connect_timeout = 150,
     replication_connect_quorum  = 150,
     replication             = 200,
@@ -348,6 +356,8 @@ local dynamic_cfg_skip_at_load = {
     replication_connect_quorum = true,
     replication_sync_lag    = true,
     replication_sync_timeout = true,
+    replication_synchro_quorum = true,
+    replication_synchro_timeout = true,
     replication_skip_conflict = true,
     replication_anon        = true,
     wal_dir_rescan_delay    = true,
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 273a7cb66..01e9e876a 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -51,6 +51,8 @@ double replication_timeout = 1.0; /* seconds */
 double replication_connect_timeout = 30.0; /* seconds */
 int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
 double replication_sync_lag = 10.0; /* seconds */
+int replication_synchro_quorum = 1;
+double replication_synchro_timeout = 5.0; /* seconds */
 double replication_sync_timeout = 300.0; /* seconds */
 bool replication_skip_conflict = false;
 bool replication_anon = false;
diff --git a/src/box/replication.h b/src/box/replication.h
index 93a25c8a7..a081870f9 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -125,6 +125,18 @@ extern int replication_connect_quorum;
  */
 extern double replication_sync_lag;
 
+/**
+ * Minimal number of replicas which should ACK a synchronous
+ * transaction to be able to confirm it and commit.
+ */
+extern int replication_synchro_quorum;
+
+/**
+ * Time in seconds which the master node is able to wait for ACKs
+ * for a synchronous transaction until it is rolled back.
+ */
+extern double replication_synchro_timeout;
+
 /**
  * Max time to wait for appliers to synchronize before entering
  * the orphan mode.
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index 7c4454285..857f0c95f 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -30,6 +30,8 @@ replication_connect_timeout:30
 replication_skip_conflict:false
 replication_sync_lag:10
 replication_sync_timeout:300
+replication_synchro_quorum:1
+replication_synchro_timeout:5
 replication_timeout:1
 slab_alloc_factor:1.05
 sql_cache_size:5242880
diff --git a/test/box/admin.result b/test/box/admin.result
index d94da8c5d..ab3e80a97 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -81,6 +81,10 @@ cfg_filter(box.cfg)
     - 10
   - - replication_sync_timeout
     - 300
+  - - replication_synchro_quorum
+    - 1
+  - - replication_synchro_timeout
+    - 5
   - - replication_timeout
     - 1
   - - slab_alloc_factor
diff --git a/test/box/cfg.result b/test/box/cfg.result
index b41d54599..bdd210b09 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -69,6 +69,10 @@ cfg_filter(box.cfg)
  |     - 10
  |   - - replication_sync_timeout
  |     - 300
+ |   - - replication_synchro_quorum
+ |     - 1
+ |   - - replication_synchro_timeout
+ |     - 5
  |   - - replication_timeout
  |     - 1
  |   - - slab_alloc_factor
@@ -172,6 +176,10 @@ cfg_filter(box.cfg)
  |     - 10
  |   - - replication_sync_timeout
  |     - 300
+ |   - - replication_synchro_quorum
+ |     - 1
+ |   - - replication_synchro_timeout
+ |     - 5
  |   - - replication_timeout
  |     - 1
  |   - - slab_alloc_factor
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (11 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-01 17:14     ` Sergey Ostanevich
                       ` (2 more replies)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
                     ` (14 subsequent siblings)
  27 siblings, 3 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

When a transaction touches a synchronous space, its commit
procedure changes. The transaction enters the state of 'waiting
for acks' from replicas and from the master's own WAL.

To denote the state there is a new transaction flag -
TXN_WAIT_ACK.

Part of #4844
---
 src/box/txn.c | 6 ++++++
 src/box/txn.h | 7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/src/box/txn.c b/src/box/txn.c
index 123520166..edc1f5180 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -495,6 +495,7 @@ txn_journal_entry_new(struct txn *txn)
 
 	struct xrow_header **remote_row = req->rows;
 	struct xrow_header **local_row = req->rows + txn->n_applier_rows;
+	bool is_sync = false;
 
 	stailq_foreach_entry(stmt, &txn->stmts, next) {
 		if (stmt->has_triggers) {
@@ -506,6 +507,9 @@ txn_journal_entry_new(struct txn *txn)
 		if (stmt->row == NULL)
 			continue;
 
+		is_sync = is_sync || (stmt->space != NULL &&
+				      stmt->space->def->opts.is_sync);
+
 		if (stmt->row->replica_id == 0)
 			*local_row++ = stmt->row;
 		else
@@ -513,6 +517,8 @@ txn_journal_entry_new(struct txn *txn)
 
 		req->approx_len += xrow_approx_len(stmt->row);
 	}
+	if (is_sync)
+		txn_set_flag(txn, TXN_WAIT_ACK);
 
 	assert(remote_row == req->rows + txn->n_applier_rows);
 	assert(local_row == remote_row + txn->n_new_rows);
diff --git a/src/box/txn.h b/src/box/txn.h
index 3f6d79d5c..232cc07a8 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -66,6 +66,13 @@ enum txn_flag {
 	TXN_CAN_YIELD,
 	/** on_commit and/or on_rollback list is not empty. */
 	TXN_HAS_TRIGGERS,
+	/**
+	 * Transaction, touched sync spaces, enters 'waiting for
+	 * acks' state before commit. In this state it waits until
+	 * it is replicated onto a quorum of replicas, and only
+	 * then finishes commit and returns success to a user.
+	 */
+	TXN_WAIT_ACK,
 };
 
 enum {
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (12 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-30 23:00     ` Vladislav Shpilevoy
                       ` (2 more replies)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 05/19] xrow: introduce CONFIRM and ROLLBACK entries Vladislav Shpilevoy
                     ` (13 subsequent siblings)
  27 siblings, 3 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

A synchronous transaction (one which changes anything in a
synchronous space) waits before commit until it is replicated onto
a quorum of replicas.

So far all the 'synchronousness' is basically the same as the well
known 'wait_lsn' technique, except that the transaction really is
not committed until replicated.

The problem of wait_lsn is still present though, in case the master
restarts, because there is no 'confirm' record in WAL telling
which transactions are replicated and can be applied.

Closes #4844
Closes #4845
---
 src/box/CMakeLists.txt |   1 +
 src/box/box.cc         |   2 +
 src/box/errcode.h      |   1 +
 src/box/relay.cc       |  11 +++
 src/box/txn.c          |  51 +++++++++++-
 src/box/txn_limbo.c    | 176 +++++++++++++++++++++++++++++++++++++++++
 src/box/txn_limbo.h    | 168 +++++++++++++++++++++++++++++++++++++++
 test/box/error.result  |   1 +
 8 files changed, 409 insertions(+), 2 deletions(-)
 create mode 100644 src/box/txn_limbo.c
 create mode 100644 src/box/txn_limbo.h

diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
index 63f98f6c8..b8b2689d2 100644
--- a/src/box/CMakeLists.txt
+++ b/src/box/CMakeLists.txt
@@ -169,6 +169,7 @@ add_library(box STATIC
     session.cc
     port.c
     txn.c
+    txn_limbo.c
     box.cc
     gc.c
     checkpoint_schedule.c
diff --git a/src/box/box.cc b/src/box/box.cc
index 0821ea0a3..02088ba01 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -59,6 +59,7 @@
 #include "index.h"
 #include "port.h"
 #include "txn.h"
+#include "txn_limbo.h"
 #include "user.h"
 #include "cfg.h"
 #include "coio.h"
@@ -2413,6 +2414,7 @@ box_init(void)
 	if (tuple_init(lua_hash) != 0)
 		diag_raise();
 
+	txn_limbo_init();
 	sequence_init();
 }
 
diff --git a/src/box/errcode.h b/src/box/errcode.h
index d1e4d02a9..019c582af 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -266,6 +266,7 @@ struct errcode_record {
 	/*211 */_(ER_WRONG_QUERY_ID,		"Prepared statement with id %u does not exist") \
 	/*212 */_(ER_SEQUENCE_NOT_STARTED,		"Sequence '%s' is not started") \
 	/*213 */_(ER_NO_SUCH_SESSION_SETTING,	"Session setting %s doesn't exist") \
+	/*214 */_(ER_UNCOMMITTED_FOREIGN_SYNC_TXNS, "Found uncommitted sync transactions from other instance with id %u") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 2ad02cb8a..36fc14b8c 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -53,6 +53,7 @@
 #include "xrow_io.h"
 #include "xstream.h"
 #include "wal.h"
+#include "txn_limbo.h"
 
 /**
  * Cbus message to send status updates from relay to tx thread.
@@ -399,6 +400,16 @@ tx_status_update(struct cmsg *msg)
 {
 	struct relay_status_msg *status = (struct relay_status_msg *)msg;
 	vclock_copy(&status->relay->tx.vclock, &status->vclock);
+	/*
+	 * Let pending synchronous transactions know, which of
+	 * them were successfully sent to the replica. Acks are
+	 * collected only by the transactions originator (which is
+	 * the single master in 100% so far).
+	 */
+	if (txn_limbo.instance_id == instance_id) {
+		txn_limbo_ack(&txn_limbo, status->relay->replica->id,
+			      vclock_get(&status->vclock, instance_id));
+	}
 	static const struct cmsg_hop route[] = {
 		{relay_status_update, NULL}
 	};
diff --git a/src/box/txn.c b/src/box/txn.c
index edc1f5180..6cfa98212 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -29,6 +29,7 @@
  * SUCH DAMAGE.
  */
 #include "txn.h"
+#include "txn_limbo.h"
 #include "engine.h"
 #include "tuple.h"
 #include "journal.h"
@@ -433,7 +434,7 @@ txn_complete(struct txn *txn)
 			engine_rollback(txn->engine, txn);
 		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
 			txn_run_rollback_triggers(txn, &txn->on_rollback);
-	} else {
+	} else if (!txn_has_flag(txn, TXN_WAIT_ACK)) {
 		/* Commit the transaction. */
 		if (txn->engine != NULL)
 			engine_commit(txn->engine, txn);
@@ -448,6 +449,19 @@ txn_complete(struct txn *txn)
 					     txn->signature - n_rows + 1,
 					     stop_tm - txn->start_tm);
 		}
+	} else {
+		/*
+		 * Complete is called on every WAL operation
+		 * authored by this transaction. And it not always
+		 * is one. And not always is enough for commit.
+		 * In case the transaction is waiting for acks, it
+		 * can't be committed right away. Give control
+		 * back to the fiber, owning the transaction so as
+		 * it could decide what to do next.
+		 */
+		if (txn->fiber != NULL && txn->fiber != fiber())
+			fiber_wakeup(txn->fiber);
+		return;
 	}
 	/*
 	 * If there is no fiber waiting for the transaction then
@@ -517,6 +531,11 @@ txn_journal_entry_new(struct txn *txn)
 
 		req->approx_len += xrow_approx_len(stmt->row);
 	}
+	/*
+	 * There is no a check for all-local rows, because a local
+	 * space can't be synchronous. So if there is at least one
+	 * synchronous space, the transaction is not local.
+	 */
 	if (is_sync)
 		txn_set_flag(txn, TXN_WAIT_ACK);
 
@@ -627,6 +646,7 @@ int
 txn_commit(struct txn *txn)
 {
 	struct journal_entry *req;
+	struct txn_limbo_entry *limbo_entry;
 
 	txn->fiber = fiber();
 
@@ -648,8 +668,31 @@ txn_commit(struct txn *txn)
 		return -1;
 	}
 
+	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
+	if (is_sync) {
+		/*
+		 * Remote rows, if any, come before local rows, so
+		 * check for originating instance id here.
+		 */
+		uint32_t origin_id = req->rows[0]->replica_id;
+
+		/*
+		 * Append now. Before even WAL write is done.
+		 * After WAL write nothing should fail, even OOM
+		 * wouldn't be acceptable.
+		 */
+		limbo_entry = txn_limbo_append(&txn_limbo, origin_id, txn);
+		if (limbo_entry == NULL) {
+			txn_rollback(txn);
+			txn_free(txn);
+			return -1;
+		}
+	}
+
 	fiber_set_txn(fiber(), NULL);
 	if (journal_write(req) != 0) {
+		if (is_sync)
+			txn_limbo_abort(&txn_limbo, limbo_entry);
 		fiber_set_txn(fiber(), txn);
 		txn_rollback(txn);
 		txn_free(txn);
@@ -658,7 +701,11 @@ txn_commit(struct txn *txn)
 		diag_log();
 		return -1;
 	}
-
+	if (is_sync) {
+		txn_limbo_assign_lsn(&txn_limbo, limbo_entry,
+				     req->rows[req->n_rows - 1]->lsn);
+		txn_limbo_wait_complete(&txn_limbo, limbo_entry);
+	}
 	if (!txn_has_flag(txn, TXN_IS_DONE)) {
 		txn->signature = req->res;
 		txn_complete(txn);
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
new file mode 100644
index 000000000..9de91db93
--- /dev/null
+++ b/src/box/txn_limbo.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include "txn.h"
+#include "txn_limbo.h"
+#include "replication.h"
+
+struct txn_limbo txn_limbo;
+
+static inline void
+txn_limbo_create(struct txn_limbo *limbo)
+{
+	rlist_create(&limbo->queue);
+	limbo->instance_id = REPLICA_ID_NIL;
+	vclock_create(&limbo->vclock);
+}
+
+struct txn_limbo_entry *
+txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
+{
+	assert(txn_has_flag(txn, TXN_WAIT_ACK));
+	if (id == 0)
+		id = instance_id;
+	if (limbo->instance_id != id) {
+		if (limbo->instance_id == REPLICA_ID_NIL ||
+		    rlist_empty(&limbo->queue)) {
+			limbo->instance_id = id;
+		} else {
+			diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
+				 limbo->instance_id);
+			return NULL;
+		}
+	}
+	size_t size;
+	struct txn_limbo_entry *e = region_alloc_object(&txn->region,
+							typeof(*e), &size);
+	if (e == NULL) {
+		diag_set(OutOfMemory, size, "region_alloc_object", "e");
+		return NULL;
+	}
+	e->txn = txn;
+	e->lsn = -1;
+	e->ack_count = 0;
+	e->is_commit = false;
+	e->is_rollback = false;
+	rlist_add_tail_entry(&limbo->queue, e, in_queue);
+	return e;
+}
+
+static inline void
+txn_limbo_remove(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	assert(!rlist_empty(&entry->in_queue));
+	assert(rlist_first_entry(&limbo->queue, struct txn_limbo_entry,
+				 in_queue) == entry);
+	(void) limbo;
+	rlist_del_entry(entry, in_queue);
+}
+
+void
+txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	entry->is_rollback = true;
+	txn_limbo_remove(limbo, entry);
+}
+
+void
+txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
+		     int64_t lsn)
+{
+	assert(limbo->instance_id != REPLICA_ID_NIL);
+	entry->lsn = lsn;
+	++entry->ack_count;
+	vclock_follow(&limbo->vclock, limbo->instance_id, lsn);
+}
+
+static bool
+txn_limbo_check_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	if (txn_limbo_entry_is_complete(entry))
+		return true;
+	struct vclock_iterator iter;
+	vclock_iterator_init(&iter, &limbo->vclock);
+	int ack_count = 0;
+	int64_t lsn = entry->lsn;
+	vclock_foreach(&iter, vc)
+		ack_count += vc.lsn >= lsn;
+	assert(ack_count >= entry->ack_count);
+	entry->ack_count = ack_count;
+	entry->is_commit = ack_count >= replication_synchro_quorum;
+	return entry->is_commit;
+}
+
+void
+txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	struct txn *txn = entry->txn;
+	assert(entry->lsn > 0);
+	assert(!txn_has_flag(txn, TXN_IS_DONE));
+	assert(txn_has_flag(txn, TXN_WAIT_ACK));
+	if (txn_limbo_check_complete(limbo, entry)) {
+		txn_limbo_remove(limbo, entry);
+		return;
+	}
+	bool cancellable = fiber_set_cancellable(false);
+	while (!txn_limbo_entry_is_complete(entry))
+		fiber_yield();
+	fiber_set_cancellable(cancellable);
+	// TODO: implement rollback.
+	// TODO: implement confirm.
+	assert(!entry->is_rollback);
+	txn_limbo_remove(limbo, entry);
+	txn_clear_flag(txn, TXN_WAIT_ACK);
+}
+
+void
+txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
+{
+	if (rlist_empty(&limbo->queue))
+		return;
+	assert(limbo->instance_id != REPLICA_ID_NIL);
+	int64_t prev_lsn = vclock_get(&limbo->vclock, replica_id);
+	vclock_follow(&limbo->vclock, replica_id, lsn);
+	struct txn_limbo_entry *e, *tmp;
+	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
+		if (e->lsn <= prev_lsn)
+			continue;
+		if (e->lsn > lsn)
+			break;
+		if (++e->ack_count >= replication_synchro_quorum) {
+			// TODO: better call complete() right
+			// here. Appliers use async transactions,
+			// and their txns don't have fibers to
+			// wake up. That becomes relevant once
+			// appliers are supposed to wait for the
+			// 'confirm' message.
+			e->is_commit = true;
+			rlist_del_entry(e, in_queue);
+			fiber_wakeup(e->txn->fiber);
+		}
+		assert(e->ack_count <= VCLOCK_MAX);
+	}
+}
+
+void
+txn_limbo_init(void)
+{
+	txn_limbo_create(&txn_limbo);
+}
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
new file mode 100644
index 000000000..1ad1c567a
--- /dev/null
+++ b/src/box/txn_limbo.h
@@ -0,0 +1,168 @@
+#pragma once
+/*
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include "small/rlist.h"
+#include "vclock.h"
+
+#include <stdint.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+struct txn;
+
+/**
+ * Transaction and its quorum metadata, to be stored in limbo.
+ */
+struct txn_limbo_entry {
+	/** Link for limbo's queue. */
+	struct rlist in_queue;
+	/** Transaction, waiting for a quorum. */
+	struct txn *txn;
+	/**
+	 * LSN of the transaction by the originator's vclock
+	 * component. May be -1 in case the transaction is not
+	 * written to WAL yet.
+	 */
+	int64_t lsn;
+	/**
+	 * Number of ACKs. Or in other words - how many replicas
+	 * confirmed receipt of the transaction.
+	 */
+	int ack_count;
+	/**
+	 * Result flags. Only one of them can be true. But both
+	 * can be false if the transaction is still waiting for
+	 * its resolution.
+	 */
+	bool is_commit;
+	bool is_rollback;
+};
+
+static inline bool
+txn_limbo_entry_is_complete(const struct txn_limbo_entry *e)
+{
+	return e->is_commit || e->is_rollback;
+}
+
+/**
+ * Limbo is a place where transactions are stored which are
+ * finished, but neither committed nor rolled back. These are
+ * synchronous transactions in the process of collecting ACKs
+ * from replicas.
+ * Limbo's main purposes:
+ *   - maintain the transactions ordered by LSN of their emitter;
+ *   - be a link between transaction and replication modules, so
+ *     that they don't depend on each other directly.
+ */
+struct txn_limbo {
+	/**
+	 * Queue of limbo entries. Ordered by LSN. Some of the
+	 * entries in the end may not have an LSN yet (their local
+	 * WAL write is still in progress), but their order won't
+	 * change anyway. Because WAL write completions will give
+	 * them LSNs in the same order.
+	 */
+	struct rlist queue;
+	/**
+	 * Instance ID of the owner of all the transactions in the
+	 * queue. Strictly speaking, nothing prevents storing
+	 * foreign transactions here, originated from some other
+	 * instance. But the queue may still contain only
+	 * transactions of one instance at a time. Otherwise LSN
+	 * order won't make sense - different nodes have their own
+	 * independent LSNs in their vclock components.
+	 */
+	uint32_t instance_id;
+	/**
+	 * All components of the vclock are versions of the limbo
+	 * owner's LSN, how it is visible on other nodes. For
+	 * example, assume instance ID of the limbo is 1. Then
+	 * vclock[1] here is local LSN of the instance 1.
+	 * vclock[2] is how replica with ID 2 sees LSN of
+	 * instance 1.
+	 * vclock[3] is how replica with ID 3 sees LSN of
+	 * instance 1, and so on.
+	 * That way, by looking at this vclock it can always be
+	 * said up to which LSN there is a sync quorum for
+	 * transactions created on the limbo's owner node.
+	 */
+	struct vclock vclock;
+};
+
+/**
+ * Global limbo entry. So far an instance can have only one limbo,
+ * where master's transactions are stored. Eventually there may
+ * appear more than one limbo for master-master support.
+ */
+extern struct txn_limbo txn_limbo;
+
+/**
+ * Allocate, create, and append a new transaction to the limbo.
+ * The limbo entry is allocated on the transaction's region.
+ */
+struct txn_limbo_entry *
+txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn);
+
+/** Remove the entry from the limbo, mark as rolled back. */
+void
+txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
+
+/**
+ * Assign local LSN to the limbo entry. That happens when the
+ * transaction is added to the limbo, writes to WAL, and gets an
+ * LSN.
+ */
+void
+txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
+		     int64_t lsn);
+
+/**
+ * Ack all transactions up to the given LSN on behalf of the
+ * replica with the specified ID.
+ */
+void
+txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
+
+/**
+ * Block the current fiber until the transaction in the limbo
+ * entry is either committed or rolled back.
+ */
+void
+txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
+
+void
+txn_limbo_init(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__cplusplus) */
diff --git a/test/box/error.result b/test/box/error.result
index 2196fa541..69c471085 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -432,6 +432,7 @@ t;
  |   211: box.error.WRONG_QUERY_ID
  |   212: box.error.SEQUENCE_NOT_STARTED
  |   213: box.error.NO_SUCH_SESSION_SETTING
+ |   214: box.error.UNCOMMITTED_FOREIGN_SYNC_TXNS
  | ...
 
 test_run:cmd("setopt delimiter ''");
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 05/19] xrow: introduce CONFIRM and ROLLBACK entries
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (13 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 06/19] txn: introduce various reasons for txn rollback Vladislav Shpilevoy
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Add methods to encode/decode CONFIRM and ROLLBACK entries.
A CONFIRM entry will be written to WAL by the synchronous replication
master as soon as it finds that the transaction was applied on a quorum
of replicas.
CONFIRM rows share the same header with other rows in WAL, but their body
differs: it's just a map containing replica_id and lsn of the last
confirmed transaction.

A ROLLBACK request contains the same data as a CONFIRM request.
The only difference is the request semantics. While a CONFIRM request
releases all the limbo entries up to the given lsn, a ROLLBACK request
rolls back all the entries with lsn greater than the given one.
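
To illustrate the difference in semantics, here is a minimal,
self-contained sketch in C. It is not code from this patchset:
toy_entry, toy_confirm() and toy_rollback() are hypothetical names, and
a plain array ordered by lsn stands in for the limbo queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of a pending synchronous transaction. */
enum toy_state { TOY_PENDING, TOY_COMMITTED, TOY_ROLLED_BACK };

/* Hypothetical stand-in for a limbo entry: an LSN and a state. */
struct toy_entry {
	int64_t lsn;
	enum toy_state state;
};

/* CONFIRM(lsn): commit every pending entry with entry.lsn <= lsn. */
size_t
toy_confirm(struct toy_entry *queue, size_t count, int64_t lsn)
{
	size_t changed = 0;
	for (size_t i = 0; i < count; i++) {
		if (queue[i].state == TOY_PENDING && queue[i].lsn <= lsn) {
			queue[i].state = TOY_COMMITTED;
			changed++;
		}
	}
	return changed;
}

/* ROLLBACK(lsn): roll back every pending entry with entry.lsn > lsn. */
size_t
toy_rollback(struct toy_entry *queue, size_t count, int64_t lsn)
{
	size_t changed = 0;
	/* Walk from the tail so that later entries are undone first. */
	for (size_t i = count; i-- > 0;) {
		if (queue[i].state == TOY_PENDING && queue[i].lsn > lsn) {
			queue[i].state = TOY_ROLLED_BACK;
			changed++;
		}
	}
	return changed;
}
```

With entries at lsn 10..13, CONFIRM(11) followed by ROLLBACK(11) in this
sketch leaves 10 and 11 committed and 12 and 13 rolled back.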

Part-of #4847
Part-of #4848

@TarantoolBot document
Title: document synchronous replication auxiliary requests

Two new iproto request codes are added:
 * IPROTO_CONFIRM  = 0x28 (decimal 40)
 * IPROTO_ROLLBACK = 0x29 (decimal 41)

Both entries share the same request body (it's a map of 2 items):
IPROTO_REPLICA_ID : leader_id - id of the synchronous replication leader,
IPROTO_LSN : leader_lsn - lsn of the last confirmed transaction.
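
As a rough illustration of this body layout, the sketch below encodes
the two-item map by hand, without msgpuck. It is an assumption-laden
example, not the patch's encoder (which uses mp_encode_map() and
mp_encode_uint()): the toy_* names are hypothetical, and the key codes
0x02 and 0x03 are assumed to match IPROTO_REPLICA_ID and IPROTO_LSN in
src/box/iproto_constants.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	/* Assumed values of IPROTO_REPLICA_ID and IPROTO_LSN. */
	TOY_IPROTO_REPLICA_ID = 0x02,
	TOY_IPROTO_LSN = 0x03,
	/* Worst case: map header, 2 one-byte keys, uint32, uint64. */
	TOY_SYNCHRO_BODY_MAX = 1 + 1 + 5 + 1 + 9,
};

/* Encode an unsigned integer in its shortest MsgPack form. */
unsigned char *
toy_mp_encode_uint(unsigned char *pos, uint64_t value)
{
	int bytes;
	if (value <= 0x7f) {
		/* Positive fixint: the value is its own encoding. */
		*pos++ = (unsigned char)value;
		return pos;
	}
	if (value <= UINT8_MAX) {
		*pos++ = 0xcc;
		bytes = 1;
	} else if (value <= UINT16_MAX) {
		*pos++ = 0xcd;
		bytes = 2;
	} else if (value <= UINT32_MAX) {
		*pos++ = 0xce;
		bytes = 4;
	} else {
		*pos++ = 0xcf;
		bytes = 8;
	}
	/* MsgPack integers are stored big-endian. */
	for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8)
		*pos++ = (unsigned char)(value >> shift);
	return pos;
}

/*
 * Encode {REPLICA_ID: leader_id, LSN: leader_lsn} into buf, which
 * must hold at least TOY_SYNCHRO_BODY_MAX bytes. Returns the
 * number of bytes written.
 */
size_t
toy_encode_synchro_body(unsigned char *buf, uint32_t leader_id,
			int64_t leader_lsn)
{
	assert(leader_lsn >= 0);
	unsigned char *pos = buf;
	*pos++ = 0x82; /* fixmap with 2 key-value pairs */
	pos = toy_mp_encode_uint(pos, TOY_IPROTO_REPLICA_ID);
	pos = toy_mp_encode_uint(pos, leader_id);
	pos = toy_mp_encode_uint(pos, TOY_IPROTO_LSN);
	pos = toy_mp_encode_uint(pos, (uint64_t)leader_lsn);
	return (size_t)(pos - buf);
}
```

Under these assumptions, leader_id = 1 and leader_lsn = 100 encode to
the five bytes 82 02 01 03 64.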

The CONFIRM and ROLLBACK ops are written to WAL, so their header also has
IPROTO_REPLICA_ID and IPROTO_LSN fields, which are replica_id : lsn of the
instance that wrote these records. leader_id may be different from
replica_id, and leader_lsn refers to some past moment in time.

When an instance either reads from WAL or receives a CONFIRM entry via
replication, it knows that all the leader's synchronous transactions
up to the given leader_lsn may be safely committed.

When an instance receives or reads a ROLLBACK entry, it knows that all
the leader's transactions received up to the given point in time must be
rolled back, starting with a transaction, which begins with leader_lsn + 1.
---
 src/box/iproto_constants.h |  12 +++++
 src/box/xrow.c             | 106 +++++++++++++++++++++++++++++++++++++
 src/box/xrow.h             |  46 ++++++++++++++++
 3 files changed, 164 insertions(+)

diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
index e38ee4529..6b850f101 100644
--- a/src/box/iproto_constants.h
+++ b/src/box/iproto_constants.h
@@ -219,6 +219,11 @@ enum iproto_type {
 	/** The maximum typecode used for box.stat() */
 	IPROTO_TYPE_STAT_MAX,
 
+	/** A confirmation message for synchronous transactions. */
+	IPROTO_CONFIRM = 40,
+	/** A rollback message for synchronous transactions. */
+	IPROTO_ROLLBACK = 41,
+
 	/** PING request */
 	IPROTO_PING = 64,
 	/** Replication JOIN command */
@@ -316,6 +321,13 @@ dml_request_key_map(uint32_t type)
 	return iproto_body_key_map[type];
 }
 
+/** CONFIRM/ROLLBACK entries for synchronous replication. */
+static inline bool
+iproto_type_is_synchro_request(uint32_t type)
+{
+	return type == IPROTO_CONFIRM || type == IPROTO_ROLLBACK;
+}
+
 /** This is an error. */
 static inline bool
 iproto_type_is_error(uint32_t type)
diff --git a/src/box/xrow.c b/src/box/xrow.c
index bb64864b2..39d1814c4 100644
--- a/src/box/xrow.c
+++ b/src/box/xrow.c
@@ -878,6 +878,112 @@ xrow_encode_dml(const struct request *request, struct region *region,
 	return iovcnt;
 }
 
+static int
+xrow_encode_confirm_rollback(struct xrow_header *row, uint32_t replica_id,
+			     int64_t lsn, int type)
+{
+	size_t len = mp_sizeof_map(2) + mp_sizeof_uint(IPROTO_REPLICA_ID) +
+		     mp_sizeof_uint(replica_id) + mp_sizeof_uint(IPROTO_LSN) +
+		     mp_sizeof_uint(lsn);
+	char *buf = (char *)region_alloc(&fiber()->gc, len);
+	if (buf == NULL) {
+		diag_set(OutOfMemory, len, "region_alloc", "buf");
+		return -1;
+	}
+	char *pos = buf;
+
+	pos = mp_encode_map(pos, 2);
+	pos = mp_encode_uint(pos, IPROTO_REPLICA_ID);
+	pos = mp_encode_uint(pos, replica_id);
+	pos = mp_encode_uint(pos, IPROTO_LSN);
+	pos = mp_encode_uint(pos, lsn);
+
+	memset(row, 0, sizeof(*row));
+
+	row->body[0].iov_base = buf;
+	row->body[0].iov_len = len;
+	row->bodycnt = 1;
+
+	row->type = type;
+
+	return 0;
+}
+
+int
+xrow_encode_confirm(struct xrow_header *row, uint32_t replica_id, int64_t lsn)
+{
+	return xrow_encode_confirm_rollback(row, replica_id, lsn,
+					    IPROTO_CONFIRM);
+}
+
+int
+xrow_encode_rollback(struct xrow_header *row, uint32_t replica_id, int64_t lsn)
+{
+	return xrow_encode_confirm_rollback(row, replica_id, lsn,
+					    IPROTO_ROLLBACK);
+}
+
+static int
+xrow_decode_confirm_rollback(struct xrow_header *row, uint32_t *replica_id,
+			     int64_t *lsn)
+{
+	if (row->bodycnt == 0) {
+		diag_set(ClientError, ER_INVALID_MSGPACK, "request body");
+		return -1;
+	}
+
+	assert(row->bodycnt == 1);
+
+	const char * const data = (const char *)row->body[0].iov_base;
+	const char * const end = data + row->body[0].iov_len;
+	const char *d = data;
+	if (mp_check(&d, end) != 0 || mp_typeof(*data) != MP_MAP) {
+		xrow_on_decode_err(data, end, ER_INVALID_MSGPACK,
+				   "request body");
+		return -1;
+	}
+
+	d = data;
+	uint32_t map_size = mp_decode_map(&d);
+	for (uint32_t i = 0; i < map_size; i++) {
+		enum mp_type type = mp_typeof(*d);
+		if (type != MP_UINT) {
+			mp_next(&d);
+			mp_next(&d);
+			continue;
+		}
+		uint8_t key = mp_decode_uint(&d);
+		if (key >= IPROTO_KEY_MAX || iproto_key_type[key] != type) {
+			xrow_on_decode_err(data, end, ER_INVALID_MSGPACK,
+					   "request body");
+			return -1;
+		}
+		switch (key) {
+		case IPROTO_REPLICA_ID:
+			*replica_id = mp_decode_uint(&d);
+			break;
+		case IPROTO_LSN:
+			*lsn = mp_decode_uint(&d);
+			break;
+		default:
+			mp_next(&d);
+		}
+	}
+	return 0;
+}
+
+int
+xrow_decode_confirm(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn)
+{
+	return xrow_decode_confirm_rollback(row, replica_id, lsn);
+}
+
+int
+xrow_decode_rollback(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn)
+{
+	return xrow_decode_confirm_rollback(row, replica_id, lsn);
+}
+
 int
 xrow_to_iovec(const struct xrow_header *row, struct iovec *out)
 {
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 2a0a9c852..1def394e7 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -207,6 +207,52 @@ int
 xrow_encode_dml(const struct request *request, struct region *region,
 		struct iovec *iov);
 
+/**
+ * Encode the CONFIRM row body and set row type to
+ * IPROTO_CONFIRM.
+ * @param row xrow header.
+ * @param replica_id master's instance id.
+ * @param lsn last confirmed lsn.
+ * @retval -1 on error.
+ * @retval 0 success.
+ */
+int
+xrow_encode_confirm(struct xrow_header *row, uint32_t replica_id, int64_t lsn);
+
+/**
+ * Decode the CONFIRM request body.
+ * @param row xrow header.
+ * @param[out] replica_id master's instance id.
+ * @param[out] lsn last confirmed lsn.
+ * @retval -1 on error.
+ * @retval 0 success.
+ */
+int
+xrow_decode_confirm(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn);
+
+/**
+ * Encode the ROLLBACK row body and set row type to
+ * IPROTO_ROLLBACK.
+ * @param row xrow header.
+ * @param replica_id master's instance id.
+ * @param lsn lsn to rollback to.
+ * @retval -1 on error.
+ * @retval 0 success.
+ */
+int
+xrow_encode_rollback(struct xrow_header *row, uint32_t replica_id, int64_t lsn);
+
+/**
+ * Decode the ROLLBACK row body.
+ * @param row xrow header.
+ * @param[out] replica_id master's instance id.
+ * @param[out] lsn lsn to rollback to.
+ * @retval -1 on error.
+ * @retval 0 success.
+ */
+int
+xrow_decode_rollback(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn);
+
 /**
  * CALL/EVAL request.
  */
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 06/19] txn: introduce various reasons for txn rollback
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (14 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 05/19] xrow: introduce CONFIRM and ROLLBACK entries Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 07/19] replication: write and read CONFIRM entries Vladislav Shpilevoy
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Transaction on_rollback triggers will need to distinguish
txn_limbo-issued rollbacks from rollbacks that happened due to a failed
WAL write or memory error.
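
A minimal sketch of how a trigger might tell the reasons apart, assuming
it only has the transaction signature at hand. The enum values are the
ones this patch adds to txn.h; toy_rollback_reason() is a hypothetical
helper used for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signature values as introduced by this patch in src/box/txn.h. */
enum {
	TXN_SIGNATURE_NOP = 0,
	TXN_SIGNATURE_ROLLBACK = -1,
	TXN_SIGNATURE_QUORUM_TIMEOUT = -2,
	TXN_SIGNATURE_SYNC_ROLLBACK = -3,
};

/* Hypothetical helper: map a signature to a rollback reason. */
const char *
toy_rollback_reason(int64_t signature)
{
	switch (signature) {
	case TXN_SIGNATURE_ROLLBACK:
		/* WAL write failure, OOM, explicit rollback. */
		return "generic failure";
	case TXN_SIGNATURE_QUORUM_TIMEOUT:
		/* Master did not collect enough acks. */
		return "quorum timeout";
	case TXN_SIGNATURE_SYNC_ROLLBACK:
		/* A ROLLBACK entry was received or read from WAL. */
		return "sync rollback";
	default:
		/* Non-negative signatures mean success or NOP. */
		return signature >= 0 ? "not rolled back" : "unknown";
	}
}
```

In this sketch only the last two reasons come from synchronous
transaction processing; everything else maps to the pre-existing
behaviour.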

Prerequisite #4847
Prerequisite #4848
---
 src/box/txn.c |  6 +++---
 src/box/txn.h | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index 6cfa98212..9de72461b 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -222,7 +222,7 @@ txn_begin(void)
 	txn->flags = 0;
 	txn->in_sub_stmt = 0;
 	txn->id = ++tsn;
-	txn->signature = -1;
+	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn->engine = NULL;
 	txn->engine_tx = NULL;
 	txn->fk_deferred_count = 0;
@@ -589,7 +589,7 @@ static bool
 txn_commit_nop(struct txn *txn)
 {
 	if (txn->n_new_rows + txn->n_applier_rows == 0) {
-		txn->signature = 0;
+		txn->signature = TXN_SIGNATURE_NOP;
 		txn_complete(txn);
 		fiber_set_txn(fiber(), NULL);
 		return true;
@@ -738,7 +738,7 @@ txn_rollback(struct txn *txn)
 	trigger_clear(&txn->fiber_on_stop);
 	if (!txn_has_flag(txn, TXN_CAN_YIELD))
 		trigger_clear(&txn->fiber_on_yield);
-	txn->signature = -1;
+	txn->signature = TXN_SIGNATURE_ROLLBACK;
 	txn_complete(txn);
 	fiber_set_txn(fiber(), NULL);
 }
diff --git a/src/box/txn.h b/src/box/txn.h
index 232cc07a8..8ec4a248c 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -83,6 +83,29 @@ enum {
 	TXN_SUB_STMT_MAX = 3
 };
 
+enum {
+	/** Signature set for empty transactions. */
+	TXN_SIGNATURE_NOP = 0,
+	/**
+	 * The default signature value for failed transactions.
+	 * Indicates either write failure or any other failure
+	 * not caused by synchronous transaction processing.
+	 */
+	TXN_SIGNATURE_ROLLBACK = -1,
+	/**
+	 * A value set for failed synchronous transactions
+	 * on master, when not enough acks were collected.
+	 */
+	TXN_SIGNATURE_QUORUM_TIMEOUT = -2,
+	/**
+	 * A value set for failed synchronous transactions
+	 * on replica (or any instance during recovery), when a
+	 * transaction is rolled back because ROLLBACK message was
+	 * read.
+	 */
+	TXN_SIGNATURE_SYNC_ROLLBACK = -3,
+};
+
 /**
  * A single statement of a multi-statement
  * transaction: undo and redo info.
-- 
2.21.1 (Apple Git-122.3)


* [Tarantool-patches] [PATCH v2 07/19] replication: write and read CONFIRM entries
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (15 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 06/19] txn: introduce various reasons for txn rollback Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery Vladislav Shpilevoy
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Make txn_limbo write a CONFIRM entry as soon as a batch of entries
receives its acks. The CONFIRM entry is written to WAL and later
replicated to all the replicas.
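
One way to picture which lsn such a CONFIRM could carry is sketched
below. This is a standalone illustration, not the limbo code itself:
acked[] is a hypothetical flat copy of the limbo vclock (one entry per
instance, the master's own WAL write included), TOY_VCLOCK_MAX is an
assumed upper bound on the number of components, and
toy_confirmed_lsn() is an invented name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed upper bound on the number of vclock components. */
enum { TOY_VCLOCK_MAX = 32 };

/* qsort() comparator: sort int64_t values in descending order. */
static int
toy_cmp_desc(const void *a, const void *b)
{
	int64_t x = *(const int64_t *)a;
	int64_t y = *(const int64_t *)b;
	return (x < y) - (x > y);
}

/*
 * Return the highest LSN acknowledged by at least `quorum`
 * instances, or 0 if no LSN has a quorum yet. acked[i] is the
 * master's LSN as seen by the i-th instance.
 */
int64_t
toy_confirmed_lsn(const int64_t *acked, int count, int quorum)
{
	assert(count >= 0 && count <= TOY_VCLOCK_MAX);
	if (quorum <= 0 || quorum > count)
		return 0;
	int64_t sorted[TOY_VCLOCK_MAX];
	memcpy(sorted, acked, sizeof(sorted[0]) * (size_t)count);
	qsort(sorted, (size_t)count, sizeof(sorted[0]), toy_cmp_desc);
	/* The quorum-th largest value is acked by `quorum` nodes. */
	return sorted[quorum - 1];
}
```

For example, with acked = {10, 7, 5} and a quorum of 2, the sketch
yields 7: two instances have seen lsn 7 or later, so every pending
transaction up to lsn 7 could be confirmed at once.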

Now replicas put synchronous transactions into txn_limbo and wait for
corresponding confirmation entries to arrive and end up in their WAL
before committing the transactions.

Closes #4847
---
 src/box/applier.cc    | 54 ++++++++++++++++++++++++++++
 src/box/box.cc        |  3 ++
 src/box/errcode.h     |  1 +
 src/box/relay.cc      |  6 ++--
 src/box/txn.c         | 77 ++++++++++++++++++++++++++++++++++-----
 src/box/txn.h         | 13 +++++++
 src/box/txn_limbo.c   | 84 +++++++++++++++++++++++++++++++++++++------
 src/box/txn_limbo.h   |  6 ++++
 test/box/error.result |  1 +
 9 files changed, 224 insertions(+), 21 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index df48b4796..1b9ea2f71 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -51,6 +51,7 @@
 #include "txn.h"
 #include "box.h"
 #include "scoped_guard.h"
+#include "txn_limbo.h"
 
 STRS(applier_state, applier_STATE);
 
@@ -214,6 +215,11 @@ apply_snapshot_row(struct xrow_header *row)
 	struct txn *txn = txn_begin();
 	if (txn == NULL)
 		return -1;
+	/*
+	 * Do not wait for confirmation when fetching a snapshot.
+	 * Master only sends confirmed rows during join.
+	 */
+	txn_set_flag(txn, TXN_FORCE_ASYNC);
 	if (txn_begin_stmt(txn, space) != 0)
 		goto rollback;
 	/* no access checks here - applier always works with admin privs */
@@ -249,10 +255,47 @@ process_nop(struct request *request)
 	return txn_commit_stmt(txn, request);
 }
 
+/*
+ * CONFIRM rows aren't dml requests and require special
+ * handling: instead of performing some operations on spaces,
+ * processing these requests requires txn_limbo to confirm some
+ * of its entries.
+ */
+static int
+process_confirm(struct request *request)
+{
+	assert(request->header->type == IPROTO_CONFIRM);
+	uint32_t replica_id;
+	struct txn *txn = in_txn();
+	int64_t lsn = 0;
+	if (xrow_decode_confirm(request->header, &replica_id, &lsn) != 0)
+		return -1;
+
+	if (replica_id != txn_limbo.instance_id) {
+		diag_set(ClientError, ER_SYNC_MASTER_MISMATCH, replica_id,
+			 txn_limbo.instance_id);
+		return -1;
+	}
+
+	if (txn_begin_stmt(txn, NULL) != 0)
+		return -1;
+
+	if (txn_commit_stmt(txn, request) == 0) {
+		txn_limbo_read_confirm(&txn_limbo, lsn);
+		return 0;
+	} else {
+		return -1;
+	}
+}
+
 static int
 apply_row(struct xrow_header *row)
 {
 	struct request request;
+	if (row->type == IPROTO_CONFIRM) {
+		request.header = row;
+		return process_confirm(&request);
+	}
 	if (xrow_decode_dml(row, &request, dml_request_key_map(row->type)) != 0)
 		return -1;
 	if (request.type == IPROTO_NOP)
@@ -270,9 +313,20 @@ apply_row(struct xrow_header *row)
 static int
 apply_final_join_row(struct xrow_header *row)
 {
+	/*
+	 * Confirms are ignored during join. All the data master
+	 * sends us is valid.
+	 */
+	if (row->type == IPROTO_CONFIRM)
+		return 0;
 	struct txn *txn = txn_begin();
 	if (txn == NULL)
 		return -1;
+	/*
+	 * Do not wait for confirmation while processing final
+	 * join rows. See apply_snapshot_row().
+	 */
+	txn_set_flag(txn, TXN_FORCE_ASYNC);
 	if (apply_row(row) != 0) {
 		txn_rollback(txn);
 		fiber_gc();
diff --git a/src/box/box.cc b/src/box/box.cc
index 02088ba01..ba7347367 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -342,6 +342,9 @@ static void
 apply_wal_row(struct xstream *stream, struct xrow_header *row)
 {
 	struct request request;
+	// TODO: process confirmation during recovery.
+	if (row->type == IPROTO_CONFIRM)
+		return;
 	xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
 	if (request.type != IPROTO_NOP) {
 		struct space *space = space_cache_find_xc(request.space_id);
diff --git a/src/box/errcode.h b/src/box/errcode.h
index 019c582af..3ba6866e5 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -267,6 +267,7 @@ struct errcode_record {
 	/*212 */_(ER_SEQUENCE_NOT_STARTED,		"Sequence '%s' is not started") \
 	/*213 */_(ER_NO_SUCH_SESSION_SETTING,	"Session setting %s doesn't exist") \
 	/*214 */_(ER_UNCOMMITTED_FOREIGN_SYNC_TXNS, "Found uncommitted sync transactions from other instance with id %u") \
+	/*215 */_(ER_SYNC_MASTER_MISMATCH,	"CONFIRM message arrived for an unknown master id %u, expected %u") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 36fc14b8c..0adc9fc98 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -404,7 +404,8 @@ tx_status_update(struct cmsg *msg)
 	 * Let pending synchronous transactions know, which of
 	 * them were successfully sent to the replica. Acks are
 	 * collected only by the transactions originator (which is
-	 * the single master in 100% so far).
+	 * always the single master so far). Other instances wait
+	 * for master's CONFIRM message instead.
 	 */
 	if (txn_limbo.instance_id == instance_id) {
 		txn_limbo_ack(&txn_limbo, status->relay->replica->id,
@@ -770,7 +771,8 @@ static void
 relay_send_row(struct xstream *stream, struct xrow_header *packet)
 {
 	struct relay *relay = container_of(stream, struct relay, stream);
-	assert(iproto_type_is_dml(packet->type));
+	assert(iproto_type_is_dml(packet->type) ||
+	       packet->type == IPROTO_CONFIRM);
 	if (packet->group_id == GROUP_LOCAL) {
 		/*
 		 * We do not relay replica-local rows to other
diff --git a/src/box/txn.c b/src/box/txn.c
index 9de72461b..612cd19bc 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -36,6 +36,7 @@
 #include <fiber.h>
 #include "xrow.h"
 #include "errinj.h"
+#include "iproto_constants.h"
 
 double too_long_threshold;
 
@@ -81,7 +82,12 @@ txn_add_redo(struct txn *txn, struct txn_stmt *stmt, struct request *request)
 	 */
 	struct space *space = stmt->space;
 	row->group_id = space != NULL ? space_group_id(space) : 0;
-	row->bodycnt = xrow_encode_dml(request, &txn->region, row->body);
+	/*
+	 * IPROTO_CONFIRM entries are supplementary and aren't
+	 * valid dml requests. They're encoded manually.
+	 */
+	if (likely(row->type != IPROTO_CONFIRM))
+		row->bodycnt = xrow_encode_dml(request, &txn->region, row->body);
 	if (row->bodycnt < 0)
 		return -1;
 	stmt->row = row;
@@ -321,8 +327,10 @@ txn_commit_stmt(struct txn *txn, struct request *request)
 	 */
 	struct txn_stmt *stmt = txn_current_stmt(txn);
 
-	/* Create WAL record for the write requests in non-temporary spaces.
-	 * stmt->space can be NULL for IRPOTO_NOP.
+	/*
+	 * Create WAL record for the write requests in
+	 * non-temporary spaces. stmt->space can be NULL for
+	 * IPROTO_NOP or IPROTO_CONFIRM.
 	 */
 	if (stmt->space == NULL || !space_is_temporary(stmt->space)) {
 		if (txn_add_redo(txn, stmt, request) != 0)
@@ -417,12 +425,12 @@ txn_run_rollback_triggers(struct txn *txn, struct rlist *triggers)
 /**
  * Complete transaction processing.
  */
-static void
+void
 txn_complete(struct txn *txn)
 {
 	/*
 	 * Note, engine can be NULL if transaction contains
-	 * IPROTO_NOP statements only.
+	 * IPROTO_NOP or IPROTO_CONFIRM statements.
 	 */
 	if (txn->signature < 0) {
 		/* Undo the transaction. */
@@ -536,7 +544,7 @@ txn_journal_entry_new(struct txn *txn)
 	 * space can't be synchronous. So if there is at least one
 	 * synchronous space, the transaction is not local.
 	 */
-	if (is_sync)
+	if (is_sync && !txn_has_flag(txn, TXN_FORCE_ASYNC))
 		txn_set_flag(txn, TXN_WAIT_ACK);
 
 	assert(remote_row == req->rows + txn->n_applier_rows);
@@ -598,6 +606,23 @@ txn_commit_nop(struct txn *txn)
 	return false;
 }
 
+/*
+ * A trigger called on tx rollback due to a failed WAL write,
+ * when tx is waiting for confirmation.
+ */
+static int
+txn_limbo_on_rollback(struct trigger *trig, void *event)
+{
+	(void) event;
+	struct txn *txn = (struct txn *) event;
+	/* Check whether limbo has performed the cleanup. */
+	if (txn->signature != TXN_SIGNATURE_ROLLBACK)
+		return 0;
+	struct txn_limbo_entry *entry = (struct txn_limbo_entry *) trig->data;
+	txn_limbo_abort(&txn_limbo, entry);
+	return 0;
+}
+
 int
 txn_commit_async(struct txn *txn)
 {
@@ -629,16 +654,52 @@ txn_commit_async(struct txn *txn)
 		return -1;
 	}
 
+	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
+	struct txn_limbo_entry *limbo_entry;
+	if (is_sync) {
+		/*
+		 * We'll need this trigger for sync transactions later,
+		 * but allocation failure is inappropriate after the entry
+		 * is sent to journal, so allocate early.
+		 */
+		size_t size;
+		struct trigger *trig =
+			region_alloc_object(&txn->region, typeof(*trig), &size);
+		if (trig == NULL) {
+			txn_rollback(txn);
+			diag_set(OutOfMemory, size, "region_alloc_object",
+				 "trig");
+			return -1;
+		}
+
+		/* See txn_commit(). */
+		uint32_t origin_id = req->rows[0]->replica_id;
+		int64_t lsn = req->rows[txn->n_applier_rows - 1]->lsn;
+		limbo_entry = txn_limbo_append(&txn_limbo, origin_id, txn);
+		if (limbo_entry == NULL) {
+			txn_rollback(txn);
+			return -1;
+		}
+		assert(lsn > 0);
+		txn_limbo_assign_lsn(&txn_limbo, limbo_entry, lsn);
+
+		/*
+		 * Set a trigger to abort waiting for confirm on
+		 * WAL write failure.
+		 */
+		trigger_create(trig, txn_limbo_on_rollback,
+			       limbo_entry, NULL);
+		txn_on_rollback(txn, trig);
+	}
+
 	fiber_set_txn(fiber(), NULL);
 	if (journal_write_async(req) != 0) {
 		fiber_set_txn(fiber(), txn);
 		txn_rollback(txn);
-
 		diag_set(ClientError, ER_WAL_IO);
 		diag_log();
 		return -1;
 	}
-
 	return 0;
 }
 
diff --git a/src/box/txn.h b/src/box/txn.h
index 8ec4a248c..c631d7033 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -73,6 +73,13 @@ enum txn_flag {
 	 * then finishes commit and returns success to a user.
 	 */
 	TXN_WAIT_ACK,
+	/**
+	 * A transaction mustn't wait for confirmation, even if it
+	 * touches synchronous spaces. Needed for join stage on
+	 * replica, when all the data coming from the master is
+	 * already confirmed by design.
+	 */
+	TXN_FORCE_ASYNC,
 };
 
 enum {
@@ -301,6 +308,12 @@ fiber_set_txn(struct fiber *fiber, struct txn *txn)
 struct txn *
 txn_begin(void);
 
+/**
+ * Complete transaction processing.
+ */
+void
+txn_complete(struct txn *txn);
+
 /**
  * Commit a transaction.
  * @pre txn == in_txn()
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 9de91db93..b38d82e4f 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -134,12 +134,65 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 		fiber_yield();
 	fiber_set_cancellable(cancellable);
 	// TODO: implement rollback.
-	// TODO: implement confirm.
 	assert(!entry->is_rollback);
+	assert(entry->is_commit);
 	txn_limbo_remove(limbo, entry);
 	txn_clear_flag(txn, TXN_WAIT_ACK);
 }
 
+/**
+ * Write a confirmation entry to WAL. After it's written all the
+ * transactions waiting for confirmation may be finished.
+ */
+static int
+txn_limbo_write_confirm(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+{
+	struct xrow_header row;
+	struct request request = {
+		.header = &row,
+	};
+
+	if (xrow_encode_confirm(&row, limbo->instance_id, entry->lsn) < 0)
+		return -1;
+
+	struct txn *txn = txn_begin();
+	if (txn == NULL)
+		return -1;
+
+	if (txn_begin_stmt(txn, NULL) != 0)
+		goto rollback;
+	if (txn_commit_stmt(txn, &request) != 0)
+		goto rollback;
+
+	return txn_commit(txn);
+rollback:
+	txn_rollback(txn);
+	return -1;
+}
+
+void
+txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
+{
+	assert(limbo->instance_id != REPLICA_ID_NIL &&
+	       limbo->instance_id != instance_id);
+	struct txn_limbo_entry *e, *tmp;
+	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
+		if (e->lsn > lsn)
+			break;
+		e->is_commit = true;
+		txn_limbo_remove(limbo, e);
+		txn_clear_flag(e->txn, TXN_WAIT_ACK);
+		/*
+		 * If txn_complete_async() was already called,
+		 * finish tx processing. Otherwise just clear the
+		 * "WAIT_ACK" flag. Tx processing will finish once
+		 * the tx is written to WAL.
+		 */
+		if (e->txn->signature >= 0)
+			txn_complete(e->txn);
+	}
+}
+
 void
 txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 {
@@ -148,25 +201,34 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	assert(limbo->instance_id != REPLICA_ID_NIL);
 	int64_t prev_lsn = vclock_get(&limbo->vclock, replica_id);
 	vclock_follow(&limbo->vclock, replica_id, lsn);
-	struct txn_limbo_entry *e, *tmp;
-	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
+	struct txn_limbo_entry *e;
+	struct txn_limbo_entry *last_quorum = NULL;
+	rlist_foreach_entry(e, &limbo->queue, in_queue) {
 		if (e->lsn <= prev_lsn)
 			continue;
 		if (e->lsn > lsn)
 			break;
 		if (++e->ack_count >= replication_synchro_quorum) {
-			// TODO: better call complete() right
-			// here. Appliers use async transactions,
-			// and their txns don't have fibers to
-			// wake up. That becomes actual, when
-			// appliers will be supposed to wait for
-			// 'confirm' message.
 			e->is_commit = true;
-			rlist_del_entry(e, in_queue);
-			fiber_wakeup(e->txn->fiber);
+			last_quorum = e;
 		}
 		assert(e->ack_count <= VCLOCK_MAX);
 	}
+	if (last_quorum != NULL) {
+		if (txn_limbo_write_confirm(limbo, last_quorum) != 0) {
+			// TODO: rollback.
+			return;
+		}
+		/*
+		 * Wakeup all the entries in direct order as soon
+		 * as confirmation message is written to WAL.
+		 */
+		rlist_foreach_entry(e, &limbo->queue, in_queue) {
+			fiber_wakeup(e->txn->fiber);
+			if (e == last_quorum)
+				break;
+		}
+	}
 }
 
 void
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 1ad1c567a..de415cd97 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -160,6 +160,12 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
 void
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
+/**
+ * Confirm all the entries up to the given master's LSN.
+ */
+void
+txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
+
 void
 txn_limbo_init();
 
diff --git a/test/box/error.result b/test/box/error.result
index 69c471085..34ded3930 100644
--- a/test/box/error.result
+++ b/test/box/error.result
@@ -433,6 +433,7 @@ t;
  |   212: box.error.SEQUENCE_NOT_STARTED
  |   213: box.error.NO_SUCH_SESSION_SETTING
  |   214: box.error.UNCOMMITTED_FOREIGN_SYNC_TXNS
+ |   215: box.error.SYNC_MASTER_MISMATCH
  | ...
 
 test_run:cmd("setopt delimiter ''");
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (16 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 07/19] replication: write and read CONFIRM entries Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-07-02  8:52     ` Serge Petrenko
  2020-07-08 11:43     ` Leonid Vasiliev
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 09/19] txn_limbo: add timeout when waiting for acks Vladislav Shpilevoy
                     ` (9 subsequent siblings)
  27 siblings, 2 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Leonid Vasiliev <lvasiliev@tarantool.org>

To support qsync replication, the snapshot machinery now waits,
up to a timeout, for confirmation of the pending "sync"
transactions. If they are rolled back or the timeout expires,
the snapshot is cancelled.
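
The three outcomes described above can be sketched as a small
decision function. This is only a toy model, not the code from the
patch: the names toy_wait_result and toy_snapshot_wait_rc are
hypothetical, and the real txn_limbo_wait_confirm() waits on a fiber
condition instead of receiving the outcome as an argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of waiting on the last limbo entry. */
enum toy_wait_result {
	TOY_WAIT_CONFIRMED,
	TOY_WAIT_ROLLED_BACK,
	TOY_WAIT_TIMED_OUT,
};

/*
 * Toy model of the checkpoint decision: 0 lets the snapshot
 * proceed, -1 cancels it.
 */
static int
toy_snapshot_wait_rc(bool limbo_is_empty, enum toy_wait_result result)
{
	/* Nothing is pending, so there is nothing to wait for. */
	if (limbo_is_empty)
		return 0;
	switch (result) {
	case TOY_WAIT_CONFIRMED:
		return 0;
	case TOY_WAIT_ROLLED_BACK:
	case TOY_WAIT_TIMED_OUT:
		return -1;
	}
	/* Unknown outcome: treat it as a failure. */
	return -1;
}
```

An empty limbo short-circuits to success, which corresponds to the
early return at the top of txn_limbo_wait_confirm() in the diff.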

Closes #4850
---
 src/box/gc.c                       |  12 ++
 src/box/txn_limbo.c                |  81 ++++++++++
 src/box/txn_limbo.h                |  29 ++++
 test/unit/CMakeLists.txt           |   3 +
 test/unit/snap_quorum_delay.cc     | 250 +++++++++++++++++++++++++++++
 test/unit/snap_quorum_delay.result |  12 ++
 6 files changed, 387 insertions(+)
 create mode 100644 test/unit/snap_quorum_delay.cc
 create mode 100644 test/unit/snap_quorum_delay.result

diff --git a/src/box/gc.c b/src/box/gc.c
index 8e8ffea75..170c0a97f 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -57,6 +57,9 @@
 #include "engine.h"		/* engine_collect_garbage() */
 #include "wal.h"		/* wal_collect_garbage() */
 #include "checkpoint_schedule.h"
+#include "trigger.h"
+#include "txn.h"
+#include "txn_limbo.h"
 
 struct gc_state gc;
 
@@ -395,6 +398,15 @@ gc_do_checkpoint(bool is_scheduled)
 	rc = wal_begin_checkpoint(&checkpoint);
 	if (rc != 0)
 		goto out;
+
+	/*
+	 * Wait for confirmation of all "sync" transactions
+	 * before creating a snapshot.
+	 */
+	rc = txn_limbo_wait_confirm(&txn_limbo);
+	if (rc != 0)
+		goto out;
+
 	rc = engine_commit_checkpoint(&checkpoint.vclock);
 	if (rc != 0)
 		goto out;
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index b38d82e4f..bee8e8155 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -231,6 +231,87 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	}
 }
 
+double
+txn_limbo_confirm_timeout(struct txn_limbo *limbo)
+{
+	(void)limbo;
+	return replication_synchro_timeout;
+}
+
+/**
+ * Waitpoint stores information about the progress of confirmation.
+ * In the case of multimaster support, it will store a bitset
+ * or array instead of the boolean.
+ */
+struct confirm_waitpoint {
+	/**
+	 * Variable for wake up the fiber that is waiting for
+	 * the end of confirmation.
+	 */
+	struct fiber_cond confirm_cond;
+	/**
+	 * Result flag.
+	 */
+	bool is_confirm;
+};
+
+static int
+txn_commit_cb(struct trigger *trigger, void *event)
+{
+	(void)event;
+	struct confirm_waitpoint *cwp =
+		(struct confirm_waitpoint *)trigger->data;
+	cwp->is_confirm = true;
+	fiber_cond_signal(&cwp->confirm_cond);
+	return 0;
+}
+
+static int
+txn_rollback_cb(struct trigger *trigger, void *event)
+{
+	(void)event;
+	struct confirm_waitpoint *cwp =
+		(struct confirm_waitpoint *)trigger->data;
+	fiber_cond_signal(&cwp->confirm_cond);
+	return 0;
+}
+
+int
+txn_limbo_wait_confirm(struct txn_limbo *limbo)
+{
+	if (txn_limbo_is_empty(limbo))
+		return 0;
+
+	/* initialization of a waitpoint. */
+	struct confirm_waitpoint cwp;
+	fiber_cond_create(&cwp.confirm_cond);
+	cwp.is_confirm = false;
+
+	/* Set triggers for the last limbo transaction. */
+	struct trigger on_complete;
+	trigger_create(&on_complete, txn_commit_cb, &cwp, NULL);
+	struct trigger on_rollback;
+	trigger_create(&on_rollback, txn_rollback_cb, &cwp, NULL);
+	struct txn_limbo_entry *tle = txn_limbo_last_entry(limbo);
+	txn_on_commit(tle->txn, &on_complete);
+	txn_on_rollback(tle->txn, &on_rollback);
+
+	int rc = fiber_cond_wait_timeout(&cwp.confirm_cond,
+					 txn_limbo_confirm_timeout(limbo));
+	fiber_cond_destroy(&cwp.confirm_cond);
+	if (rc != 0) {
+		/* Clear the triggers if the timeout has been reached. */
+		trigger_clear(&on_complete);
+		trigger_clear(&on_rollback);
+		return -1;
+	}
+	if (!cwp.is_confirm) {
+		/* The transaction has been rolled back. */
+		return -1;
+	}
+	return 0;
+}
+
 void
 txn_limbo_init(void)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index de415cd97..94f224131 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -166,6 +166,35 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 void
 txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
 
+/**
+ * Return TRUE if limbo is empty.
+ */
+static inline bool
+txn_limbo_is_empty(struct txn_limbo *limbo)
+{
+	return rlist_empty(&limbo->queue);
+}
+
+/**
+ * Return a pointer to the last txn_limbo_entry of limbo.
+ */
+static inline struct txn_limbo_entry *
+txn_limbo_last_entry(struct txn_limbo *limbo)
+{
+	return rlist_last_entry(&limbo->queue, struct txn_limbo_entry,
+				in_queue);
+}
+
+double
+txn_limbo_confirm_timeout(struct txn_limbo *limbo);
+
+/**
+ * Wait for confirmation of all "sync" transactions
+ * within the confirm timeout, or fail.
+ */
+int
+txn_limbo_wait_confirm(struct txn_limbo *limbo);
+
 void
 txn_limbo_init();
 
diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt
index 672122118..419477748 100644
--- a/test/unit/CMakeLists.txt
+++ b/test/unit/CMakeLists.txt
@@ -257,6 +257,9 @@ target_link_libraries(swim_errinj.test unit swim)
 add_executable(merger.test merger.test.c)
 target_link_libraries(merger.test unit core box)
 
+add_executable(snap_quorum_delay.test snap_quorum_delay.cc)
+target_link_libraries(snap_quorum_delay.test box core unit)
+
 #
 # Client for popen.test
 add_executable(popen-child popen-child.c)
diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
new file mode 100644
index 000000000..7a200673a
--- /dev/null
+++ b/test/unit/snap_quorum_delay.cc
@@ -0,0 +1,250 @@
+/*
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ *
+ * Redistribution and use in source and iproto forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in iproto form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include "unit.h"
+#include "gc.h"
+#include "memory.h"
+#include "txn.h"
+#include "txn_limbo.h"
+
+/**
+ * This test is only about delay in snapshot machinery (needed
+ * for qsync replication). It doesn't test the snapshot
+ * machinery, txn_limbo or something else and uses some tricks
+ * around txn_limbo.
+ * The logic of the test is as follows:
+ * In fiber_1 ("txn_fiber"):
+ *	- start a transaction.
+ *	- push the transaction to the limbo.
+ *	- start wait confirm (yield).
+ * In fiber_2 ("main"):
+ *	- do a snapshot.
+ *	- start wait while the last transaction
+ *	  from the limbo will be completed.
+ * In fiber_3 ("confirm_fiber"):
+ *	- confirm the transaction (remove the transaction from
+ *				   the limbo and wakeup fiber_1).
+ * In fiber_1 ("txn_fiber"):
+ *	- confirm/rollback/hung the transaction.
+ * In fiber_2 ("main"):
+ *	- check_results
+ */
+
+extern int replication_synchro_quorum;
+extern double replication_synchro_timeout;
+
+namespace /* local symbols */ {
+
+int test_result;
+
+/**
+ * Variations of a transaction completion.
+ */
+enum process_type {
+	TXN_PROCESS_COMMIT,
+	TXN_PROCESS_ROLLBACK,
+	TXN_PROCESS_TIMEOUT
+};
+
+/**
+ * Some fake values needed for work with the limbo
+ * (to push a transaction to the limbo and simulate confirm).
+ */
+const int fake_lsn = 1;
+const int instace_id = 1;
+const int relay_id = 2;
+
+int
+trg_cb(struct trigger *trigger, void *event)
+{
+	(void)event;
+	bool *check_trg = (bool *)trigger->data;
+	*check_trg = true;
+	return 0;
+}
+
+int
+txn_process_func(va_list ap)
+{
+	bool *check_trg = va_arg(ap, bool *);
+	enum process_type process_type = (enum process_type)va_arg(ap, int);
+	struct txn *txn = txn_begin();
+	txn->fiber = fiber();
+	/* Set the TXN_WAIT_ACK flag to simulate a sync transaction.*/
+	txn_set_flag(txn, TXN_WAIT_ACK);
+	/*
+	 * The true way to push the transaction to limbo is to call
+	 * txn_commit() for sync transaction. But, if txn_commit()
+	 * will be called now, the transaction will not be pushed to
+	 * the limbo because this is the case txn_commit_nop().
+	 * Instead, we push the transaction to the limbo manually
+	 * and call txn_commit (or another) later.
+	 */
+	struct txn_limbo_entry *entry = txn_limbo_append(&txn_limbo,
+							 instace_id, txn);
+	/*
+	 * The trigger is used to verify that the transaction has been
+	 * completed.
+	 */
+	struct trigger trg;
+	trigger_create(&trg, trg_cb, check_trg, NULL);
+
+	switch (process_type) {
+	case TXN_PROCESS_COMMIT:
+		txn_on_commit(txn, &trg);
+		break;
+	case TXN_PROCESS_ROLLBACK:
+		txn_on_rollback(txn, &trg);
+		break;
+	case TXN_PROCESS_TIMEOUT:
+		break;
+	default:
+		unreachable();
+	}
+
+	txn_limbo_assign_lsn(&txn_limbo, entry, fake_lsn);
+	txn_limbo_wait_complete(&txn_limbo, entry);
+
+	switch (process_type) {
+	case TXN_PROCESS_COMMIT:
+		txn_commit(txn);
+		break;
+	case TXN_PROCESS_ROLLBACK:
+		txn_rollback(txn);
+		break;
+	case TXN_PROCESS_TIMEOUT:
+		fiber_yield();
+		break;
+	default:
+		unreachable();
+	}
+	return 0;
+}
+
+int
+txn_confirm_func(va_list ap)
+{
+	/*
+	 * We shouldn't react on gc_wait_cleanup() yield
+	 * inside gc_checkpoint().
+	 */
+	fiber_sleep(0);
+	txn_limbo_ack(&txn_limbo, relay_id, fake_lsn);
+	return 0;
+}
+
+
+void
+test_snap_delay_common(enum process_type process_type)
+{
+	plan(1);
+
+	/*
+	 * We need to clear the limbo vclock before the new test
+	 * variation because the same fake lsn will be used.
+	 */
+	vclock_clear(&txn_limbo.vclock);
+	vclock_create(&txn_limbo.vclock);
+
+	bool check_trg = false;
+	struct fiber *txn_fiber = fiber_new("txn_fiber", txn_process_func);
+	fiber_start(txn_fiber, &check_trg, process_type);
+
+	struct fiber *confirm_entry = fiber_new("confirm_fiber",
+						txn_confirm_func);
+	fiber_wakeup(confirm_entry);
+
+	switch (process_type) {
+	case TXN_PROCESS_COMMIT:
+		ok(gc_checkpoint() == 0 && check_trg,
+		   "check snapshot delay confirm");
+		break;
+	case TXN_PROCESS_ROLLBACK:
+		ok(gc_checkpoint() == -1 && check_trg,
+		   "check snapshot delay rollback");
+		break;
+	case TXN_PROCESS_TIMEOUT:
+		ok(gc_checkpoint() == -1, "check snapshot delay timeout");
+		/* join the "hung" fiber */
+		fiber_set_joinable(txn_fiber, true);
+		fiber_cancel(txn_fiber);
+		fiber_join(txn_fiber);
+		break;
+	default:
+		unreachable();
+	}
+	check_plan();
+}
+
+void
+test_snap_delay_timeout()
+{
+	/* Set the timeout to a small value for the test. */
+	replication_synchro_timeout = 0.01;
+	test_snap_delay_common(TXN_PROCESS_TIMEOUT);
+}
+
+int
+test_snap_delay(va_list ap)
+{
+	header();
+	plan(3);
+	(void)ap;
+	replication_synchro_quorum = 2;
+
+	test_snap_delay_common(TXN_PROCESS_COMMIT);
+	test_snap_delay_common(TXN_PROCESS_ROLLBACK);
+	test_snap_delay_timeout();
+
+	ev_break(loop(), EVBREAK_ALL);
+	footer();
+	test_result = check_plan();
+	return 0;
+}
+} /* end of anonymous namespace */
+
+int
+main(void)
+{
+	memory_init();
+	fiber_init(fiber_c_invoke);
+	gc_init();
+	txn_limbo_init();
+
+	struct fiber *main_fiber = fiber_new("main", test_snap_delay);
+	assert(main_fiber != NULL);
+	fiber_wakeup(main_fiber);
+	ev_run(loop(), 0);
+
+	gc_free();
+	fiber_free();
+	memory_free();
+	return test_result;
+}
diff --git a/test/unit/snap_quorum_delay.result b/test/unit/snap_quorum_delay.result
new file mode 100644
index 000000000..6ca213391
--- /dev/null
+++ b/test/unit/snap_quorum_delay.result
@@ -0,0 +1,12 @@
+	*** test_snap_delay ***
+1..3
+    1..1
+    ok 1 - check snapshot delay confirm
+ok 1 - subtests
+    1..1
+    ok 1 - check snapshot delay rollback
+ok 2 - subtests
+    1..1
+    ok 1 - check snapshot delay timeout
+ok 3 - subtests
+	*** test_snap_delay: done ***
-- 
2.21.1 (Apple Git-122.3)

* [Tarantool-patches] [PATCH v2 09/19] txn_limbo: add timeout when waiting for acks.
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (17 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery Vladislav Shpilevoy
@ 2020-06-29 23:15   ` Vladislav Shpilevoy
  2020-06-29 23:22   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:15 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

From: Serge Petrenko <sergepetrenko@tarantool.org>

Now txn_limbo_wait_complete() waits for acks only for txn_limbo_confirm_timeout
seconds. If a timeout is reached, the entry and all the ones following
it must be rolled back.
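
The intended cascade ("the entry and all the ones following it") can
be illustrated with a toy queue. This is a sketch under assumptions:
toy_entry and toy_rollback_from are hypothetical names, and the patch
itself only marks the timed-out entry, leaving the rest as a TODO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a limbo entry. */
struct toy_entry {
	long lsn;
	bool is_commit;
	bool is_rollback;
};

/*
 * Mark queue[first] and every later entry as rolled back.
 * Entries before 'first' are left untouched.
 * Returns the number of entries that were marked.
 */
static size_t
toy_rollback_from(struct toy_entry *queue, size_t size, size_t first)
{
	assert(first <= size);
	size_t count = 0;
	for (size_t i = first; i < size; i++) {
		queue[i].is_rollback = true;
		count++;
	}
	return count;
}
```

Later entries are rolled back as well because they were queued after
the timed-out one and may depend on its changes.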

Part-of #4848
---
 src/box/txn_limbo.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index bee8e8155..ac57fd1bd 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -130,12 +130,13 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 		return;
 	}
 	bool cancellable = fiber_set_cancellable(false);
-	while (!txn_limbo_entry_is_complete(entry))
-		fiber_yield();
+	bool timed_out = fiber_yield_timeout(txn_limbo_confirm_timeout(limbo));
 	fiber_set_cancellable(cancellable);
-	// TODO: implement rollback.
-	assert(!entry->is_rollback);
-	assert(entry->is_commit);
+	if (timed_out) {
+		// TODO: implement rollback.
+		entry->is_rollback = true;
+	}
+	assert(txn_limbo_entry_is_complete(entry));
 	txn_limbo_remove(limbo, entry);
 	txn_clear_flag(txn, TXN_WAIT_ACK);
 }
-- 
2.21.1 (Apple Git-122.3)

* Re: [Tarantool-patches] [PATCH v2 00/19] Sync replication
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (18 preceding siblings ...)
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 09/19] txn_limbo: add timeout when waiting for acks Vladislav Shpilevoy
@ 2020-06-29 23:22   ` Vladislav Shpilevoy
  2020-06-30 23:00   ` [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1 Vladislav Shpilevoy
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-29 23:22 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Sergey, please, pay special attention to these commits

    [14/19] applier: remove writer_cond -
        new commit;

    [15/19] applier: send heartbeat not only on commit, but on any write -
        not new, but significantly reworked;

    [19/19] replication: block async transactions when not empty limbo -
        new commit;

* Re: [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option Vladislav Shpilevoy
@ 2020-06-30 23:00     ` Vladislav Shpilevoy
  2020-07-01 15:55       ` Sergey Ostanevich
  2020-07-02  8:25       ` Serge Petrenko
  0 siblings, 2 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-30 23:00 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

On 30/06/2020 01:15, Vladislav Shpilevoy wrote:
> Synchronous space makes every transaction, affecting its data,
> wait until it is replicated on a quorum of replicas before it is
> committed.
> 
> Part of #4844
> Part of #5073
> ---
>  src/box/alter.cc                              |  5 ++
>  src/box/lua/net_box.lua                       |  2 +
>  src/box/lua/schema.lua                        |  3 +
>  src/box/lua/space.cc                          |  5 ++
>  src/box/space_def.c                           |  2 +
>  src/box/space_def.h                           |  6 ++
>  .../sync_replication_sanity.result            | 71 +++++++++++++++++++
>  .../sync_replication_sanity.test.lua          | 29 ++++++++

Renamed to qsync_basic.test.lua. The old name was too long, and wasn't
about just sanity checks, which are usually extremely simple.

* Re: [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
@ 2020-06-30 23:00     ` Vladislav Shpilevoy
  2020-07-02  8:48     ` Serge Petrenko
  2020-07-05 16:05     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-30 23:00 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Merged this diff into the commit, in scope of
https://github.com/tarantool/tarantool/issues/5123.

The key difference is that txn_limbo_assign_lsn() has nothing to
do with ACKs now. Indeed, it is called even on a replica, so it
should not care about ACKs.

The local WAL write on master now is marked as one of the ACKs,
explicitly, by calling txn_limbo_ack().
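
To make the counting concrete, here is a self-contained toy model of
the ACK accounting. It is a sketch, not the code from the patch: the
toy_* names and the fixed-size arrays are assumptions for
illustration, and unlike the real limbo it never removes confirmed
entries from the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MAX_REPLICAS = 4, TOY_MAX_ENTRIES = 8 };

/* Hypothetical, simplified stand-in for a limbo entry. */
struct toy_entry {
	int64_t lsn;
	int ack_count;
};

/* Hypothetical, simplified stand-in for the limbo. */
struct toy_limbo {
	struct toy_entry queue[TOY_MAX_ENTRIES];
	int size;
	int quorum;
	/* Last LSN acknowledged by each replica id. */
	int64_t acked[TOY_MAX_REPLICAS];
};

static void
toy_limbo_init(struct toy_limbo *limbo, int quorum)
{
	limbo->size = 0;
	limbo->quorum = quorum;
	for (int i = 0; i < TOY_MAX_REPLICAS; i++)
		limbo->acked[i] = 0;
}

/* Queue an entry whose LSN is already known. No ACK is counted. */
static void
toy_limbo_append(struct toy_limbo *limbo, int64_t lsn)
{
	assert(limbo->size < TOY_MAX_ENTRIES);
	limbo->queue[limbo->size].lsn = lsn;
	limbo->queue[limbo->size].ack_count = 0;
	limbo->size++;
}

/*
 * Count an ACK from replica_id for everything up to lsn.
 * Returns the LSN of the last entry that has reached the
 * quorum among the newly acknowledged ones, or -1 if none.
 */
static int64_t
toy_limbo_ack(struct toy_limbo *limbo, int replica_id, int64_t lsn)
{
	assert(replica_id >= 0 && replica_id < TOY_MAX_REPLICAS);
	int64_t prev_lsn = limbo->acked[replica_id];
	if (lsn > prev_lsn)
		limbo->acked[replica_id] = lsn;
	int64_t confirmed = -1;
	for (int i = 0; i < limbo->size; i++) {
		struct toy_entry *e = &limbo->queue[i];
		/* Already acknowledged by this replica earlier. */
		if (e->lsn <= prev_lsn)
			continue;
		/* Not covered by this ACK yet. */
		if (e->lsn > lsn)
			break;
		if (++e->ack_count >= limbo->quorum)
			confirmed = e->lsn;
	}
	return confirmed;
}
```

In this model the master's own WAL write is just one more call to
toy_limbo_ack() with the master's id, so with a quorum of 1 that
single call already confirms the entry.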

Also there was a bug: if the transaction was already complete
at the moment of calling txn_limbo_wait_complete(), it didn't
do proper cleanup. In this patch it does not matter, but in the
later patches it becomes a bug. This is solved by making one
place where this function always ends and using goto to reach it.

====================
diff --git a/src/box/txn.c b/src/box/txn.c
index 6cfa98212..eaba90274 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -702,8 +702,10 @@ txn_commit(struct txn *txn)
 		return -1;
 	}
 	if (is_sync) {
-		txn_limbo_assign_lsn(&txn_limbo, limbo_entry,
-				     req->rows[req->n_rows - 1]->lsn);
+		int64_t lsn = req->rows[req->n_rows - 1]->lsn;
+		txn_limbo_assign_lsn(&txn_limbo, limbo_entry, lsn);
+		/* Local WAL write is a first 'ACK'. */
+		txn_limbo_ack(&txn_limbo, txn_limbo.instance_id, lsn);
 		txn_limbo_wait_complete(&txn_limbo, limbo_entry);
 	}
 	if (!txn_has_flag(txn, TXN_IS_DONE)) {
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 9de91db93..c9bb2b7ff 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -96,9 +96,9 @@ txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
 		     int64_t lsn)
 {
 	assert(limbo->instance_id != REPLICA_ID_NIL);
+	assert(entry->lsn == -1);
+	assert(lsn > 0);
 	entry->lsn = lsn;
-	++entry->ack_count;
-	vclock_follow(&limbo->vclock, limbo->instance_id, lsn);
 }
 
 static bool
@@ -125,14 +125,13 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 	assert(entry->lsn > 0);
 	assert(!txn_has_flag(txn, TXN_IS_DONE));
 	assert(txn_has_flag(txn, TXN_WAIT_ACK));
-	if (txn_limbo_check_complete(limbo, entry)) {
-		txn_limbo_remove(limbo, entry);
-		return;
-	}
+	if (txn_limbo_check_complete(limbo, entry))
+		goto complete;
 	bool cancellable = fiber_set_cancellable(false);
 	while (!txn_limbo_entry_is_complete(entry))
 		fiber_yield();
 	fiber_set_cancellable(cancellable);
+complete:
 	// TODO: implement rollback.
 	// TODO: implement confirm.
 	assert(!entry->is_rollback);

* [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (19 preceding siblings ...)
  2020-06-29 23:22   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
@ 2020-06-30 23:00   ` Vladislav Shpilevoy
  2020-07-03 12:32     ` Serge Petrenko
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 1/4] replication: regression test on gh-5119 [not fixed] sergeyb
                     ` (6 subsequent siblings)
  27 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-30 23:00 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

When synchro quorum is 1, the final commit and confirmation write
are done by the fiber that created the transaction, right after
the WAL write. This case got special handling in the previous
patches, and this commit adds a test for that.

Closes #5123
---
 test/replication/qsync_basic.result    |  33 +++++++
 test/replication/qsync_basic.test.lua  |  12 +++
 test/replication/qsync_errinj.result   | 114 +++++++++++++++++++++++++
 test/replication/qsync_errinj.test.lua |  45 ++++++++++
 test/replication/suite.ini             |   2 +-
 5 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/qsync_errinj.result
 create mode 100644 test/replication/qsync_errinj.test.lua

diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index f713d4b08..cdecf00e8 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -299,6 +299,39 @@ box.space.sync:select{6}
  | - []
  | ...
 
+--
+-- gh-5123: quorum 1 still should write CONFIRM.
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
+ | ---
+ | ...
+oldlsn = box.info.lsn
+ | ---
+ | ...
+box.space.sync:replace{7}
+ | ---
+ | - [7]
+ | ...
+newlsn = box.info.lsn
+ | ---
+ | ...
+assert(newlsn >= oldlsn + 2)
+ | ---
+ | - true
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{7}
+ | ---
+ | - - [7]
+ | ...
+
 -- Cleanup.
 test_run:cmd('switch default')
  | ---
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index f84b6ee19..361f22bc3 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -118,6 +118,18 @@ test_run:switch('replica')
 box.space.test:select{6}
 box.space.sync:select{6}
 
+--
+-- gh-5123: quorum 1 still should write CONFIRM.
+--
+test_run:switch('default')
+box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
+oldlsn = box.info.lsn
+box.space.sync:replace{7}
+newlsn = box.info.lsn
+assert(newlsn >= oldlsn + 2)
+test_run:switch('replica')
+box.space.sync:select{7}
+
 -- Cleanup.
 test_run:cmd('switch default')
 
diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
new file mode 100644
index 000000000..1d2945761
--- /dev/null
+++ b/test/replication/qsync_errinj.result
@@ -0,0 +1,114 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+box.schema.user.grant('guest', 'super')
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+             script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+--
+-- gh-5123: replica WAL fail shouldn't crash with quorum 1.
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', true)
+ | ---
+ | - ok
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - [2]
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_upstream(1, {status='stopped'})
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', false)
+ | ---
+ | - ok
+ | ...
+
+test_run:cmd('restart server replica')
+ | 
+box.space.sync:select{2}
+ | ---
+ | - - [2]
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+box.cfg{                                                                        \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+}
+ | ---
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+
+box.space.sync:drop()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'super')
+ | ---
+ | ...
diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
new file mode 100644
index 000000000..96495ae6c
--- /dev/null
+++ b/test/replication/qsync_errinj.test.lua
@@ -0,0 +1,45 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+box.schema.user.grant('guest', 'super')
+
+test_run:cmd('create server replica with rpl_master=default,\
+             script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+_ = box.space.sync:create_index('pk')
+
+--
+-- gh-5123: replica WAL fail shouldn't crash with quorum 1.
+--
+test_run:switch('default')
+box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
+box.space.sync:insert{1}
+
+test_run:switch('replica')
+box.error.injection.set('ERRINJ_WAL_IO', true)
+
+test_run:switch('default')
+box.space.sync:insert{2}
+
+test_run:switch('replica')
+test_run:wait_upstream(1, {status='stopped'})
+box.error.injection.set('ERRINJ_WAL_IO', false)
+
+test_run:cmd('restart server replica')
+box.space.sync:select{2}
+
+test_run:cmd('switch default')
+
+box.cfg{                                                                        \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+}
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+
+box.space.sync:drop()
+box.schema.user.revoke('guest', 'super')
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 6119a264b..11f8d4e20 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script =  master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.21.1 (Apple Git-122.3)


* Re: [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option
  2020-06-30 23:00     ` Vladislav Shpilevoy
@ 2020-07-01 15:55       ` Sergey Ostanevich
  2020-07-01 23:46         ` Vladislav Shpilevoy
  2020-07-02  8:25       ` Serge Petrenko
  1 sibling, 1 reply; 68+ messages in thread
From: Sergey Ostanevich @ 2020-07-01 15:55 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi!

Thanks for the patch, LGTM.
Please create the corresponding doc tracker, as we agreed to do it
independently.

Regards,
Sergos

On 01 Jul 01:00, Vladislav Shpilevoy wrote:
> On 30/06/2020 01:15, Vladislav Shpilevoy wrote:
> > Synchronous space makes every transaction, affecting its data,
> > wait until it is replicated on a quorum of replicas before it is
> > committed.
> > 
> > Part of #4844
> > Part of #5073
> > ---
> >  src/box/alter.cc                              |  5 ++
> >  src/box/lua/net_box.lua                       |  2 +
> >  src/box/lua/schema.lua                        |  3 +
> >  src/box/lua/space.cc                          |  5 ++
> >  src/box/space_def.c                           |  2 +
> >  src/box/space_def.h                           |  6 ++
> >  .../sync_replication_sanity.result            | 71 +++++++++++++++++++
> >  .../sync_replication_sanity.test.lua          | 29 ++++++++
> 
> Renamed to qsync_basic.test.lua. The old name was too long, and wasn't
> about just sanity checks, which are usually extremely simple.


* Re: [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options Vladislav Shpilevoy
@ 2020-07-01 16:05     ` Sergey Ostanevich
  2020-07-01 23:46       ` Vladislav Shpilevoy
  2020-07-02  8:29     ` Serge Petrenko
  1 sibling, 1 reply; 68+ messages in thread
From: Sergey Ostanevich @ 2020-07-01 16:05 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi!

Thanks for the patch! 

LGTM, just one nit in desc.

Regards,
Sergos

On 30 Jun 01:15, Vladislav Shpilevoy wrote:
> Synchronous transactions are supposed to be replicated on a
> specified number of replicas before committed on master. The
> number of replicas can be specified using
> replication_synchro_quorum option. It is 1 by default, so sync
> transactions work like asynchronous when not configured anyhow.
> 1 means successful WAL write on master is enough for commit.
> 
> When replication_synchro_quorum is greater than 1, an instance has to
> wait for the specified number of replicas to  reply with success. If
double space here - - - - - - - -- - - ----- -^
> enough replies aren't collected during replication_synchro_timeout,
> the instance rolls back the tx in question.
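
To check my reading of the semantics described above, here is a minimal
sketch in plain C. All sketch_* names are hypothetical, and
SKETCH_VCLOCK_MAX = 32 is an assumed stand-in for VCLOCK_MAX. The range
check mirrors the quoted box_check_replication_synchro_quorum(), which
as written accepts a quorum equal to VCLOCK_MAX, although the error
message says "less than".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only; the real constant is VCLOCK_MAX. */
enum { SKETCH_VCLOCK_MAX = 32 };

enum sketch_outcome {
	SKETCH_WAIT,
	SKETCH_COMMIT,
	SKETCH_ROLLBACK,
};

/* Same range as the check in the patch: 0 < quorum <= VCLOCK_MAX. */
static bool
sketch_quorum_is_valid(int quorum)
{
	return quorum > 0 && quorum <= SKETCH_VCLOCK_MAX;
}

/* Same rule as the patch: the timeout must be greater than zero. */
static bool
sketch_timeout_is_valid(double timeout)
{
	return timeout > 0;
}

/*
 * Decide what happens to a sync transaction. ack_count includes the
 * master's own successful WAL write, so quorum == 1 commits right after
 * the local write. If the quorum is not reached within the timeout, the
 * transaction is rolled back.
 */
static enum sketch_outcome
sketch_decide(int ack_count, int quorum, double waited, double timeout)
{
	assert(sketch_quorum_is_valid(quorum));
	assert(sketch_timeout_is_valid(timeout));
	if (ack_count >= quorum)
		return SKETCH_COMMIT;
	if (waited >= timeout)
		return SKETCH_ROLLBACK;
	return SKETCH_WAIT;
}
```

With the defaults from this patch (quorum 1, timeout 5), the first call
after the local WAL write already returns SKETCH_COMMIT, which matches
"sync transactions work like asynchronous when not configured anyhow".
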
> 
> Part of #4844
> Part of #5073
> ---
>  src/box/box.cc                  | 53 +++++++++++++++++++++++++++++++++
>  src/box/box.h                   |  2 ++
>  src/box/lua/cfg.cc              | 18 +++++++++++
>  src/box/lua/load_cfg.lua        | 10 +++++++
>  src/box/replication.cc          |  2 ++
>  src/box/replication.h           | 12 ++++++++
>  test/app-tap/init_script.result |  2 ++
>  test/box/admin.result           |  4 +++
>  test/box/cfg.result             |  8 +++++
>  9 files changed, 111 insertions(+)
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 871b0d976..0821ea0a3 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -476,6 +476,31 @@ box_check_replication_sync_lag(void)
>  	return lag;
>  }
>  
> +static int
> +box_check_replication_synchro_quorum(void)
> +{
> +	int quorum = cfg_geti("replication_synchro_quorum");
> +	if (quorum <= 0 || quorum > VCLOCK_MAX) {
> +		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
> +			 "the value must be greater than zero and less than "
> +			 "maximal number of replicas");
> +		return -1;
> +	}
> +	return quorum;
> +}
> +
> +static double
> +box_check_replication_synchro_timeout(void)
> +{
> +	double timeout = cfg_getd("replication_synchro_timeout");
> +	if (timeout <= 0) {
> +		diag_set(ClientError, ER_CFG, "replication_synchro_timeout",
> +			 "the value must be greater than zero");
> +		return -1;
> +	}
> +	return timeout;
> +}
> +
>  static double
>  box_check_replication_sync_timeout(void)
>  {
> @@ -658,6 +683,10 @@ box_check_config()
>  	box_check_replication_connect_timeout();
>  	box_check_replication_connect_quorum();
>  	box_check_replication_sync_lag();
> +	if (box_check_replication_synchro_quorum() < 0)
> +		diag_raise();
> +	if (box_check_replication_synchro_timeout() < 0)
> +		diag_raise();
>  	box_check_replication_sync_timeout();
>  	box_check_readahead(cfg_geti("readahead"));
>  	box_check_checkpoint_count(cfg_geti("checkpoint_count"));
> @@ -777,6 +806,26 @@ box_set_replication_sync_lag(void)
>  	replication_sync_lag = box_check_replication_sync_lag();
>  }
>  
> +int
> +box_set_replication_synchro_quorum(void)
> +{
> +	int value = box_check_replication_synchro_quorum();
> +	if (value < 0)
> +		return -1;
> +	replication_synchro_quorum = value;
> +	return 0;
> +}
> +
> +int
> +box_set_replication_synchro_timeout(void)
> +{
> +	double value = box_check_replication_synchro_timeout();
> +	if (value < 0)
> +		return -1;
> +	replication_synchro_timeout = value;
> +	return 0;
> +}
> +
>  void
>  box_set_replication_sync_timeout(void)
>  {
> @@ -2417,6 +2466,10 @@ box_cfg_xc(void)
>  	box_set_replication_connect_timeout();
>  	box_set_replication_connect_quorum();
>  	box_set_replication_sync_lag();
> +	if (box_set_replication_synchro_quorum() != 0)
> +		diag_raise();
> +	if (box_set_replication_synchro_timeout() != 0)
> +		diag_raise();
>  	box_set_replication_sync_timeout();
>  	box_set_replication_skip_conflict();
>  	box_set_replication_anon();
> diff --git a/src/box/box.h b/src/box/box.h
> index 557542a83..f9789154e 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -243,6 +243,8 @@ void box_set_replication_timeout(void);
>  void box_set_replication_connect_timeout(void);
>  void box_set_replication_connect_quorum(void);
>  void box_set_replication_sync_lag(void);
> +int box_set_replication_synchro_quorum(void);
> +int box_set_replication_synchro_timeout(void);
>  void box_set_replication_sync_timeout(void);
>  void box_set_replication_skip_conflict(void);
>  void box_set_replication_anon(void);
> diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
> index a5b15e527..d481155cd 100644
> --- a/src/box/lua/cfg.cc
> +++ b/src/box/lua/cfg.cc
> @@ -313,6 +313,22 @@ lbox_cfg_set_replication_sync_lag(struct lua_State *L)
>  	return 0;
>  }
>  
> +static int
> +lbox_cfg_set_replication_synchro_quorum(struct lua_State *L)
> +{
> +	if (box_set_replication_synchro_quorum() != 0)
> +		luaT_error(L);
> +	return 0;
> +}
> +
> +static int
> +lbox_cfg_set_replication_synchro_timeout(struct lua_State *L)
> +{
> +	if (box_set_replication_synchro_timeout() != 0)
> +		luaT_error(L);
> +	return 0;
> +}
> +
>  static int
>  lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
>  {
> @@ -370,6 +386,8 @@ box_lua_cfg_init(struct lua_State *L)
>  		{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
>  		{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
>  		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
> +		{"cfg_set_replication_synchro_quorum", lbox_cfg_set_replication_synchro_quorum},
> +		{"cfg_set_replication_synchro_timeout", lbox_cfg_set_replication_synchro_timeout},
>  		{"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
>  		{"cfg_set_replication_skip_conflict", lbox_cfg_set_replication_skip_conflict},
>  		{"cfg_set_replication_anon", lbox_cfg_set_replication_anon},
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index f2f2df6f8..a7f03c7d6 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -89,6 +89,8 @@ local default_cfg = {
>      replication_timeout = 1,
>      replication_sync_lag = 10,
>      replication_sync_timeout = 300,
> +    replication_synchro_quorum = 1,
> +    replication_synchro_timeout = 5,
>      replication_connect_timeout = 30,
>      replication_connect_quorum = nil, -- connect all
>      replication_skip_conflict = false,
> @@ -164,6 +166,8 @@ local template_cfg = {
>      replication_timeout = 'number',
>      replication_sync_lag = 'number',
>      replication_sync_timeout = 'number',
> +    replication_synchro_quorum = 'number',
> +    replication_synchro_timeout = 'number',
>      replication_connect_timeout = 'number',
>      replication_connect_quorum = 'number',
>      replication_skip_conflict = 'boolean',
> @@ -280,6 +284,8 @@ local dynamic_cfg = {
>      replication_connect_quorum = private.cfg_set_replication_connect_quorum,
>      replication_sync_lag    = private.cfg_set_replication_sync_lag,
>      replication_sync_timeout = private.cfg_set_replication_sync_timeout,
> +    replication_synchro_quorum = private.cfg_set_replication_synchro_quorum,
> +    replication_synchro_timeout = private.cfg_set_replication_synchro_timeout,
>      replication_skip_conflict = private.cfg_set_replication_skip_conflict,
>      replication_anon        = private.cfg_set_replication_anon,
>      instance_uuid           = check_instance_uuid,
> @@ -313,6 +319,8 @@ local dynamic_cfg_order = {
>      replication_timeout     = 150,
>      replication_sync_lag    = 150,
>      replication_sync_timeout    = 150,
> +    replication_synchro_quorum  = 150,
> +    replication_synchro_timeout = 150,
>      replication_connect_timeout = 150,
>      replication_connect_quorum  = 150,
>      replication             = 200,
> @@ -348,6 +356,8 @@ local dynamic_cfg_skip_at_load = {
>      replication_connect_quorum = true,
>      replication_sync_lag    = true,
>      replication_sync_timeout = true,
> +    replication_synchro_quorum = true,
> +    replication_synchro_timeout = true,
>      replication_skip_conflict = true,
>      replication_anon        = true,
>      wal_dir_rescan_delay    = true,
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 273a7cb66..01e9e876a 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -51,6 +51,8 @@ double replication_timeout = 1.0; /* seconds */
>  double replication_connect_timeout = 30.0; /* seconds */
>  int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
>  double replication_sync_lag = 10.0; /* seconds */
> +int replication_synchro_quorum = 1;
> +double replication_synchro_timeout = 5.0; /* seconds */
>  double replication_sync_timeout = 300.0; /* seconds */
>  bool replication_skip_conflict = false;
>  bool replication_anon = false;
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 93a25c8a7..a081870f9 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -125,6 +125,18 @@ extern int replication_connect_quorum;
>   */
>  extern double replication_sync_lag;
>  
> +/**
> + * Minimal number of replicas which should ACK a synchronous
> + * transaction to be able to confirm it and commit.
> + */
> +extern int replication_synchro_quorum;
> +
> +/**
> + * Time in seconds which the master node is able to wait for ACKs
> + * for a synchronous transaction until it is rolled back.
> + */
> +extern double replication_synchro_timeout;
> +
>  /**
>   * Max time to wait for appliers to synchronize before entering
>   * the orphan mode.
> diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
> index 7c4454285..857f0c95f 100644
> --- a/test/app-tap/init_script.result
> +++ b/test/app-tap/init_script.result
> @@ -30,6 +30,8 @@ replication_connect_timeout:30
>  replication_skip_conflict:false
>  replication_sync_lag:10
>  replication_sync_timeout:300
> +replication_synchro_quorum:1
> +replication_synchro_timeout:5
>  replication_timeout:1
>  slab_alloc_factor:1.05
>  sql_cache_size:5242880
> diff --git a/test/box/admin.result b/test/box/admin.result
> index d94da8c5d..ab3e80a97 100644
> --- a/test/box/admin.result
> +++ b/test/box/admin.result
> @@ -81,6 +81,10 @@ cfg_filter(box.cfg)
>      - 10
>    - - replication_sync_timeout
>      - 300
> +  - - replication_synchro_quorum
> +    - 1
> +  - - replication_synchro_timeout
> +    - 5
>    - - replication_timeout
>      - 1
>    - - slab_alloc_factor
> diff --git a/test/box/cfg.result b/test/box/cfg.result
> index b41d54599..bdd210b09 100644
> --- a/test/box/cfg.result
> +++ b/test/box/cfg.result
> @@ -69,6 +69,10 @@ cfg_filter(box.cfg)
>   |     - 10
>   |   - - replication_sync_timeout
>   |     - 300
> + |   - - replication_synchro_quorum
> + |     - 1
> + |   - - replication_synchro_timeout
> + |     - 5
>   |   - - replication_timeout
>   |     - 1
>   |   - - slab_alloc_factor
> @@ -172,6 +176,10 @@ cfg_filter(box.cfg)
>   |     - 10
>   |   - - replication_sync_timeout
>   |     - 300
> + |   - - replication_synchro_quorum
> + |     - 1
> + |   - - replication_synchro_timeout
> + |     - 5
>   |   - - replication_timeout
>   |     - 1
>   |   - - slab_alloc_factor
> -- 
> 2.21.1 (Apple Git-122.3)
> 


* Re: [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo Vladislav Shpilevoy
@ 2020-07-01 17:12     ` Sergey Ostanevich
  2020-07-01 23:47       ` Vladislav Shpilevoy
  2020-07-03 12:28     ` Serge Petrenko
  1 sibling, 1 reply; 68+ messages in thread
From: Sergey Ostanevich @ 2020-07-01 17:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi!

Thanks for the patch!

I would like to see it merged with 03 and 04 of this series. It's no
good to have a self-rewrite in the same patchset, is it?

Plus some nits. 

Sergos

On 30 Jun 01:15, Vladislav Shpilevoy wrote:
> When there is a not committed synchronous transaction, any attempt
             are transactions in the limbo

> to commit a next transaction should be suspended, even if it is an
> async transaction.
> 
> This restriction comes from the theoretically possible dependency
> of what is written in the async transactions on what was written
> in the previous sync transactions.
> 
> For that there is a new txn flag - TXN_WAIT_SYNC. Previously the
> only synchro replication flag was TXN_WAIT_ACK. And now a
> transaction can be sync, but not wait for ACKs.

This one took me some time to grasp: you say an async transaction
can be sync, which stopped me. I believe that if you join this with the
ACK description, it will become clear.

> 
> In particular, if a transaction:
> 
> - Is synchronous, the it has TXN_WAIT_SYNC (it is sync), and
                    ^^^^^^
>   TXN_WAIT_ACK (need to collect ACKs, or get a CONFIRM);
> 
> - Is asynchronous, and the limbo was empty and the moment of
>   commit, the it does not have any of these flags and committed
>   like earlier;
> 
> - Is asynchronous, and the limbo was not empty and the moment of
>   commit. Then it will have only TXN_WAIT_SYNC. So it will be
>   finished right after all the previous sync transactions are
>   done. Note: *without waiting for ACKs* - the transaction is
>   still asynchronous in a sense that it is don't need to wait for
>   quorum replication.

So, there should be no TXN_WAIT_ACK set without the TXN_WAIT_SYNC. Is
there an assertion for this?
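
To make the question concrete, here is a minimal sketch of the flag
rule as I read it from the three cases above, with the invariant
spelled out as an assertion. The SKETCH_* names and the bitmask
representation are my own stand-ins for illustration; they are not the
real txn_flag API, and I am not claiming the patch contains such a
check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask stand-ins for the flags discussed in the patch. */
enum sketch_flag {
	SKETCH_WAIT_SYNC = 1 << 0,
	SKETCH_WAIT_ACK = 1 << 1,
	SKETCH_FORCE_ASYNC = 1 << 2,
};

/* The invariant in question: WAIT_ACK must never be set without WAIT_SYNC. */
static bool
sketch_flags_are_consistent(unsigned flags)
{
	return (flags & SKETCH_WAIT_ACK) == 0 ||
	       (flags & SKETCH_WAIT_SYNC) != 0;
}

/*
 * Choose the synchro flags at commit time:
 * - forced async: nothing is added;
 * - touches a sync space: WAIT_SYNC and WAIT_ACK;
 * - async, but the limbo is not empty: WAIT_SYNC only;
 * - async and the limbo is empty: nothing is added.
 */
static unsigned
sketch_choose_flags(unsigned flags, bool touches_sync_space,
		    bool limbo_is_empty)
{
	if ((flags & SKETCH_FORCE_ASYNC) == 0) {
		if (touches_sync_space)
			flags |= SKETCH_WAIT_SYNC | SKETCH_WAIT_ACK;
		else if (!limbo_is_empty)
			flags |= SKETCH_WAIT_SYNC;
	}
	assert(sketch_flags_are_consistent(flags));
	return flags;
}
```

An assertion of this kind next to the place where the flags are set or
cleared would catch a transaction that keeps TXN_WAIT_ACK after
TXN_WAIT_SYNC was dropped.
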

> 
> Follow-up #4845
> ---
>  src/box/applier.cc                            |  8 ++
>  src/box/txn.c                                 | 16 ++--
>  src/box/txn.h                                 |  7 ++
>  src/box/txn_limbo.c                           | 49 +++++++++---
>  .../sync_replication_sanity.result            | 75 +++++++++++++++++++
>  .../sync_replication_sanity.test.lua          | 26 +++++++
>  test/unit/snap_quorum_delay.cc                |  6 +-
>  7 files changed, 172 insertions(+), 15 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 7e63dc544..7e70211b7 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -280,6 +280,14 @@ process_confirm_rollback(struct request *request, bool is_confirm)
>  			 txn_limbo.instance_id);
>  		return -1;
>  	}
> +	assert(txn->n_applier_rows == 0);
> +	/*
> +	 * This is not really a transaction. It just uses txn API
> +	 * to put the data into WAL. And obviously it should not
> +	 * go to the limbo and block on the very same sync
> +	 * transaction which it tries to confirm now.
> +	 */
> +	txn_set_flag(txn, TXN_FORCE_ASYNC);
>  
>  	if (txn_begin_stmt(txn, NULL) != 0)
>  		return -1;
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 37955752a..bc2bb8e11 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -442,7 +442,7 @@ txn_complete(struct txn *txn)
>  			engine_rollback(txn->engine, txn);
>  		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
>  			txn_run_rollback_triggers(txn, &txn->on_rollback);
> -	} else if (!txn_has_flag(txn, TXN_WAIT_ACK)) {
> +	} else if (!txn_has_flag(txn, TXN_WAIT_SYNC)) {
>  		/* Commit the transaction. */
>  		if (txn->engine != NULL)
>  			engine_commit(txn->engine, txn);
> @@ -552,8 +552,14 @@ txn_journal_entry_new(struct txn *txn)
>  	 * space can't be synchronous. So if there is at least one
>  	 * synchronous space, the transaction is not local.
>  	 */
> -	if (is_sync && !txn_has_flag(txn, TXN_FORCE_ASYNC))
> -		txn_set_flag(txn, TXN_WAIT_ACK);
> +	if (!txn_has_flag(txn, TXN_FORCE_ASYNC)) {
> +		if (is_sync) {
> +			txn_set_flag(txn, TXN_WAIT_SYNC);
> +			txn_set_flag(txn, TXN_WAIT_ACK);
> +		} else if (!txn_limbo_is_empty(&txn_limbo)) {
> +			txn_set_flag(txn, TXN_WAIT_SYNC);
> +		}
> +	}
>  
>  	assert(remote_row == req->rows + txn->n_applier_rows);
>  	assert(local_row == remote_row + txn->n_new_rows);
> @@ -662,7 +668,7 @@ txn_commit_async(struct txn *txn)
>  		return -1;
>  	}
>  
> -	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
> +	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
>  	struct txn_limbo_entry *limbo_entry;
>  	if (is_sync) {
>  		/*
> @@ -737,7 +743,7 @@ txn_commit(struct txn *txn)
>  		return -1;
>  	}
>  
> -	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
> +	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
>  	if (is_sync) {
>  		/*
>  		 * Remote rows, if any, come before local rows, so
> diff --git a/src/box/txn.h b/src/box/txn.h
> index c631d7033..c484fcb56 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -66,11 +66,18 @@ enum txn_flag {
>  	TXN_CAN_YIELD,
>  	/** on_commit and/or on_rollback list is not empty. */
>  	TXN_HAS_TRIGGERS,
> +	/**
> +	 * A transaction is either synchronous itself and needs to
> +	 * be synced with replicas, or it is async, but is blocked
> +	 * by a not yet finished synchronous transaction.
> +	 */
> +	TXN_WAIT_SYNC,
>  	/**
>  	 * Transaction, touched sync spaces, enters 'waiting for
>  	 * acks' state before commit. In this state it waits until
>  	 * it is replicated onto a quorum of replicas, and only
>  	 * then finishes commit and returns success to a user.
> +	 * TXN_WAIT_SYNC is always set, if TXN_WAIT_ACK is set.
>  	 */
>  	TXN_WAIT_ACK,
>  	/**
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index fbe4dcecf..bfb404e8e 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -47,7 +47,7 @@ txn_limbo_create(struct txn_limbo *limbo)
>  struct txn_limbo_entry *
>  txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
>  {
> -	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
>  	if (id == 0)
>  		id = instance_id;
>  	if (limbo->instance_id != id) {
> @@ -143,7 +143,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  	struct txn *txn = entry->txn;
>  	assert(entry->lsn > 0);
>  	assert(!txn_has_flag(txn, TXN_IS_DONE));
> -	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
>  	if (txn_limbo_check_complete(limbo, entry)) {
>  		txn_limbo_remove(limbo, entry);
>  		return 0;
> @@ -160,6 +160,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  			e->txn->signature = TXN_SIGNATURE_QUORUM_TIMEOUT;
>  			txn_limbo_pop(limbo, e);
>  			txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +			txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>  			txn_complete(e->txn);
>  			if (e == entry)
>  				break;
> @@ -179,6 +180,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>  	}
>  	txn_limbo_remove(limbo, entry);
>  	txn_clear_flag(txn, TXN_WAIT_ACK);
> +	txn_clear_flag(txn, TXN_WAIT_SYNC);
>  	return 0;
>  }
>  
> @@ -209,6 +211,13 @@ txn_limbo_write_confirm_rollback(struct txn_limbo *limbo,
>  	struct txn *txn = txn_begin();
>  	if (txn == NULL)
>  		return -1;
> +	/*
> +	 * This is not really a transaction. It just uses txn API
> +	 * to put the data into WAL. And obviously it should not
> +	 * go to the limbo and block on the very same sync
> +	 * transaction which it tries to confirm now.
> +	 */
> +	txn_set_flag(txn, TXN_FORCE_ASYNC);
>  
>  	if (txn_begin_stmt(txn, NULL) != 0)
>  		goto rollback;
> @@ -238,11 +247,21 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>  	assert(limbo->instance_id != REPLICA_ID_NIL);
>  	struct txn_limbo_entry *e, *tmp;
>  	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
> -		if (e->lsn > lsn)
> +		/*
> +		 * Confirm a transaction if
> +		 * - it is a sync transaction covered by the
> +		 *   confirmation LSN;
> +		 * - it is an async transaction, and it is the
> +		 *   last in the queue. So it does not depend on
> +		 *   a not finished sync transaction anymore and
                       ------------  waiting for ACK^------- 

> +		 *   can be confirmed too.
> +		 */
> +		if (e->lsn > lsn && txn_has_flag(e->txn, TXN_WAIT_ACK))
>  			break;
>  		e->is_commit = true;
>  		txn_limbo_remove(limbo, e);
>  		txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>  		/*
>  		 * If  txn_complete_async() was already called,
>  		 * finish tx processing. Otherwise just clear the
> @@ -277,6 +296,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
>  		e->is_rollback = true;
>  		txn_limbo_pop(limbo, e);
>  		txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>  		if (e->txn->signature >= 0) {
>  			/* Rollback the transaction. */
>  			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
> @@ -307,15 +327,26 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
>  	struct txn_limbo_entry *e;
>  	struct txn_limbo_entry *last_quorum = NULL;
>  	rlist_foreach_entry(e, &limbo->queue, in_queue) {
> -		if (e->lsn <= prev_lsn)
> -			continue;
>  		if (e->lsn > lsn)
>  			break;
> -		if (++e->ack_count >= replication_synchro_quorum) {
> -			e->is_commit = true;
> -			last_quorum = e;
> -		}
> +		if (e->lsn <= prev_lsn)
> +			continue;
>  		assert(e->ack_count <= VCLOCK_MAX);
> +		/*
> +		 * Sync transactions need to collect acks. Async
> +		 * transactions are automatically committed right
> +		 * after all the previous sync transactions are.
> +		 */
> +		if (txn_has_flag(e->txn, TXN_WAIT_ACK)) {
> +			if (++e->ack_count < replication_synchro_quorum)
> +				continue;
> +		} else {
> +			assert(txn_has_flag(e->txn, TXN_WAIT_SYNC));
> +			if (last_quorum == NULL)
> +				continue;
> +		}
> +		e->is_commit = true;
> +		last_quorum = e;
>  	}
>  	if (last_quorum != NULL) {
>  		if (txn_limbo_write_confirm(limbo, last_quorum) != 0) {
> diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
> index 8b37ba6f5..f713d4b08 100644
> --- a/test/replication/sync_replication_sanity.result
> +++ b/test/replication/sync_replication_sanity.result
> @@ -224,6 +224,81 @@ box.space.sync:select{4}
>   | - - [4]
>   | ...
>  
> +--
> +-- Async transactions should wait for existing sync transactions
> +-- finish.
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +-- Start 2 fibers, which will execute one right after the other
> +-- in the same event loop iteration.
> +f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
> + | ---
> + | ...
> +f:status()
> + | ---
> + | - dead
> + | ...
> +s:select{5}
> + | ---
> + | - - [5]
> + | ...
> +box.space.sync:select{5}
> + | ---
> + | - - [5]
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.test:select{5}
> + | ---
> + | - - [5]
> + | ...
> +box.space.sync:select{5}
> + | ---
> + | - - [5]
> + | ...
> +-- Ensure sync rollback will affect all pending async transactions
> +-- too.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
> + | ---
> + | ...
> +f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
> + | ---
> + | - error: Quorum collection for a synchronous transaction is timed out
> + | ...
> +f:status()
> + | ---
> + | - dead
> + | ...
> +s:select{6}
> + | ---
> + | - []
> + | ...
> +box.space.sync:select{6}
> + | ---
> + | - []
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.test:select{6}
> + | ---
> + | - []
> + | ...
> +box.space.sync:select{6}
> + | ---
> + | - []
> + | ...
> +
>  -- Cleanup.
>  test_run:cmd('switch default')
>   | ---
> diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
> index b0326fd4b..f84b6ee19 100644
> --- a/test/replication/sync_replication_sanity.test.lua
> +++ b/test/replication/sync_replication_sanity.test.lua
> @@ -92,6 +92,32 @@ box.space.sync:replace{4}
>  test_run:switch('replica')
>  box.space.sync:select{4}
>  
> +--
> +-- Async transactions should wait for existing sync transactions
> +-- finish.
> +--
> +test_run:switch('default')
> +-- Start 2 fibers, which will execute one right after the other
> +-- in the same event loop iteration.
> +f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
> +f:status()
> +s:select{5}
> +box.space.sync:select{5}
> +test_run:switch('replica')
> +box.space.test:select{5}
> +box.space.sync:select{5}
> +-- Ensure sync rollback will affect all pending async transactions
> +-- too.
> +test_run:switch('default')
> +box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
> +f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
> +f:status()
> +s:select{6}
> +box.space.sync:select{6}
> +test_run:switch('replica')
> +box.space.test:select{6}
> +box.space.sync:select{6}
> +
>  -- Cleanup.
>  test_run:cmd('switch default')
>  
> diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
> index 7a200673a..e6cf381bf 100644
> --- a/test/unit/snap_quorum_delay.cc
> +++ b/test/unit/snap_quorum_delay.cc
> @@ -97,8 +97,12 @@ txn_process_func(va_list ap)
>  	enum process_type process_type = (enum process_type)va_arg(ap, int);
>  	struct txn *txn = txn_begin();
>  	txn->fiber = fiber();
> -	/* Set the TXN_WAIT_ACK flag to simulate a sync transaction.*/
> +	/*
> +	 * Set the TXN_WAIT_ACK + SYNC flags to simulate a sync
> +	 * transaction.
> +	 */
>  	txn_set_flag(txn, TXN_WAIT_ACK);
> +	txn_set_flag(txn, TXN_WAIT_SYNC);
>  	/*
>  	 * The true way to push the transaction to limbo is to call
>  	 * txn_commit() for sync transaction. But, if txn_commit()
> -- 
> 2.21.1 (Apple Git-122.3)
> 
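
For reference, this is how I read the new loop in txn_limbo_ack(). It
is a minimal standalone sketch: the limbo queue is modelled as an
array, wait_ack stands in for TXN_WAIT_ACK, and every sketch_* name is
hypothetical. It only illustrates the control flow quoted above and
leaves out the CONFIRM write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a limbo entry. */
struct sketch_entry {
	int64_t lsn;
	int ack_count;
	/* True for a sync transaction, which must collect ACKs. */
	bool wait_ack;
	bool is_commit;
};

/*
 * Account an ACK from one replica covering the LSN range
 * (prev_lsn, lsn]. Sync entries are committed once they reach the
 * quorum. Async entries do not count ACKs; they are committed only if
 * some earlier entry in this pass already reached the quorum.
 * Returns the last committed entry, or NULL if there is none.
 */
static struct sketch_entry *
sketch_limbo_ack(struct sketch_entry *queue, size_t count,
		 int64_t prev_lsn, int64_t lsn, int quorum)
{
	assert(quorum > 0);
	struct sketch_entry *last_quorum = NULL;
	for (size_t i = 0; i < count; i++) {
		struct sketch_entry *e = &queue[i];
		if (e->lsn > lsn)
			break;
		if (e->lsn <= prev_lsn)
			continue;
		if (e->wait_ack) {
			if (++e->ack_count < quorum)
				continue;
		} else if (last_quorum == NULL) {
			continue;
		}
		e->is_commit = true;
		last_quorum = e;
	}
	return last_quorum;
}
```

In this reading an async entry is never the first one to be committed
in a pass; it only rides on a sync entry that reached the quorum in
the same pass.
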


* Re: [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
@ 2020-07-01 17:14     ` Sergey Ostanevich
  2020-07-01 23:46     ` Vladislav Shpilevoy
  2020-07-02  8:30     ` Serge Petrenko
  2 siblings, 0 replies; 68+ messages in thread
From: Sergey Ostanevich @ 2020-07-01 17:14 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi!

Thanks for the patch!

Need updates from 19/19, otherwise good.

Sergos.


On 30 Jun 01:15, Vladislav Shpilevoy wrote:
> When a transaction touches a synchronous space, its commit
> procedure changes. The transaction enters state of 'waiting for
> acks' from replicas and from own master's WAL.
> 
> To denote the state there is a new transaction flag -
> TXN_WAIT_ACK.
> 
> Part of #4844
> ---
>  src/box/txn.c | 6 ++++++
>  src/box/txn.h | 7 +++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 123520166..edc1f5180 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -495,6 +495,7 @@ txn_journal_entry_new(struct txn *txn)
>  
>  	struct xrow_header **remote_row = req->rows;
>  	struct xrow_header **local_row = req->rows + txn->n_applier_rows;
> +	bool is_sync = false;
>  
>  	stailq_foreach_entry(stmt, &txn->stmts, next) {
>  		if (stmt->has_triggers) {
> @@ -506,6 +507,9 @@ txn_journal_entry_new(struct txn *txn)
>  		if (stmt->row == NULL)
>  			continue;
>  
> +		is_sync = is_sync || (stmt->space != NULL &&
> +				      stmt->space->def->opts.is_sync);
> +
>  		if (stmt->row->replica_id == 0)
>  			*local_row++ = stmt->row;
>  		else
> @@ -513,6 +517,8 @@ txn_journal_entry_new(struct txn *txn)
>  
>  		req->approx_len += xrow_approx_len(stmt->row);
>  	}
> +	if (is_sync)
> +		txn_set_flag(txn, TXN_WAIT_ACK);
>  
>  	assert(remote_row == req->rows + txn->n_applier_rows);
>  	assert(local_row == remote_row + txn->n_new_rows);
> diff --git a/src/box/txn.h b/src/box/txn.h
> index 3f6d79d5c..232cc07a8 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -66,6 +66,13 @@ enum txn_flag {
>  	TXN_CAN_YIELD,
>  	/** on_commit and/or on_rollback list is not empty. */
>  	TXN_HAS_TRIGGERS,
> +	/**
> +	 * Transaction, touched sync spaces, enters 'waiting for
> +	 * acks' state before commit. In this state it waits until
> +	 * it is replicated onto a quorum of replicas, and only
> +	 * then finishes commit and returns success to a user.
> +	 */
> +	TXN_WAIT_ACK,
>  };
>  
>  enum {
> -- 
> 2.21.1 (Apple Git-122.3)
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option
  2020-07-01 15:55       ` Sergey Ostanevich
@ 2020-07-01 23:46         ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-01 23:46 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

> Thanks for the patch, LGTM.
> Please, create corresponding doc tracker as we agreed to do it
> independently.

I will do that after the patchset is pushed. That is the only reason
why it is not a part of the commit: something can still change.
So I don't think it should be created before the push.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options
  2020-07-01 16:05     ` Sergey Ostanevich
@ 2020-07-01 23:46       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-01 23:46 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

> On 30 Jun 01:15, Vladislav Shpilevoy wrote:
>> Synchronous transactions are supposed to be replicated on a
>> specified number of replicas before committed on master. The
>> number of replicas can be specified using
>> replication_synchro_quorum option. It is 1 by default, so sync
>> transactions work like asynchronous when not configured anyhow.
>> 1 means successful WAL write on master is enough for commit.
>>
>> When replication_synchro_quorum is greater than 1, an instance has to
>> wait for the specified number of replicas to  reply with success. If
> double space here - - - - - - - -- - - ----- -^

Fixed.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
  2020-07-01 17:14     ` Sergey Ostanevich
@ 2020-07-01 23:46     ` Vladislav Shpilevoy
  2020-07-02  8:30     ` Serge Petrenko
  2 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-01 23:46 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

The commit is squashed into the next one, along with the other
new flags.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo
  2020-07-01 17:12     ` Sergey Ostanevich
@ 2020-07-01 23:47       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-01 23:47 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

Hi! Thanks for the review!

> I would like to see it merged with 03 and 04 of this series. It's no
> good to have a self-rewrite in the same patchset, isn't it?

It is not good when it is done in a weird way that could easily be
avoided. In this patchset this particular commit can't simply be merged
into patches 3 and 4: it affects more patches.

Nonetheless I managed to split it into parts. The code changes were
merged into patches 3, 4, 7, 8, 10. The original commit is kept where
it is, and now contains only the test. The test depends on the existence
of ROLLBACK and CONFIRM, and therefore can't be merged into patches 3
and 4 (where most of the limbo appears).

> Plus some nits. 
> 
> Sergos
> 
> On 30 Jun 01:15, Vladislav Shpilevoy wrote:
>> When there is a not committed synchronous transaction, any attempt
>              are transactions it the limbo

What is wrong? If there is an uncommitted sync transaction, it means
it is in the limbo.

>> to commit a next transaction should be suspended, even if it is an
>> async transaction.
>>
>> This restriction comes from the theoretically possible dependency
>> of what is written in the async transactions on what was written
>> in the previous sync transactions.
>>
>> For that there is a new txn flag - TXN_WAIT_SYNC. Previously the
>> only synchro replication flag was TXN_WAIT_ACK. And now a
>> transaction can be sync, but not wait for ACKs.
> 
> This one took some time for me to grasp - you say an async transaction
> can be sync, which put me to a stop. I believe if join this with ACK
> description it will become clear.

This part of the text is removed, and patch 3 was squashed
into patch 4.

>> In particular, if a transaction:
>>
>> - Is synchronous, the it has TXN_WAIT_SYNC (it is sync), and
>                     ^^^^^^
>>   TXN_WAIT_ACK (need to collect ACKs, or get a CONFIRM);
>>
>> - Is asynchronous, and the limbo was empty and the moment of
>>   commit, the it does not have any of these flags and committed
>>   like earlier;
>>
>> - Is asynchronous, and the limbo was not empty and the moment of
>>   commit. Then it will have only TXN_WAIT_SYNC. So it will be
>>   finished right after all the previous sync transactions are
>>   done. Note: *without waiting for ACKs* - the transaction is
>>   still asynchronous in a sense that it is don't need to wait for
>>   quorum replication.
> 
> So, there should be no TXN_WAIT_ACK set without the TXN_WAIT_SYNC. Is
> there an assertion for this?

Yes, there are plenty of assertions on that.
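
For reference, the three cases from the commit message and the invariant
asked about above can be sketched as follows. This is an illustration
only, not the code from the patch: the helpers sketch_commit_flags() and
sketch_flags_are_valid() and the bit values are assumptions. Only the
flag names TXN_WAIT_SYNC and TXN_WAIT_ACK come from the patchset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real enum txn_flag is laid out differently. */
enum sketch_txn_flag {
	SKETCH_TXN_WAIT_SYNC = 1 << 0,
	SKETCH_TXN_WAIT_ACK = 1 << 1,
};

/*
 * Pick the flags for a transaction at commit time, following the
 * three cases listed in the commit message.
 */
static inline unsigned
sketch_commit_flags(bool touches_sync_space, bool limbo_is_empty)
{
	/* Sync transaction: it is sync and has to collect ACKs. */
	if (touches_sync_space)
		return SKETCH_TXN_WAIT_SYNC | SKETCH_TXN_WAIT_ACK;
	/* Async transaction behind pending sync ones: wait, but no ACKs. */
	if (!limbo_is_empty)
		return SKETCH_TXN_WAIT_SYNC;
	/* Async transaction with an empty limbo: committed as before. */
	return 0;
}

/* The invariant: WAIT_ACK is never set without WAIT_SYNC. */
static inline bool
sketch_flags_are_valid(unsigned flags)
{
	if ((flags & SKETCH_TXN_WAIT_ACK) == 0)
		return true;
	return (flags & SKETCH_TXN_WAIT_SYNC) != 0;
}
```

In this model a transaction carrying only the ACK bit is rejected, which
is the condition the assertions are meant to guard.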

>> Follow-up #4845
>> ---
>> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
>> index fbe4dcecf..bfb404e8e 100644
>> --- a/src/box/txn_limbo.c
>> +++ b/src/box/txn_limbo.c
>> @@ -238,11 +247,21 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>>  	assert(limbo->instance_id != REPLICA_ID_NIL);
>>  	struct txn_limbo_entry *e, *tmp;
>>  	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
>> -		if (e->lsn > lsn)
>> +		/*
>> +		 * Confirm a transaction if
>> +		 * - it is a sync transaction covered by the
>> +		 *   confirmation LSN;
>> +		 * - it is an async transaction, and it is the
>> +		 *   last in the queue. So it does not depend on
>> +		 *   a not finished sync transaction anymore and
>                        ------------  waiting for ACK^------- 

Yes, this is the same.
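
The confirmation rule in the quoted comment can be modelled roughly as
below. This is a simplified sketch, not the code from txn_limbo.c: the
queue is a plain array instead of an rlist, and struct sketch_entry and
sketch_read_confirm() are hypothetical names. It assumes the walk starts
from the oldest entry and stops at the first sync transaction whose LSN
is not covered by the CONFIRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a limbo queue entry. */
struct sketch_entry {
	/* True for a sync transaction, i.e. one that waits for ACKs. */
	bool wait_ack;
	/* LSN of the transaction; not used for async entries. */
	int64_t lsn;
	/* Set once the entry is confirmed. */
	bool is_confirmed;
};

/*
 * Confirm entries from the head of the queue up to, but not
 * including, the first sync entry with an LSN above the confirmed
 * one. Async entries before that point are confirmed too, because
 * every sync transaction in front of them is finished.
 * Returns the number of confirmed entries.
 */
static inline size_t
sketch_read_confirm(struct sketch_entry *queue, size_t count, int64_t lsn)
{
	size_t i;
	for (i = 0; i < count; i++) {
		struct sketch_entry *e = &queue[i];
		if (e->wait_ack && e->lsn > lsn)
			break;
		e->is_confirmed = true;
	}
	return i;
}
```

An async entry queued behind an unconfirmed sync entry stays in the
queue in this model, which matches the blocking behaviour described in
the commit message.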

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write Vladislav Shpilevoy
@ 2020-07-01 23:55     ` Vladislav Shpilevoy
  2020-07-03 12:23     ` Serge Petrenko
  1 sibling, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-01 23:55 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

This perhaps does not always work as intended:
https://github.com/tarantool/tarantool/issues/5127

The issue exists for async replication too, but is not
that sensitive there.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option
  2020-06-30 23:00     ` Vladislav Shpilevoy
  2020-07-01 15:55       ` Sergey Ostanevich
@ 2020-07-02  8:25       ` Serge Petrenko
  1 sibling, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  8:25 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


01.07.2020 02:00, Vladislav Shpilevoy wrote:
> On 30/06/2020 01:15, Vladislav Shpilevoy wrote:
>> Synchronous space makes every transaction, affecting its data,
>> wait until it is replicated on a quorum of replicas before it is
>> committed.
>>
>> Part of #4844
>> Part of #5073
>> ---
>>   src/box/alter.cc                              |  5 ++
>>   src/box/lua/net_box.lua                       |  2 +
>>   src/box/lua/schema.lua                        |  3 +
>>   src/box/lua/space.cc                          |  5 ++
>>   src/box/space_def.c                           |  2 +
>>   src/box/space_def.h                           |  6 ++
>>   .../sync_replication_sanity.result            | 71 +++++++++++++++++++
>>   .../sync_replication_sanity.test.lua          | 29 ++++++++
> Renamed to qsync_basic.test.lua. The old name was too long, and wasn't
> about just sanity checks, which are usually extremely simple.

LGTM, thanks!

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options Vladislav Shpilevoy
  2020-07-01 16:05     ` Sergey Ostanevich
@ 2020-07-02  8:29     ` Serge Petrenko
  2020-07-02 23:36       ` Vladislav Shpilevoy
  1 sibling, 1 reply; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  8:29 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


30.06.2020 02:15, Vladislav Shpilevoy wrote:
> Synchronous transactions are supposed to be replicated on a
> specified number of replicas before committed on master. The
> number of replicas can be specified using
> replication_synchro_quorum option. It is 1 by default, so sync
> transactions work like asynchronous when not configured anyhow.
> 1 means successful WAL write on master is enough for commit.
>
> When replication_synchro_quorum is greater than 1, an instance has to
> wait for the specified number of replicas to  reply with success. If
> enough replies aren't collected during replication_synchro_timeout,
> the instance rolls back the tx in question.
>
> Part of #4844
> Part of #5073

Thanks for the patch!

LGTM with 1 comment below.

> ---
>   src/box/box.cc                  | 53 +++++++++++++++++++++++++++++++++
>   src/box/box.h                   |  2 ++
>   src/box/lua/cfg.cc              | 18 +++++++++++
>   src/box/lua/load_cfg.lua        | 10 +++++++
>   src/box/replication.cc          |  2 ++
>   src/box/replication.h           | 12 ++++++++
>   test/app-tap/init_script.result |  2 ++
>   test/box/admin.result           |  4 +++
>   test/box/cfg.result             |  8 +++++
>   9 files changed, 111 insertions(+)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 871b0d976..0821ea0a3 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -476,6 +476,31 @@ box_check_replication_sync_lag(void)
>   	return lag;
>   }
>   
> +static int
> +box_check_replication_synchro_quorum(void)
> +{
> +	int quorum = cfg_geti("replication_synchro_quorum");
> +	if (quorum <= 0 || quorum > VCLOCK_MAX) {

It should be `quorum >= VCLOCK_MAX`, because you can't have VCLOCK_MAX (32)
instances in a cluster, only 31. Id 0 is used by anonymous replicas.
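
A minimal sketch of the suggested bound, assuming VCLOCK_MAX is 32 as
stated above. The names SKETCH_VCLOCK_MAX and sketch_quorum_is_valid()
are made up for the illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors VCLOCK_MAX, which is 32 according to the review. */
enum { SKETCH_VCLOCK_MAX = 32 };

/*
 * A quorum is valid when it is positive and strictly less than
 * VCLOCK_MAX: id 0 is taken by anonymous replicas, so at most 31
 * instances can ever ACK a transaction.
 */
static inline bool
sketch_quorum_is_valid(int quorum)
{
	return quorum > 0 && quorum < SKETCH_VCLOCK_MAX;
}
```

With this check a quorum of 31 is accepted and 32 is rejected, while the
version in the patch would accept 32 as well.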

> +		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
> +			 "the value must be greater than zero and less than "
> +			 "maximal number of replicas");
> +		return -1;
> +	}
> +	return quorum;
> +}
> +
> +static double
> +box_check_replication_synchro_timeout(void)
> +{
> +	double timeout = cfg_getd("replication_synchro_timeout");
> +	if (timeout <= 0) {
> +		diag_set(ClientError, ER_CFG, "replication_synchro_timeout",
> +			 "the value must be greater than zero");
> +		return -1;
> +	}
> +	return timeout;
> +}
> +
>   static double
>   box_check_replication_sync_timeout(void)
>   {
> @@ -658,6 +683,10 @@ box_check_config()
>   	box_check_replication_connect_timeout();
>   	box_check_replication_connect_quorum();
>   	box_check_replication_sync_lag();
> +	if (box_check_replication_synchro_quorum() < 0)
> +		diag_raise();
> +	if (box_check_replication_synchro_timeout() < 0)
> +		diag_raise();
>   	box_check_replication_sync_timeout();
>   	box_check_readahead(cfg_geti("readahead"));
>   	box_check_checkpoint_count(cfg_geti("checkpoint_count"));
> @@ -777,6 +806,26 @@ box_set_replication_sync_lag(void)
>   	replication_sync_lag = box_check_replication_sync_lag();
>   }
>   
> +int
> +box_set_replication_synchro_quorum(void)
> +{
> +	int value = box_check_replication_synchro_quorum();
> +	if (value < 0)
> +		return -1;
> +	replication_synchro_quorum = value;
> +	return 0;
> +}
> +
> +int
> +box_set_replication_synchro_timeout(void)
> +{
> +	double value = box_check_replication_synchro_timeout();
> +	if (value < 0)
> +		return -1;
> +	replication_synchro_timeout = value;
> +	return 0;
> +}
> +
>   void
>   box_set_replication_sync_timeout(void)
>   {
> @@ -2417,6 +2466,10 @@ box_cfg_xc(void)
>   	box_set_replication_connect_timeout();
>   	box_set_replication_connect_quorum();
>   	box_set_replication_sync_lag();
> +	if (box_set_replication_synchro_quorum() != 0)
> +		diag_raise();
> +	if (box_set_replication_synchro_timeout() != 0)
> +		diag_raise();
>   	box_set_replication_sync_timeout();
>   	box_set_replication_skip_conflict();
>   	box_set_replication_anon();
> diff --git a/src/box/box.h b/src/box/box.h
> index 557542a83..f9789154e 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -243,6 +243,8 @@ void box_set_replication_timeout(void);
>   void box_set_replication_connect_timeout(void);
>   void box_set_replication_connect_quorum(void);
>   void box_set_replication_sync_lag(void);
> +int box_set_replication_synchro_quorum(void);
> +int box_set_replication_synchro_timeout(void);
>   void box_set_replication_sync_timeout(void);
>   void box_set_replication_skip_conflict(void);
>   void box_set_replication_anon(void);
> diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
> index a5b15e527..d481155cd 100644
> --- a/src/box/lua/cfg.cc
> +++ b/src/box/lua/cfg.cc
> @@ -313,6 +313,22 @@ lbox_cfg_set_replication_sync_lag(struct lua_State *L)
>   	return 0;
>   }
>   
> +static int
> +lbox_cfg_set_replication_synchro_quorum(struct lua_State *L)
> +{
> +	if (box_set_replication_synchro_quorum() != 0)
> +		luaT_error(L);
> +	return 0;
> +}
> +
> +static int
> +lbox_cfg_set_replication_synchro_timeout(struct lua_State *L)
> +{
> +	if (box_set_replication_synchro_timeout() != 0)
> +		luaT_error(L);
> +	return 0;
> +}
> +
>   static int
>   lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
>   {
> @@ -370,6 +386,8 @@ box_lua_cfg_init(struct lua_State *L)
>   		{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
>   		{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
>   		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
> +		{"cfg_set_replication_synchro_quorum", lbox_cfg_set_replication_synchro_quorum},
> +		{"cfg_set_replication_synchro_timeout", lbox_cfg_set_replication_synchro_timeout},
>   		{"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
>   		{"cfg_set_replication_skip_conflict", lbox_cfg_set_replication_skip_conflict},
>   		{"cfg_set_replication_anon", lbox_cfg_set_replication_anon},
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index f2f2df6f8..a7f03c7d6 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -89,6 +89,8 @@ local default_cfg = {
>       replication_timeout = 1,
>       replication_sync_lag = 10,
>       replication_sync_timeout = 300,
> +    replication_synchro_quorum = 1,
> +    replication_synchro_timeout = 5,
>       replication_connect_timeout = 30,
>       replication_connect_quorum = nil, -- connect all
>       replication_skip_conflict = false,
> @@ -164,6 +166,8 @@ local template_cfg = {
>       replication_timeout = 'number',
>       replication_sync_lag = 'number',
>       replication_sync_timeout = 'number',
> +    replication_synchro_quorum = 'number',
> +    replication_synchro_timeout = 'number',
>       replication_connect_timeout = 'number',
>       replication_connect_quorum = 'number',
>       replication_skip_conflict = 'boolean',
> @@ -280,6 +284,8 @@ local dynamic_cfg = {
>       replication_connect_quorum = private.cfg_set_replication_connect_quorum,
>       replication_sync_lag    = private.cfg_set_replication_sync_lag,
>       replication_sync_timeout = private.cfg_set_replication_sync_timeout,
> +    replication_synchro_quorum = private.cfg_set_replication_synchro_quorum,
> +    replication_synchro_timeout = private.cfg_set_replication_synchro_timeout,
>       replication_skip_conflict = private.cfg_set_replication_skip_conflict,
>       replication_anon        = private.cfg_set_replication_anon,
>       instance_uuid           = check_instance_uuid,
> @@ -313,6 +319,8 @@ local dynamic_cfg_order = {
>       replication_timeout     = 150,
>       replication_sync_lag    = 150,
>       replication_sync_timeout    = 150,
> +    replication_synchro_quorum  = 150,
> +    replication_synchro_timeout = 150,
>       replication_connect_timeout = 150,
>       replication_connect_quorum  = 150,
>       replication             = 200,
> @@ -348,6 +356,8 @@ local dynamic_cfg_skip_at_load = {
>       replication_connect_quorum = true,
>       replication_sync_lag    = true,
>       replication_sync_timeout = true,
> +    replication_synchro_quorum = true,
> +    replication_synchro_timeout = true,
>       replication_skip_conflict = true,
>       replication_anon        = true,
>       wal_dir_rescan_delay    = true,
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 273a7cb66..01e9e876a 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -51,6 +51,8 @@ double replication_timeout = 1.0; /* seconds */
>   double replication_connect_timeout = 30.0; /* seconds */
>   int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
>   double replication_sync_lag = 10.0; /* seconds */
> +int replication_synchro_quorum = 1;
> +double replication_synchro_timeout = 5.0; /* seconds */
>   double replication_sync_timeout = 300.0; /* seconds */
>   bool replication_skip_conflict = false;
>   bool replication_anon = false;
> diff --git a/src/box/replication.h b/src/box/replication.h
> index 93a25c8a7..a081870f9 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -125,6 +125,18 @@ extern int replication_connect_quorum;
>    */
>   extern double replication_sync_lag;
>   
> +/**
> + * Minimal number of replicas which should ACK a synchronous
> + * transaction to be able to confirm it and commit.
> + */
> +extern int replication_synchro_quorum;
> +
> +/**
> + * Time in seconds which the master node is able to wait for ACKs
> + * for a synchronous transaction until it is rolled back.
> + */
> +extern double replication_synchro_timeout;
> +
>   /**
>    * Max time to wait for appliers to synchronize before entering
>    * the orphan mode.
> diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
> index 7c4454285..857f0c95f 100644
> --- a/test/app-tap/init_script.result
> +++ b/test/app-tap/init_script.result
> @@ -30,6 +30,8 @@ replication_connect_timeout:30
>   replication_skip_conflict:false
>   replication_sync_lag:10
>   replication_sync_timeout:300
> +replication_synchro_quorum:1
> +replication_synchro_timeout:5
>   replication_timeout:1
>   slab_alloc_factor:1.05
>   sql_cache_size:5242880
> diff --git a/test/box/admin.result b/test/box/admin.result
> index d94da8c5d..ab3e80a97 100644
> --- a/test/box/admin.result
> +++ b/test/box/admin.result
> @@ -81,6 +81,10 @@ cfg_filter(box.cfg)
>       - 10
>     - - replication_sync_timeout
>       - 300
> +  - - replication_synchro_quorum
> +    - 1
> +  - - replication_synchro_timeout
> +    - 5
>     - - replication_timeout
>       - 1
>     - - slab_alloc_factor
> diff --git a/test/box/cfg.result b/test/box/cfg.result
> index b41d54599..bdd210b09 100644
> --- a/test/box/cfg.result
> +++ b/test/box/cfg.result
> @@ -69,6 +69,10 @@ cfg_filter(box.cfg)
>    |     - 10
>    |   - - replication_sync_timeout
>    |     - 300
> + |   - - replication_synchro_quorum
> + |     - 1
> + |   - - replication_synchro_timeout
> + |     - 5
>    |   - - replication_timeout
>    |     - 1
>    |   - - slab_alloc_factor
> @@ -172,6 +176,10 @@ cfg_filter(box.cfg)
>    |     - 10
>    |   - - replication_sync_timeout
>    |     - 300
> + |   - - replication_synchro_quorum
> + |     - 1
> + |   - - replication_synchro_timeout
> + |     - 5
>    |   - - replication_timeout
>    |     - 1
>    |   - - slab_alloc_factor

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
  2020-07-01 17:14     ` Sergey Ostanevich
  2020-07-01 23:46     ` Vladislav Shpilevoy
@ 2020-07-02  8:30     ` Serge Petrenko
  2 siblings, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  8:30 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


30.06.2020 02:15, Vladislav Shpilevoy wrote:
> When a transaction touches a synchronous space, its commit
> procedure changes. The transaction enters state of 'waiting for
> acks' from replicas and from own master's WAL.
>
> To denote the state there is a new transaction flag -
> TXN_WAIT_ACK.
>
> Part of #4844
> ---
>   src/box/txn.c | 6 ++++++
>   src/box/txn.h | 7 +++++++
>   2 files changed, 13 insertions(+)
>
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 123520166..edc1f5180 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -495,6 +495,7 @@ txn_journal_entry_new(struct txn *txn)
>   
>   	struct xrow_header **remote_row = req->rows;
>   	struct xrow_header **local_row = req->rows + txn->n_applier_rows;
> +	bool is_sync = false;
>   
>   	stailq_foreach_entry(stmt, &txn->stmts, next) {
>   		if (stmt->has_triggers) {
> @@ -506,6 +507,9 @@ txn_journal_entry_new(struct txn *txn)
>   		if (stmt->row == NULL)
>   			continue;
>   
> +		is_sync = is_sync || (stmt->space != NULL &&
> +				      stmt->space->def->opts.is_sync);
> +
>   		if (stmt->row->replica_id == 0)
>   			*local_row++ = stmt->row;
>   		else
> @@ -513,6 +517,8 @@ txn_journal_entry_new(struct txn *txn)
>   
>   		req->approx_len += xrow_approx_len(stmt->row);
>   	}
> +	if (is_sync)
> +		txn_set_flag(txn, TXN_WAIT_ACK);
>   
>   	assert(remote_row == req->rows + txn->n_applier_rows);
>   	assert(local_row == remote_row + txn->n_new_rows);
> diff --git a/src/box/txn.h b/src/box/txn.h
> index 3f6d79d5c..232cc07a8 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -66,6 +66,13 @@ enum txn_flag {
>   	TXN_CAN_YIELD,
>   	/** on_commit and/or on_rollback list is not empty. */
>   	TXN_HAS_TRIGGERS,
> +	/**
> +	 * Transaction, touched sync spaces, enters 'waiting for
> +	 * acks' state before commit. In this state it waits until
> +	 * it is replicated onto a quorum of replicas, and only
> +	 * then finishes commit and returns success to a user.
> +	 */
> +	TXN_WAIT_ACK,
>   };
>   
>   enum {
Thanks for the patch! LGTM.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
  2020-06-30 23:00     ` Vladislav Shpilevoy
@ 2020-07-02  8:48     ` Serge Petrenko
  2020-07-03 21:16       ` Vladislav Shpilevoy
  2020-07-05 16:05     ` Vladislav Shpilevoy
  2 siblings, 1 reply; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  8:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


30.06.2020 02:15, Vladislav Shpilevoy wrote:
> Synchronous transaction (which changes anything in a synchronous
> space) before commit waits until it is replicated onto a quorum
> of replicas.
>
> So far all the 'synchronousness' is basically the same as the well
> known 'wait_lsn' technique. With the exception, that the
> transaction really is not committed until replicated.
>
> Problem of wait_lsn is still present though, in case master
> restarts. Because there is no a 'confirm' record in WAL telling
> which transactions are replicated and can be applied.
>
> Closes #4844
> Closes #4845

Looks good, thanks!

I have a concern though, not related to the patch itself. Please see below.

> ---
>   src/box/CMakeLists.txt |   1 +
>   src/box/box.cc         |   2 +
>   src/box/errcode.h      |   1 +
>   src/box/relay.cc       |  11 +++
>   src/box/txn.c          |  51 +++++++++++-
>   src/box/txn_limbo.c    | 176 +++++++++++++++++++++++++++++++++++++++++
>   src/box/txn_limbo.h    | 168 +++++++++++++++++++++++++++++++++++++++
>   test/box/error.result  |   1 +
>   8 files changed, 409 insertions(+), 2 deletions(-)
>   create mode 100644 src/box/txn_limbo.c
>   create mode 100644 src/box/txn_limbo.h
>
> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
> index 63f98f6c8..b8b2689d2 100644
> --- a/src/box/CMakeLists.txt
> +++ b/src/box/CMakeLists.txt
> @@ -169,6 +169,7 @@ add_library(box STATIC
>       session.cc
>       port.c
>       txn.c
> +    txn_limbo.c
>       box.cc
>       gc.c
>       checkpoint_schedule.c
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 0821ea0a3..02088ba01 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -59,6 +59,7 @@
>   #include "index.h"
>   #include "port.h"
>   #include "txn.h"
> +#include "txn_limbo.h"
>   #include "user.h"
>   #include "cfg.h"
>   #include "coio.h"
> @@ -2413,6 +2414,7 @@ box_init(void)
>   	if (tuple_init(lua_hash) != 0)
>   		diag_raise();
>   
> +	txn_limbo_init();
>   	sequence_init();
>   }
>   
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index d1e4d02a9..019c582af 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -266,6 +266,7 @@ struct errcode_record {
>   	/*211 */_(ER_WRONG_QUERY_ID,		"Prepared statement with id %u does not exist") \
>   	/*212 */_(ER_SEQUENCE_NOT_STARTED,		"Sequence '%s' is not started") \
>   	/*213 */_(ER_NO_SUCH_SESSION_SETTING,	"Session setting %s doesn't exist") \
> +	/*214 */_(ER_UNCOMMITTED_FOREIGN_SYNC_TXNS, "Found uncommitted sync transactions from other instance with id %u") \
>   
>   /*
>    * !IMPORTANT! Please follow instructions at start of the file
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 2ad02cb8a..36fc14b8c 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -53,6 +53,7 @@
>   #include "xrow_io.h"
>   #include "xstream.h"
>   #include "wal.h"
> +#include "txn_limbo.h"
>   
>   /**
>    * Cbus message to send status updates from relay to tx thread.
> @@ -399,6 +400,16 @@ tx_status_update(struct cmsg *msg)
>   {
>   	struct relay_status_msg *status = (struct relay_status_msg *)msg;
>   	vclock_copy(&status->relay->tx.vclock, &status->vclock);
> +	/*
> +	 * Let pending synchronous transactions know, which of
> +	 * them were successfully sent to the replica. Acks are
> +	 * collected only by the transactions originator (which is
> +	 * the single master in 100% so far).
> +	 */
> +	if (txn_limbo.instance_id == instance_id) {
> +		txn_limbo_ack(&txn_limbo, status->relay->replica->id,
> +			      vclock_get(&status->vclock, instance_id));
> +	}
>   	static const struct cmsg_hop route[] = {
>   		{relay_status_update, NULL}
>   	};
> diff --git a/src/box/txn.c b/src/box/txn.c
> index edc1f5180..6cfa98212 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -29,6 +29,7 @@
>    * SUCH DAMAGE.
>    */
>   #include "txn.h"
> +#include "txn_limbo.h"
>   #include "engine.h"
>   #include "tuple.h"
>   #include "journal.h"
> @@ -433,7 +434,7 @@ txn_complete(struct txn *txn)
>   			engine_rollback(txn->engine, txn);
>   		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
>   			txn_run_rollback_triggers(txn, &txn->on_rollback);
> -	} else {
> +	} else if (!txn_has_flag(txn, TXN_WAIT_ACK)) {
>   		/* Commit the transaction. */
>   		if (txn->engine != NULL)
>   			engine_commit(txn->engine, txn);
> @@ -448,6 +449,19 @@ txn_complete(struct txn *txn)
>   					     txn->signature - n_rows + 1,
>   					     stop_tm - txn->start_tm);
>   		}
> +	} else {
> +		/*
> +		 * Complete is called on every WAL operation
> +		 * authored by this transaction. And it not always
> +		 * is one. And not always is enough for commit.
> +		 * In case the transaction is waiting for acks, it
> +		 * can't be committed right away. Give control
> +		 * back to the fiber, owning the transaction so as
> +		 * it could decide what to do next.
> +		 */
> +		if (txn->fiber != NULL && txn->fiber != fiber())
> +			fiber_wakeup(txn->fiber);
> +		return;
>   	}
>   	/*
>   	 * If there is no fiber waiting for the transaction then
> @@ -517,6 +531,11 @@ txn_journal_entry_new(struct txn *txn)
>   
>   		req->approx_len += xrow_approx_len(stmt->row);
>   	}
> +	/*
> +	 * There is no a check for all-local rows, because a local
> +	 * space can't be synchronous. So if there is at least one
> +	 * synchronous space, the transaction is not local.
> +	 */
>   	if (is_sync)
>   		txn_set_flag(txn, TXN_WAIT_ACK);
>   
> @@ -627,6 +646,7 @@ int
>   txn_commit(struct txn *txn)
>   {
>   	struct journal_entry *req;
> +	struct txn_limbo_entry *limbo_entry;
>   
>   	txn->fiber = fiber();
>   
> @@ -648,8 +668,31 @@ txn_commit(struct txn *txn)
>   		return -1;
>   	}
>   
> +	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
> +	if (is_sync) {
> +		/*
> +		 * Remote rows, if any, come before local rows, so
> +		 * check for originating instance id here.
> +		 */
> +		uint32_t origin_id = req->rows[0]->replica_id;
> +
> +		/*
> +		 * Append now. Before even WAL write is done.
> +		 * After WAL write nothing should fail, even OOM
> +		 * wouldn't be acceptable.
> +		 */
> +		limbo_entry = txn_limbo_append(&txn_limbo, origin_id, txn);
> +		if (limbo_entry == NULL) {
> +			txn_rollback(txn);
> +			txn_free(txn);
> +			return -1;
> +		}
> +	}
> +
>   	fiber_set_txn(fiber(), NULL);
>   	if (journal_write(req) != 0) {
> +		if (is_sync)
> +			txn_limbo_abort(&txn_limbo, limbo_entry);
>   		fiber_set_txn(fiber(), txn);
>   		txn_rollback(txn);
>   		txn_free(txn);
> @@ -658,7 +701,11 @@ txn_commit(struct txn *txn)
>   		diag_log();
>   		return -1;
>   	}
> -
> +	if (is_sync) {
> +		txn_limbo_assign_lsn(&txn_limbo, limbo_entry,
> +				     req->rows[req->n_rows - 1]->lsn);

This assumes that the last tx row is a global one. This'll be true once
#4928 [1] is fixed. However, the fix is a crutch, either appending a dummy
NOP statement at the end of tx, or reordering the rows so that the last one
is global. If we remove the crutch someday, we'll also break LSN assignment
here. Maybe there's a better way to assign LSN to a synchronous tx?

Another point. Once async transactions are also added to limbo, we'll
have fully local transactions in limbo. Local transactions have a separate
LSN counter, so we probably have to assign them the same LSN as the last
synchronous transaction waiting in limbo. Looks like this'll work.

[1] https://github.com/tarantool/tarantool/issues/4928
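
For the second point (fully local transactions in limbo), a minimal sketch
of the idea, assuming a simplified queue: a local entry reuses the LSN of
the closest synchronous entry queued before it. `struct entry`, `is_sync`
and `lsn_for_local_entry()` are hypothetical names, not the real
txn_limbo_entry API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified limbo entry. */
struct entry {
	/* -1 until the WAL write assigns an LSN. */
	int64_t lsn;
	bool is_sync;
};

/*
 * Pick an LSN for a fully local transaction stored at position
 * `pos` of the queue: reuse the LSN of the closest preceding
 * synchronous entry. The result is -1 if there is no such entry,
 * or if that entry has no LSN yet.
 */
int64_t
lsn_for_local_entry(const struct entry *queue, int pos)
{
	for (int i = pos - 1; i >= 0; --i) {
		if (queue[i].is_sync)
			return queue[i].lsn;
	}
	return -1;
}
```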

> +		txn_limbo_wait_complete(&txn_limbo, limbo_entry);
> +	}
>   	if (!txn_has_flag(txn, TXN_IS_DONE)) {
>   		txn->signature = req->res;
>   		txn_complete(txn);
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> new file mode 100644
> index 000000000..9de91db93
> --- /dev/null
> +++ b/src/box/txn_limbo.c
> @@ -0,0 +1,176 @@
> +/*
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "txn.h"
> +#include "txn_limbo.h"
> +#include "replication.h"
> +
> +struct txn_limbo txn_limbo;
> +
> +static inline void
> +txn_limbo_create(struct txn_limbo *limbo)
> +{
> +	rlist_create(&limbo->queue);
> +	limbo->instance_id = REPLICA_ID_NIL;
> +	vclock_create(&limbo->vclock);
> +}
> +
> +struct txn_limbo_entry *
> +txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
> +{
> +	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	if (id == 0)
> +		id = instance_id;
> +	if (limbo->instance_id != id) {
> +		if (limbo->instance_id == REPLICA_ID_NIL ||
> +		    rlist_empty(&limbo->queue)) {
> +			limbo->instance_id = id;
> +		} else {
> +			diag_set(ClientError, ER_UNCOMMITTED_FOREIGN_SYNC_TXNS,
> +				 limbo->instance_id);
> +			return NULL;
> +		}
> +	}
> +	size_t size;
> +	struct txn_limbo_entry *e = region_alloc_object(&txn->region,
> +							typeof(*e), &size);
> +	if (e == NULL) {
> +		diag_set(OutOfMemory, size, "region_alloc_object", "e");
> +		return NULL;
> +	}
> +	e->txn = txn;
> +	e->lsn = -1;
> +	e->ack_count = 0;
> +	e->is_commit = false;
> +	e->is_rollback = false;
> +	rlist_add_tail_entry(&limbo->queue, e, in_queue);
> +	return e;
> +}
> +
> +static inline void
> +txn_limbo_remove(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> +{
> +	assert(!rlist_empty(&entry->in_queue));
> +	assert(rlist_first_entry(&limbo->queue, struct txn_limbo_entry,
> +				 in_queue) == entry);
> +	(void) limbo;
> +	rlist_del_entry(entry, in_queue);
> +}
> +
> +void
> +txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> +{
> +	entry->is_rollback = true;
> +	txn_limbo_remove(limbo, entry);
> +}
> +
> +void
> +txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
> +		     int64_t lsn)
> +{
> +	assert(limbo->instance_id != REPLICA_ID_NIL);
> +	entry->lsn = lsn;
> +	++entry->ack_count;
> +	vclock_follow(&limbo->vclock, limbo->instance_id, lsn);
> +}
> +
> +static bool
> +txn_limbo_check_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> +{
> +	if (txn_limbo_entry_is_complete(entry))
> +		return true;
> +	struct vclock_iterator iter;
> +	vclock_iterator_init(&iter, &limbo->vclock);
> +	int ack_count = 0;
> +	int64_t lsn = entry->lsn;
> +	vclock_foreach(&iter, vc)
> +		ack_count += vc.lsn >= lsn;
> +	assert(ack_count >= entry->ack_count);
> +	entry->ack_count = ack_count;
> +	entry->is_commit = ack_count >= replication_synchro_quorum;
> +	return entry->is_commit;
> +}
> +
> +void
> +txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> +{
> +	struct txn *txn = entry->txn;
> +	assert(entry->lsn > 0);
> +	assert(!txn_has_flag(txn, TXN_IS_DONE));
> +	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	if (txn_limbo_check_complete(limbo, entry)) {
> +		txn_limbo_remove(limbo, entry);
> +		return;
> +	}
> +	bool cancellable = fiber_set_cancellable(false);
> +	while (!txn_limbo_entry_is_complete(entry))
> +		fiber_yield();
> +	fiber_set_cancellable(cancellable);
> +	// TODO: implement rollback.
> +	// TODO: implement confirm.
> +	assert(!entry->is_rollback);
> +	txn_limbo_remove(limbo, entry);
> +	txn_clear_flag(txn, TXN_WAIT_ACK);
> +}
> +
> +void
> +txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
> +{
> +	if (rlist_empty(&limbo->queue))
> +		return;
> +	assert(limbo->instance_id != REPLICA_ID_NIL);
> +	int64_t prev_lsn = vclock_get(&limbo->vclock, replica_id);
> +	vclock_follow(&limbo->vclock, replica_id, lsn);
> +	struct txn_limbo_entry *e, *tmp;
> +	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
> +		if (e->lsn <= prev_lsn)
> +			continue;
> +		if (e->lsn > lsn)
> +			break;
> +		if (++e->ack_count >= replication_synchro_quorum) {
> +			// TODO: better call complete() right
> +			// here. Appliers use async transactions,
> +			// and their txns don't have fibers to
> +			// wake up. That becomes actual, when
> +			// appliers will be supposed to wait for
> +			// 'confirm' message.
> +			e->is_commit = true;
> +			rlist_del_entry(e, in_queue);
> +			fiber_wakeup(e->txn->fiber);
> +		}
> +		assert(e->ack_count <= VCLOCK_MAX);
> +	}
> +}
> +
> +void
> +txn_limbo_init(void)
> +{
> +	txn_limbo_create(&txn_limbo);
> +}
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> new file mode 100644
> index 000000000..1ad1c567a
> --- /dev/null
> +++ b/src/box/txn_limbo.h
> @@ -0,0 +1,168 @@
> +#pragma once
> +/*
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "small/rlist.h"
> +#include "vclock.h"
> +
> +#include <stdint.h>
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +struct txn;
> +
> +/**
> + * Transaction and its quorum metadata, to be stored in limbo.
> + */
> +struct txn_limbo_entry {
> +	/** Link for limbo's queue. */
> +	struct rlist in_queue;
> +	/** Transaction, waiting for a quorum. */
> +	struct txn *txn;
> +	/**
> +	 * LSN of the transaction by the originator's vclock
> +	 * component. May be -1 in case the transaction is not
> +	 * written to WAL yet.
> +	 */
> +	int64_t lsn;
> +	/**
> +	 * Number of ACKs. Or in other words - how many replicas
> +	 * confirmed receipt of the transaction.
> +	 */
> +	int ack_count;
> +	/**
> +	 * Result flags. Only one of them can be true. But both
> +	 * can be false if the transaction is still waiting for
> +	 * its resolution.
> +	 */
> +	bool is_commit;
> +	bool is_rollback;
> +};
> +
> +static inline bool
> +txn_limbo_entry_is_complete(const struct txn_limbo_entry *e)
> +{
> +	return e->is_commit || e->is_rollback;
> +}
> +
> +/**
> + * Limbo is a place where transactions are stored, which are
> + * finished, but not committed nor rolled back. These are
> + * synchronous transactions in progress of collecting ACKs from
> + * replicas.
> + * Limbo's main purposes
> + *   - maintain the transactions ordered by LSN of their emitter;
> + *   - be a link between transaction and replication modules, so
> + *     as they wouldn't depend on each other directly.
> + */
> +struct txn_limbo {
> +	/**
> +	 * Queue of limbo entries. Ordered by LSN. Some of the
> +	 * entries in the end may not have an LSN yet (their local
> +	 * WAL write is still in progress), but their order won't
> +	 * change anyway. Because WAL write completions will give
> +	 * them LSNs in the same order.
> +	 */
> +	struct rlist queue;
> +	/**
> +	 * Instance ID of the owner of all the transactions in the
> +	 * queue. Strictly speaking, nothing prevents to store not
> +	 * own transactions here, originated from some other
> +	 * instance. But still the queue may contain only
> +	 * transactions of the same instance. Otherwise LSN order
> +	 * won't make sense - different nodes have own independent
> +	 * LSNs in their vclock components.
> +	 */
> +	uint32_t instance_id;
> +	/**
> +	 * All components of the vclock are versions of the limbo
> +	 * owner's LSN, how it is visible on other nodes. For
> +	 * example, assume instance ID of the limbo is 1. Then
> +	 * vclock[1] here is local LSN of the instance 1.
> +	 * vclock[2] is how replica with ID 2 sees LSN of
> +	 * instance 1.
> +	 * vclock[3] is how replica with ID 3 sees LSN of
> +	 * instance 1, and so on.
> +	 * In that way by looking at this vclock it is always can
> +	 * be said up to which LSN there is a sync quorum for
> +	 * transactions, created on the limbo's owner node.
> +	 */
> +	struct vclock vclock;
> +};
> +
> +/**
> + * Global limbo entry. So far an instance can have only one limbo,
> + * where master's transactions are stored. Eventually there may
> + * appear more than one limbo for master-master support.
> + */
> +extern struct txn_limbo txn_limbo;
> +
> +/**
> + * Allocate, create, and append a new transaction to the limbo.
> + * The limbo entry is allocated on the transaction's region.
> + */
> +struct txn_limbo_entry *
> +txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn);
> +
> +/** Remove the entry from the limbo, mark as rolled back. */
> +void
> +txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
> +
> +/**
> + * Assign local LSN to the limbo entry. That happens when the
> + * transaction is added to the limbo, writes to WAL, and gets an
> + * LSN.
> + */
> +void
> +txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
> +		     int64_t lsn);
> +
> +/**
> + * Ack all transactions up to the given LSN on behalf of the
> + * replica with the specified ID.
> + */
> +void
> +txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn);
> +
> +/**
> + * Block the current fiber until the transaction in the limbo
> + * entry is either committed or rolled back.
> + */
> +void
> +txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
> +
> +void
> +txn_limbo_init();
> +
> +#if defined(__cplusplus)
> +}
> +#endif /* defined(__cplusplus) */
> diff --git a/test/box/error.result b/test/box/error.result
> index 2196fa541..69c471085 100644
> --- a/test/box/error.result
> +++ b/test/box/error.result
> @@ -432,6 +432,7 @@ t;
>    |   211: box.error.WRONG_QUERY_ID
>    |   212: box.error.SEQUENCE_NOT_STARTED
>    |   213: box.error.NO_SUCH_SESSION_SETTING
> + |   214: box.error.UNCOMMITTED_FOREIGN_SYNC_TXNS
>    | ...
>   
>   test_run:cmd("setopt delimiter ''");
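
As a side note on the limbo vclock comment in txn_limbo.h: the quorum rule
it describes can be sketched as below. This is an illustration under
assumptions, not the patch's code: a plain array stands in for
`struct vclock`, and `MAX_REPLICAS` and `has_quorum()` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MAX_REPLICAS = 32 };	/* Assumed bound for the sketch. */

/*
 * acked_lsn[i] is the limbo owner's LSN as seen by replica i;
 * 0 means the replica has not acked anything yet. txn_lsn is
 * expected to be positive, otherwise empty slots would count.
 */
bool
has_quorum(const int64_t acked_lsn[MAX_REPLICAS], int64_t txn_lsn,
	   int quorum)
{
	int ack_count = 0;
	for (int i = 0; i < MAX_REPLICAS; ++i)
		ack_count += acked_lsn[i] >= txn_lsn;
	return ack_count >= quorum;
}
```

Every component that has reached the transaction's LSN counts as one ack,
the owner's own component included, so the number of such components is
compared against the configured quorum.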

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery Vladislav Shpilevoy
@ 2020-07-02  8:52     ` Serge Petrenko
  2020-07-08 11:43     ` Leonid Vasiliev
  1 sibling, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  8:52 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


On 30.06.2020 02:15, Vladislav Shpilevoy wrote:
> From: Leonid Vasiliev <lvasiliev@tarantool.org>
>
> To support qsync replication, the waiting for confirmation of
> current "sync" transactions during a timeout has been added to
> the snapshot machinery. In the case of rollback or the timeout
> expiration, the snapshot will be cancelled.
>
> Closes #4850
> ---
>   src/box/gc.c                       |  12 ++
>   src/box/txn_limbo.c                |  81 ++++++++++
>   src/box/txn_limbo.h                |  29 ++++
>   test/unit/CMakeLists.txt           |   3 +
>   test/unit/snap_quorum_delay.cc     | 250 +++++++++++++++++++++++++++++
>   test/unit/snap_quorum_delay.result |  12 ++
>   6 files changed, 387 insertions(+)
>   create mode 100644 test/unit/snap_quorum_delay.cc
>   create mode 100644 test/unit/snap_quorum_delay.result
>
> diff --git a/src/box/gc.c b/src/box/gc.c
> index 8e8ffea75..170c0a97f 100644
> --- a/src/box/gc.c
> +++ b/src/box/gc.c
> @@ -57,6 +57,9 @@
>   #include "engine.h"		/* engine_collect_garbage() */
>   #include "wal.h"		/* wal_collect_garbage() */
>   #include "checkpoint_schedule.h"
> +#include "trigger.h"
> +#include "txn.h"
> +#include "txn_limbo.h"
>   
>   struct gc_state gc;
>   
> @@ -395,6 +398,15 @@ gc_do_checkpoint(bool is_scheduled)
>   	rc = wal_begin_checkpoint(&checkpoint);
>   	if (rc != 0)
>   		goto out;
> +
> +	/*
> +	 * Wait the confirms on all "sync" transactions before
> +	 * create a snapshot.
> +	 */
> +	rc = txn_limbo_wait_confirm(&txn_limbo);
> +	if (rc != 0)
> +		goto out;
> +
>   	rc = engine_commit_checkpoint(&checkpoint.vclock);
>   	if (rc != 0)
>   		goto out;
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index b38d82e4f..bee8e8155 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -231,6 +231,87 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
>   	}
>   }
>   
> +double
> +txn_limbo_confirm_timeout(struct txn_limbo *limbo)
> +{
> +	(void)limbo;
> +	return replication_synchro_timeout;
> +}
> +
> +/**
> + * Waitpoint stores information about the progress of confirmation.
> + * In the case of multimaster support, it will store a bitset
> + * or array instead of the boolean.
> + */
> +struct confirm_waitpoint {
> +	/**
> +	 * Variable for wake up the fiber that is waiting for
> +	 * the end of confirmation.
> +	 */
> +	struct fiber_cond confirm_cond;
> +	/**
> +	 * Result flag.
> +	 */
> +	bool is_confirm;
> +};
> +
> +static int
> +txn_commit_cb(struct trigger *trigger, void *event)
> +{
> +	(void)event;
> +	struct confirm_waitpoint *cwp =
> +		(struct confirm_waitpoint *)trigger->data;
> +	cwp->is_confirm = true;
> +	fiber_cond_signal(&cwp->confirm_cond);
> +	return 0;
> +}
> +
> +static int
> +txn_rollback_cb(struct trigger *trigger, void *event)
> +{
> +	(void)event;
> +	struct confirm_waitpoint *cwp =
> +		(struct confirm_waitpoint *)trigger->data;
> +	fiber_cond_signal(&cwp->confirm_cond);
> +	return 0;
> +}
> +
> +int
> +txn_limbo_wait_confirm(struct txn_limbo *limbo)
> +{
> +	if (txn_limbo_is_empty(limbo))
> +		return 0;
> +
> +	/* initialization of a waitpoint. */
> +	struct confirm_waitpoint cwp;
> +	fiber_cond_create(&cwp.confirm_cond);
> +	cwp.is_confirm = false;
> +
> +	/* Set triggers for the last limbo transaction. */
> +	struct trigger on_complete;
> +	trigger_create(&on_complete, txn_commit_cb, &cwp, NULL);
> +	struct trigger on_rollback;
> +	trigger_create(&on_rollback, txn_rollback_cb, &cwp, NULL);
> +	struct txn_limbo_entry *tle = txn_limbo_last_entry(limbo);
> +	txn_on_commit(tle->txn, &on_complete);
> +	txn_on_rollback(tle->txn, &on_rollback);
> +
> +	int rc = fiber_cond_wait_timeout(&cwp.confirm_cond,
> +					 txn_limbo_confirm_timeout(limbo));
> +	fiber_cond_destroy(&cwp.confirm_cond);
> +	if (rc != 0) {
> +		/* Clear the triggers if the timeout has been reached. */
> +		trigger_clear(&on_complete);
> +		trigger_clear(&on_rollback);
> +		return -1;
> +	}
> +	if (!cwp.is_confirm) {
> +		/* The transaction has been rolled back. */
> +		return -1;
> +	}
> +	return 0;
> +}
> +
>   void
>   txn_limbo_init(void)
>   {
> diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
> index de415cd97..94f224131 100644
> --- a/src/box/txn_limbo.h
> +++ b/src/box/txn_limbo.h
> @@ -166,6 +166,35 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
>   void
>   txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn);
>   
> +/**
> + * Return TRUE if limbo is empty.
> + */
> +static inline bool
> +txn_limbo_is_empty(struct txn_limbo *limbo)
> +{
> +	return rlist_empty(&limbo->queue);
> +}
> +
> +/**
> + * Return a pointer to the last txn_limbo_entry of limbo.
> + */
> +static inline struct txn_limbo_entry *
> +txn_limbo_last_entry(struct txn_limbo *limbo)
> +{
> +	return rlist_last_entry(&limbo->queue, struct txn_limbo_entry,
> +				in_queue);
> +}
> +
> +double
> +txn_limbo_confirm_timeout(struct txn_limbo *limbo);
> +
> +/**
> + * Waiting for confirmation of all "sync" transactions
> + * during confirm timeout or fail.
> + */
> +int
> +txn_limbo_wait_confirm(struct txn_limbo *limbo);
> +
>   void
>   txn_limbo_init();
>   
> diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt
> index 672122118..419477748 100644
> --- a/test/unit/CMakeLists.txt
> +++ b/test/unit/CMakeLists.txt
> @@ -257,6 +257,9 @@ target_link_libraries(swim_errinj.test unit swim)
>   add_executable(merger.test merger.test.c)
>   target_link_libraries(merger.test unit core box)
>   
> +add_executable(snap_quorum_delay.test snap_quorum_delay.cc)
> +target_link_libraries(snap_quorum_delay.test box core unit)
> +
>   #
>   # Client for popen.test
>   add_executable(popen-child popen-child.c)
> diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
> new file mode 100644
> index 000000000..7a200673a
> --- /dev/null
> +++ b/test/unit/snap_quorum_delay.cc
> @@ -0,0 +1,250 @@
> +/*
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and iproto forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in iproto form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "unit.h"
> +#include "gc.h"
> +#include "memory.h"
> +#include "txn.h"
> +#include "txn_limbo.h"
> +
> +/**
> + * This test is only about delay in snapshot machinery (needed
> + * for qsync replication). It doesn't test the snapshot
> + * machinery, txn_limbo or something else and uses some tricks
> + * around txn_limbo.
> + * The logic of the test is as folows:
> + * In fiber_1 ("txn_fiber"):
> + *	- start a transaction.
> + *	- push the transaction to the limbo.
> + *	- start wait confirm (yield).
> + * In fiber_2 ("main"):
> + *	- do a snapshot.
> + *	- start wait while the last transaction
> + *	  from the limbo will be completed.
> + * In fiber_3 ("confirm_fiber"):
> + *	- confirm the transaction (remove the transaction from
> + *				   the limbo and wakeup fiber_1).
> + * In fiber_1 ("txn_fiber"):
> + *	- confirm/rollback/hung the transaction.
> + * In fiber_2 ("main"):
> + *	- check_results
> + */
> +
> +extern int replication_synchro_quorum;
> +extern double replication_synchro_timeout;
> +
> +namespace /* local symbols */ {
> +
> +int test_result;
> +
> +/**
> + * Variations of a transaction completion.
> + */
> +enum process_type {
> +	TXN_PROCESS_COMMIT,
> +	TXN_PROCESS_ROLLBACK,
> +	TXN_PROCESS_TIMEOUT
> +};
> +
> +/**
> + * Some fake values needed for work with the limbo
> + * (to push a transaction to the limbo and simulate confirm).
> + */
> +const int fake_lsn = 1;
> +const int instace_id = 1;
> +const int relay_id = 2;
> +
> +int
> +trg_cb(struct trigger *trigger, void *event)
> +{
> +	(void)event;
> +	bool *check_trg = (bool *)trigger->data;
> +	*check_trg = true;
> +	return 0;
> +}
> +
> +int
> +txn_process_func(va_list ap)
> +{
> +	bool *check_trg = va_arg(ap, bool *);
> +	enum process_type process_type = (enum process_type)va_arg(ap, int);
> +	struct txn *txn = txn_begin();
> +	txn->fiber = fiber();
> +	/* Set the TXN_WAIT_ACK flag to simulate a sync transaction.*/
> +	txn_set_flag(txn, TXN_WAIT_ACK);
> +	/*
> +	 * The true way to push the transaction to limbo is to call
> +	 * txn_commit() for sync transaction. But, if txn_commit()
> +	 * will be called now, the transaction will not be pushed to
> +	 * the limbo because this is the case txn_commit_nop().
> +	 * Instead, we push the transaction to the limbo manually
> +	 * and call txn_commit (or another) later.
> +	 */
> +	struct txn_limbo_entry *entry = txn_limbo_append(&txn_limbo,
> +							 instace_id, txn);
> +	/*
> +	 * The trigger is used to verify that the transaction has been
> +	 * completed.
> +	 */
> +	struct trigger trg;
> +	trigger_create(&trg, trg_cb, check_trg, NULL);
> +
> +	switch (process_type) {
> +	case TXN_PROCESS_COMMIT:
> +		txn_on_commit(txn, &trg);
> +		break;
> +	case TXN_PROCESS_ROLLBACK:
> +		txn_on_rollback(txn, &trg);
> +		break;
> +	case TXN_PROCESS_TIMEOUT:
> +		break;
> +	default:
> +		unreachable();
> +	}
> +
> +	txn_limbo_assign_lsn(&txn_limbo, entry, fake_lsn);
> +	txn_limbo_wait_complete(&txn_limbo, entry);
> +
> +	switch (process_type) {
> +	case TXN_PROCESS_COMMIT:
> +		txn_commit(txn);
> +		break;
> +	case TXN_PROCESS_ROLLBACK:
> +		txn_rollback(txn);
> +		break;
> +	case TXN_PROCESS_TIMEOUT:
> +		fiber_yield();
> +		break;
> +	default:
> +		unreachable();
> +	}
> +	return 0;
> +}
> +
> +int
> +txn_confirm_func(va_list ap)
> +{
> +	/*
> +	 * We shouldn't react on gc_wait_cleanup() yield
> +	 * inside gc_checkpoint().
> +	 */
> +	fiber_sleep(0);
> +	txn_limbo_ack(&txn_limbo, relay_id, fake_lsn);
> +	return 0;
> +}
> +
> +
> +void
> +test_snap_delay_common(enum process_type process_type)
> +{
> +	plan(1);
> +
> +	/*
> +	 * We need to clear the limbo vclock before the new test
> +	 * variation because the same fake lsn will be used.
> +	 */
> +	vclock_clear(&txn_limbo.vclock);
> +	vclock_create(&txn_limbo.vclock);
> +
> +	bool check_trg = false;
> +	struct fiber *txn_fiber = fiber_new("txn_fiber", txn_process_func);
> +	fiber_start(txn_fiber, &check_trg, process_type);
> +
> +	struct fiber *confirm_entry = fiber_new("confirm_fiber",
> +						txn_confirm_func);
> +	fiber_wakeup(confirm_entry);
> +
> +	switch (process_type) {
> +	case TXN_PROCESS_COMMIT:
> +		ok(gc_checkpoint() == 0 && check_trg,
> +		   "check snapshot delay confirm");
> +		break;
> +	case TXN_PROCESS_ROLLBACK:
> +		ok(gc_checkpoint() == -1 && check_trg,
> +		   "check snapshot delay rollback");
> +		break;
> +	case TXN_PROCESS_TIMEOUT:
> +		ok(gc_checkpoint() == -1, "check snapshot delay timeout");
> +		/* join the "hung" fiber */
> +		fiber_set_joinable(txn_fiber, true);
> +		fiber_cancel(txn_fiber);
> +		fiber_join(txn_fiber);
> +		break;
> +	default:
> +		unreachable();
> +	}
> +	check_plan();
> +}
> +
> +void
> +test_snap_delay_timeout()
> +{
> +	/* Set the timeout to a small value for the test. */
> +	replication_synchro_timeout = 0.01;
> +	test_snap_delay_common(TXN_PROCESS_TIMEOUT);
> +}
> +
> +int
> +test_snap_delay(va_list ap)
> +{
> +	header();
> +	plan(3);
> +	(void)ap;
> +	replication_synchro_quorum = 2;
> +
> +	test_snap_delay_common(TXN_PROCESS_COMMIT);
> +	test_snap_delay_common(TXN_PROCESS_ROLLBACK);
> +	test_snap_delay_timeout();
> +
> +	ev_break(loop(), EVBREAK_ALL);
> +	footer();
> +	test_result = check_plan();
> +	return 0;
> +}
> +} /* end of anonymous namespace */
> +
> +int
> +main(void)
> +{
> +	memory_init();
> +	fiber_init(fiber_c_invoke);
> +	gc_init();
> +	txn_limbo_init();
> +
> +	struct fiber *main_fiber = fiber_new("main", test_snap_delay);
> +	assert(main_fiber != NULL);
> +	fiber_wakeup(main_fiber);
> +	ev_run(loop(), 0);
> +
> +	gc_free();
> +	fiber_free();
> +	memory_free();
> +	return test_result;
> +}
> diff --git a/test/unit/snap_quorum_delay.result b/test/unit/snap_quorum_delay.result
> new file mode 100644
> index 000000000..6ca213391
> --- /dev/null
> +++ b/test/unit/snap_quorum_delay.result
> @@ -0,0 +1,12 @@
> +	*** test_snap_delay ***
> +1..3
> +    1..1
> +    ok 1 - check snapshot delay confirm
> +ok 1 - subtests
> +    1..1
> +    ok 1 - check snapshot delay rollback
> +ok 2 - subtests
> +    1..1
> +    ok 1 - check snapshot delay timeout
> +ok 3 - subtests
> +	*** test_snap_delay: done ***
Thanks for the patch! LGTM.

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond Vladislav Shpilevoy
@ 2020-07-02  9:13     ` Serge Petrenko
  0 siblings, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-02  9:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


On 30.06.2020 02:15, Vladislav Shpilevoy wrote:
> Writer condition variable was used by the writer fiber to be woken
> up when it is time to send a heartbeat or an ACK.
>
> However it is not really needed, because writer fiber pointer is
> always available in the same structure as writer_cond, and can be
> used to call fiber_wakeup() directly.
>
> Note, fiber_cond_signal() is basically the same fiber_wakeup().
>
> The patch is not just refactoring for nothing. It is a
> prerequisite for #5100. In this issue it will be needed to wakeup
> the applier's writer fiber directly on each WAL write from txn.c
> module. So the writer_cond won't be available. The only usable
> thing will be txn->fiber, which will be set to applier's writer.
>
> Part of #5100
> ---
>   src/box/applier.cc | 11 ++++-------
>   src/box/applier.h  |  2 --
>   2 files changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index fbb452dc0..635a9849c 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -155,11 +155,9 @@ applier_writer_f(va_list ap)
>   		 * replication_timeout seconds any more.
>   		 */
>   		if (applier->version_id >= version_id(1, 7, 7))
> -			fiber_cond_wait_timeout(&applier->writer_cond,
> -						TIMEOUT_INFINITY);
> +			fiber_yield_timeout(TIMEOUT_INFINITY);
>   		else
> -			fiber_cond_wait_timeout(&applier->writer_cond,
> -						replication_timeout);
> +			fiber_yield_timeout(replication_timeout);
>   		/*
>   		 * A writer fiber is going to be awaken after a commit or
>   		 * a heartbeat message. So this is an appropriate place to
> @@ -928,7 +926,7 @@ applier_on_commit(struct trigger *trigger, void *event)
>   {
>   	(void) event;
>   	struct applier *applier = (struct applier *)trigger->data;
> -	fiber_cond_signal(&applier->writer_cond);
> +	fiber_wakeup(applier->writer);
>   	return 0;
>   }
>   
> @@ -1093,7 +1091,7 @@ applier_subscribe(struct applier *applier)
>   		 */
>   		if (stailq_first_entry(&rows, struct applier_tx_row,
>   				       next)->row.lsn == 0)
> -			fiber_cond_signal(&applier->writer_cond);
> +			fiber_wakeup(applier->writer);
>   		else if (applier_apply_tx(&rows) != 0)
>   			diag_raise();
>   
> @@ -1289,7 +1287,6 @@ applier_new(const char *uri)
>   	applier->last_row_time = ev_monotonic_now(loop());
>   	rlist_create(&applier->on_state);
>   	fiber_cond_create(&applier->resume_cond);
> -	fiber_cond_create(&applier->writer_cond);
>   	diag_create(&applier->diag);
>   
>   	return applier;
> diff --git a/src/box/applier.h b/src/box/applier.h
> index c9fdc2955..4f8bee84e 100644
> --- a/src/box/applier.h
> +++ b/src/box/applier.h
> @@ -78,8 +78,6 @@ struct applier {
>   	struct fiber *reader;
>   	/** Background fiber to reply with vclock */
>   	struct fiber *writer;
> -	/** Writer cond. */
> -	struct fiber_cond writer_cond;
>   	/** Finite-state machine */
>   	enum applier_state state;
>   	/** Local time of this replica when the last row has been received */
Thanks for the patch! LGTM.

-- 
Serge Petrenko


* [Tarantool-patches] [PATCH 1/4] replication: regression test on gh-5119 [not fixed]
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (20 preceding siblings ...)
  2020-06-30 23:00   ` [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1 Vladislav Shpilevoy
@ 2020-07-02 21:13   ` sergeyb
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 68+ messages in thread
From: sergeyb @ 2020-07-02 21:13 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

From: Sergey Bronnikov <sergeyb@tarantool.org>

---
 .../gh-5119-sync_replication.result           | 70 +++++++++++++++++++
 .../gh-5119-sync_replication.skipcond         |  2 +
 .../gh-5119-sync_replication.test.lua         | 37 ++++++++++
 3 files changed, 109 insertions(+)
 create mode 100644 test/replication/gh-5119-sync_replication.result
 create mode 100755 test/replication/gh-5119-sync_replication.skipcond
 create mode 100644 test/replication/gh-5119-sync_replication.test.lua

diff --git a/test/replication/gh-5119-sync_replication.result b/test/replication/gh-5119-sync_replication.result
new file mode 100644
index 000000000..83851ebca
--- /dev/null
+++ b/test/replication/gh-5119-sync_replication.result
@@ -0,0 +1,70 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=30}
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+_ = fiber.create(function()
+    box.space.sync:insert{1}
+end);
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - error: Duplicate key exists in unique index 'pk' in space 'sync'
+ | ...
+box.space.sync:insert{2}
+ | 
+box.space.sync:insert{3}
+ | [Lost current connection]
+test_run:switch('replica')
diff --git a/test/replication/gh-5119-sync_replication.skipcond b/test/replication/gh-5119-sync_replication.skipcond
new file mode 100755
index 000000000..913ac004a
--- /dev/null
+++ b/test/replication/gh-5119-sync_replication.skipcond
@@ -0,0 +1,2 @@
+# gh-5119 is opened
+self.skip = 1
diff --git a/test/replication/gh-5119-sync_replication.test.lua b/test/replication/gh-5119-sync_replication.test.lua
new file mode 100644
index 000000000..0331003b9
--- /dev/null
+++ b/test/replication/gh-5119-sync_replication.test.lua
@@ -0,0 +1,37 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+fiber = require('fiber')
+
+box.schema.user.grant('guest', 'replication')
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=30}
+test_run:cmd("setopt delimiter ';'")
+_ = fiber.create(function()
+    box.space.sync:insert{1}
+end);
+test_run:cmd("setopt delimiter ''");
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+box.space.sync:insert{1}
+box.space.sync:insert{2}
+box.space.sync:insert{3}
+test_run:switch('replica')
+box.space.sync:select{} -- 1, 2, 3
+test_run:switch('default')
+box.space.sync:truncate()
+
+-- Cleanup.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.space.sync:drop()
+box.schema.user.revoke('guest', 'replication')
-- 
2.26.2


* [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (21 preceding siblings ...)
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 1/4] replication: regression test on gh-5119 [not fixed] sergeyb
@ 2020-07-02 21:13   ` sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
                       ` (2 more replies)
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica sergeyb
                     ` (4 subsequent siblings)
  27 siblings, 3 replies; 68+ messages in thread
From: sergeyb @ 2020-07-02 21:13 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

From: Sergey Bronnikov <sergeyb@tarantool.org>

Part of #5055
---
 test/replication/qsync_advanced.result   | 939 +++++++++++++++++++++++
 test/replication/qsync_advanced.test.lua | 337 ++++++++
 2 files changed, 1276 insertions(+)
 create mode 100644 test/replication/qsync_advanced.result
 create mode 100644 test/replication/qsync_advanced.test.lua
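
Reader's note: the replication_synchro_timeout checks in this patch set the
option to DOUBLE_MAX+1 and expect to read back DOUBLE_MAX. The constant is
2^53, which is not the largest double. It is the point above which a double
can no longer represent every integer, so adding 1 rounds back to the same
value. A small plain-Lua sketch of that arithmetic, independent of Tarantool
(the name LIMIT is only illustrative):

```lua
-- 2^53 = 9007199254740992: the last point up to which every integer is
-- exactly representable as an IEEE 754 double.
local LIMIT = 2^53

-- 2^53 + 1 has no exact double representation and rounds back to 2^53.
local plus_one_is_same = (LIMIT + 1 == LIMIT)

-- 2^53 + 2 is representable again, so it compares as different.
local plus_two_differs = (LIMIT + 2 ~= LIMIT)

print(string.format('%.0f', LIMIT), plus_one_is_same, plus_two_differs)
```

This is why the two box.cfg reads in that test return the same number, and
why the second assignment does not raise an error.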

diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
new file mode 100644
index 000000000..fa94c8339
--- /dev/null
+++ b/test/replication/qsync_advanced.result
@@ -0,0 +1,939 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+NUM_INSTANCES = 2
+ | ---
+ | ...
+BROKEN_QUORUM = NUM_INSTANCES + 1
+ | ---
+ | ...
+
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+disable_sync_mode = function()
+    local s = box.space._space:get(box.space.sync.id)
+    local new_s = s:update({{'=', 6, {is_sync=false}}})
+    box.space._space:replace(new_s)
+end;
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+-- Successful write.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1} -- success
+ | ---
+ | - [1]
+ | ...
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Unsuccessful write.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- none
+ | ---
+ | - []
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Updated replication_synchro_quorum doesn't affect existing tx.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+OP_TIMEOUT = 5
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=OP_TIMEOUT}
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+_ = fiber.create(function()
+    box.space.sync:insert{1}
+end);
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+ | ---
+ | ...
+fiber.sleep(OP_TIMEOUT) -- to make sure replication_synchro_timeout is exceeded
+ | ---
+ | ...
+box.space.sync:select{} -- none
+ | ---
+ | - []
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- none
+ | ---
+ | - []
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, quorum commit] attempt to write multiple transactions, expected the
+-- same order as on client in case of achieved quorum.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - [2]
+ | ...
+box.space.sync:insert{3}
+ | ---
+ | - [3]
+ | ...
+box.space.sync:select{} -- 1, 2, 3
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1, 2, 3
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Synchro timeout is not bigger than replication_synchro_timeout value.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+start = os.time()
+ | ---
+ | ...
+box.space.sync:insert{1}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+(os.time() - start) == box.cfg.replication_synchro_timeout -- true
+ | ---
+ | - true
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- replication_synchro_quorum
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+INT_MIN = -2147483648
+ | ---
+ | ...
+INT_MAX = 2147483648
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=INT_MAX} -- error
+ | ---
+ | - error: 'Incorrect value for option ''replication_synchro_quorum'': the value must
+ |     be greater than zero and less than maximal number of replicas'
+ | ...
+box.cfg.replication_synchro_quorum -- old value
+ | ---
+ | - 3
+ | ...
+box.cfg{replication_synchro_quorum=INT_MIN} -- error
+ | ---
+ | - error: 'Incorrect value for option ''replication_synchro_quorum'': the value must
+ |     be greater than zero and less than maximal number of replicas'
+ | ...
+box.cfg.replication_synchro_quorum -- old value
+ | ---
+ | - 3
+ | ...
+
+-- replication_synchro_timeout
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+DOUBLE_MAX = 9007199254740992
+ | ---
+ | ...
+box.cfg{replication_synchro_timeout=DOUBLE_MAX}
+ | ---
+ | ...
+box.cfg.replication_synchro_timeout -- DOUBLE_MAX
+ | ---
+ | - 9007199254740992
+ | ...
+box.cfg{replication_synchro_timeout=DOUBLE_MAX+1}
+ | ---
+ | ...
+box.cfg.replication_synchro_timeout -- DOUBLE_MAX
+ | ---
+ | - 9007199254740992
+ | ...
+box.cfg{replication_synchro_timeout=-1} -- error
+ | ---
+ | - error: 'Incorrect value for option ''replication_synchro_timeout'': the value must
+ |     be greater than zero'
+ | ...
+box.cfg.replication_synchro_timeout -- old value
+ | ---
+ | - 9007199254740992
+ | ...
+box.cfg{replication_synchro_timeout=0} -- error
+ | ---
+ | - error: 'Incorrect value for option ''replication_synchro_timeout'': the value must
+ |     be greater than zero'
+ | ...
+box.cfg.replication_synchro_timeout -- old value
+ | ---
+ | - 9007199254740992
+ | ...
+
+-- TX is in synchronous replication.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.begin() box.space.sync:insert({1}) box.commit()
+ | ---
+ | ...
+box.begin() box.space.sync:insert({2}) box.commit()
+ | ---
+ | ...
+-- Testcase cleanup.
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, summary] switch sync replicas into async ones, expected success and
+-- data consistency on a leader and replicas.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Disable synchronous mode.
+disable_sync_mode()
+ | ---
+ | ...
+-- Space is in async mode now.
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+ | ---
+ | ...
+box.space.sync:insert{2} -- success
+ | ---
+ | - [2]
+ | ...
+box.space.sync:insert{3} -- success
+ | ---
+ | - [3]
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
+ | ---
+ | ...
+box.space.sync:insert{4} -- success
+ | ---
+ | - [4]
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+ | ---
+ | ...
+box.space.sync:insert{5} -- success
+ | ---
+ | - [5]
+ | ...
+box.space.sync:select{} -- 1, 2, 3, 4, 5
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [4]
+ |   - [5]
+ | ...
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1, 2, 3, 4, 5
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [4]
+ |   - [5]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- (FLAKY) [RFC, Synchronous replication enabling] "As soon as last operation of
+-- synchronous transaction appeared in leader's WAL, it will cause all
+-- following transactions - no matter if they are synchronous or not - wait for
+-- the quorum. In case quorum is not achieved the 'rollback' operation will
+-- cause rollback of all transactions after the synchronous one."
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- OP_TIMEOUT should be enough to make sync operation, disable
+-- sync mode and make an async operation
+OP_TIMEOUT = 10
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=OP_TIMEOUT}
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+_ = fiber.create(function()
+    box.space.sync:insert{2}
+end);
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+-- Disable synchronous mode.
+disable_sync_mode()
+ | ---
+ | - error: A rollback for a synchronous transaction is received
+ | ...
+-- Space is in async mode now.
+box.space.sync:insert{3} -- async operation must wait sync one
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+fiber.sleep(OP_TIMEOUT + 1)
+ | ---
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Warn user when setting `replication_synchro_quorum` to a value
+-- greater than number of instances in a cluster, see gh-5122.
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
+ | ---
+ | ...
+
+-- [RFC, summary] switch from leader to replica and vice versa, expected
+-- success and data consistency on a leader and replicas (gh-5124).
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+box.cfg{read_only=false} -- promote replica to master
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{read_only=true} -- demote master to replica
+ | ---
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - [2]
+ | ...
+box.space.sync:select{} -- 1, 2
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1, 2
+ | ---
+ | - - [1]
+ | ...
+-- Revert cluster configuration.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{read_only=false}
+ | ---
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg{read_only=true}
+ | ---
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', true)
+ | ---
+ | - ok
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Failed to write to disk
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', false)
+ | ---
+ | - ok
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, quorum commit] check behaviour with failure answer from a replica
+-- (ERRINJ_WAL_SYNC) during write, expected disconnect from the replication
+-- (gh-5123, set replication_synchro_quorum to 1).
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', true)
+ | ---
+ | - ok
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_IO', false)
+ | ---
+ | - ok
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+ | ---
+ | ...
+
+-- Setup an async cluster.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+-- [RFC, summary] switch async replica into sync one, expected
+-- success and data consistency on a leader and replica.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+box.space.sync:insert{1} -- success
+ | ---
+ | - [1]
+ | ...
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+-- Enable synchronous mode.
+s = box.space._space:get(box.space.sync.id)
+ | ---
+ | ...
+new_s = s:update({{'=', 6, {is_sync=true}}})
+ | ---
+ | ...
+box.space._space:replace(new_s)
+ | ---
+ | - [523, 1, 'sync', 'vinyl', 0, {'is_sync': true}, []]
+ | ...
+-- Space is in sync mode now.
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+box.space.sync:insert{2} -- success
+ | ---
+ | - [2]
+ | ...
+box.space.sync:insert{3} -- success
+ | ---
+ | - [3]
+ | ...
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+box.space.sync:insert{4} -- failure
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+box.space.sync:insert{5} -- success
+ | ---
+ | - [5]
+ | ...
+box.space.sync:select{} -- 1, 2, 3, 5
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [5]
+ | ...
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1, 2, 3, 5
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ |   - [5]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+ | ---
+ | ...
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
new file mode 100644
index 000000000..270fd494d
--- /dev/null
+++ b/test/replication/qsync_advanced.test.lua
@@ -0,0 +1,337 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+fiber = require('fiber')
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+NUM_INSTANCES = 2
+BROKEN_QUORUM = NUM_INSTANCES + 1
+
+test_run:cmd("setopt delimiter ';'")
+disable_sync_mode = function()
+    local s = box.space._space:get(box.space.sync.id)
+    local new_s = s:update({{'=', 6, {is_sync=false}}})
+    box.space._space:replace(new_s)
+end;
+test_run:cmd("setopt delimiter ''");
+
+box.schema.user.grant('guest', 'replication')
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+-- Successful write.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1} -- success
+test_run:cmd('switch replica')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Unsuccessful write.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+test_run:switch('replica')
+box.space.sync:select{} -- none
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Updated replication_synchro_quorum doesn't affect existing tx.
+-- Testcase setup.
+test_run:switch('default')
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+OP_TIMEOUT = 5
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=OP_TIMEOUT}
+test_run:cmd("setopt delimiter ';'")
+_ = fiber.create(function()
+    box.space.sync:insert{1}
+end);
+test_run:cmd("setopt delimiter ''");
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+fiber.sleep(OP_TIMEOUT) -- to make sure replication_synchro_timeout is exceeded
+box.space.sync:select{} -- none
+test_run:switch('replica')
+box.space.sync:select{} -- none
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, quorum commit] attempt to write multiple transactions, expected the
+-- same order as on client in case of achieved quorum.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:insert{2}
+box.space.sync:insert{3}
+box.space.sync:select{} -- 1, 2, 3
+test_run:switch('replica')
+box.space.sync:select{} -- 1, 2, 3
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Synchro timeout is not bigger than replication_synchro_timeout value.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+start = os.time()
+box.space.sync:insert{1}
+(os.time() - start) == box.cfg.replication_synchro_timeout -- true
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- replication_synchro_quorum
+test_run:switch('default')
+INT_MIN = -2147483648
+INT_MAX = 2147483648
+box.cfg{replication_synchro_quorum=INT_MAX} -- error
+box.cfg.replication_synchro_quorum -- old value
+box.cfg{replication_synchro_quorum=INT_MIN} -- error
+box.cfg.replication_synchro_quorum -- old value
+
+-- replication_synchro_timeout
+test_run:switch('default')
+DOUBLE_MAX = 9007199254740992
+box.cfg{replication_synchro_timeout=DOUBLE_MAX}
+box.cfg.replication_synchro_timeout -- DOUBLE_MAX
+box.cfg{replication_synchro_timeout=DOUBLE_MAX+1}
+box.cfg.replication_synchro_timeout -- DOUBLE_MAX
+box.cfg{replication_synchro_timeout=-1} -- error
+box.cfg.replication_synchro_timeout -- old value
+box.cfg{replication_synchro_timeout=0} -- error
+box.cfg.replication_synchro_timeout -- old value
+
+-- TX is in synchronous replication.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.begin() box.space.sync:insert({1}) box.commit()
+box.begin() box.space.sync:insert({2}) box.commit()
+-- Testcase cleanup.
+box.space.sync:drop()
+
+-- [RFC, summary] switch sync replicas into async ones, expected success and
+-- data consistency on a leader and replicas.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+test_run:switch('default')
+-- Disable synchronous mode.
+disable_sync_mode()
+-- Space is in async mode now.
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+box.space.sync:insert{2} -- success
+box.space.sync:insert{3} -- success
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
+box.space.sync:insert{4} -- success
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+box.space.sync:insert{5} -- success
+box.space.sync:select{} -- 1, 2, 3, 4, 5
+test_run:cmd('switch replica')
+box.space.sync:select{} -- 1, 2, 3, 4, 5
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- (FLAKY) [RFC, Synchronous replication enabling] "As soon as last operation of
+-- synchronous transaction appeared in leader's WAL, it will cause all
+-- following transactions - no matter if they are synchronous or not - wait for
+-- the quorum. In case quorum is not achieved the 'rollback' operation will
+-- cause rollback of all transactions after the synchronous one."
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+test_run:switch('default')
+-- OP_TIMEOUT should be enough to make sync operation, disable
+-- sync mode and make an async operation
+OP_TIMEOUT = 10
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=OP_TIMEOUT}
+test_run:cmd("setopt delimiter ';'")
+_ = fiber.create(function()
+    box.space.sync:insert{2}
+end);
+test_run:cmd("setopt delimiter ''");
+-- Disable synchronous mode.
+disable_sync_mode()
+-- Space is in async mode now.
+box.space.sync:insert{3} -- async operation must wait sync one
+fiber.sleep(OP_TIMEOUT + 1)
+box.space.sync:select{} -- 1
+test_run:cmd('switch replica')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Warn user when setting `replication_synchro_quorum` to a value
+-- greater than number of instances in a cluster, see gh-5122.
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
+
+-- [RFC, summary] switch from leader to replica and vice versa, expected
+-- success and data consistency on a leader and replicas (gh-5124).
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+box.cfg{read_only=false} -- promote replica to master
+test_run:switch('default')
+box.cfg{read_only=true} -- demote master to replica
+test_run:switch('replica')
+box.space.sync:insert{2}
+box.space.sync:select{} -- 1, 2
+test_run:switch('default')
+box.space.sync:select{} -- 1, 2
+-- Revert cluster configuration.
+test_run:switch('default')
+box.cfg{read_only=false}
+test_run:switch('replica')
+box.cfg{read_only=true}
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+box.error.injection.set('ERRINJ_WAL_IO', true)
+box.space.sync:insert{2}
+box.error.injection.set('ERRINJ_WAL_IO', false)
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, quorum commit] check behaviour with failure answer from a replica
+-- (ERRINJ_WAL_SYNC) during write, expected disconnect from the replication
+-- (gh-5123, set replication_synchro_quorum to 1).
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.error.injection.set('ERRINJ_WAL_IO', true)
+test_run:switch('default')
+box.space.sync:insert{2}
+test_run:switch('replica')
+box.error.injection.set('ERRINJ_WAL_IO', false)
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+
+-- Setup an async cluster.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+-- [RFC, summary] switch async replica into sync one, expected
+-- success and data consistency on a leader and replica.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {engine=engine})
+_ = box.space.sync:create_index('pk')
+box.space.sync:insert{1} -- success
+test_run:cmd('switch replica')
+box.space.sync:select{} -- 1
+test_run:switch('default')
+-- Enable synchronous mode.
+s = box.space._space:get(box.space.sync.id)
+new_s = s:update({{'=', 6, {is_sync=true}}})
+box.space._space:replace(new_s)
+-- Space is in sync mode now.
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.space.sync:insert{2} -- success
+box.space.sync:insert{3} -- success
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+box.space.sync:insert{4} -- failure
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.space.sync:insert{5} -- success
+box.space.sync:select{} -- 1, 2, 3, 5
+test_run:cmd('switch replica')
+box.space.sync:select{} -- 1, 2, 3, 5
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
-- 
2.26.2


* [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (22 preceding siblings ...)
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
@ 2020-07-02 21:13   ` sergeyb
  2020-07-06 23:31     ` Vladislav Shpilevoy
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
                     ` (3 subsequent siblings)
  27 siblings, 1 reply; 68+ messages in thread
From: sergeyb @ 2020-07-02 21:13 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

From: Sergey Bronnikov <sergeyb@tarantool.org>

Part of #5055
---
 test/replication/qsync_with_anon.result   | 177 ++++++++++++++++++++++
 test/replication/qsync_with_anon.test.lua |  65 ++++++++
 2 files changed, 242 insertions(+)
 create mode 100644 test/replication/qsync_with_anon.result
 create mode 100644 test/replication/qsync_with_anon.test.lua

diff --git a/test/replication/qsync_with_anon.result b/test/replication/qsync_with_anon.result
new file mode 100644
index 000000000..59506a608
--- /dev/null
+++ b/test/replication/qsync_with_anon.result
@@ -0,0 +1,177 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+NUM_INSTANCES = 2
+ | ---
+ | ...
+BROKEN_QUORUM = NUM_INSTANCES + 1
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+-- Setup a cluster with anonymous replica.
+test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+
+-- [RFC, Asynchronous replication] successful transaction applied on async
+-- replica.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{1} -- success
+ | ---
+ | - [1]
+ | ...
+box.space.sync:insert{2} -- success
+ | ---
+ | - [2]
+ | ...
+box.space.sync:insert{3} -- success
+ | ---
+ | - [3]
+ | ...
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1, 2, 3
+ | ---
+ | - - [1]
+ |   - [2]
+ |   - [3]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, Asynchronous replication] failed transaction rolled back on async
+-- replica.
+-- Testcase setup.
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:insert{1} -- failure
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- none
+ | ---
+ | - []
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+ | ---
+ | ...
+box.space.sync:insert{1} -- success
+ | ---
+ | - [1]
+ | ...
+test_run:cmd('switch replica_anon')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica_anon')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica_anon')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+ | ---
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
diff --git a/test/replication/qsync_with_anon.test.lua b/test/replication/qsync_with_anon.test.lua
new file mode 100644
index 000000000..aed62775e
--- /dev/null
+++ b/test/replication/qsync_with_anon.test.lua
@@ -0,0 +1,65 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+NUM_INSTANCES = 2
+BROKEN_QUORUM = NUM_INSTANCES + 1
+
+box.schema.user.grant('guest', 'replication')
+
+-- Setup a cluster with anonymous replica.
+test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
+test_run:cmd('start server replica_anon')
+test_run:cmd('switch replica_anon')
+
+-- [RFC, Asynchronous replication] successful transaction applied on async
+-- replica.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+test_run:switch('default')
+box.space.sync:insert{1} -- success
+box.space.sync:insert{2} -- success
+box.space.sync:insert{3} -- success
+test_run:cmd('switch replica_anon')
+box.space.sync:select{} -- 1, 2, 3
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, Asynchronous replication] failed transaction rolled back on async
+-- replica.
+-- Testcase setup.
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+test_run:switch('default')
+box.space.sync:insert{1} -- failure
+test_run:cmd('switch replica_anon')
+box.space.sync:select{} -- none
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES}
+box.space.sync:insert{1} -- success
+test_run:cmd('switch replica_anon')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:switch('default')
+test_run:cmd('stop server replica_anon')
+test_run:cmd('delete server replica_anon')
+box.schema.user.revoke('guest', 'replication')
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+test_run:cleanup_cluster()
-- 
2.26.2


* [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (23 preceding siblings ...)
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica sergeyb
@ 2020-07-02 21:13   ` sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
                       ` (2 more replies)
  2020-07-06 23:31   ` [Tarantool-patches] [PATCH] Add new error injection constant ERRINJ_SYNC_TIMEOUT Vladislav Shpilevoy
                     ` (2 subsequent siblings)
  27 siblings, 3 replies; 68+ messages in thread
From: sergeyb @ 2020-07-02 21:13 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

From: Sergey Bronnikov <sergeyb@tarantool.org>

Part of #5055
---
 test/replication/qsync_snapshots.result   | 362 ++++++++++++++++++++++
 test/replication/qsync_snapshots.test.lua | 132 ++++++++
 2 files changed, 494 insertions(+)
 create mode 100644 test/replication/qsync_snapshots.result
 create mode 100644 test/replication/qsync_snapshots.test.lua

diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
new file mode 100644
index 000000000..db98f87fd
--- /dev/null
+++ b/test/replication/qsync_snapshots.result
@@ -0,0 +1,362 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+NUM_INSTANCES = 2
+ | ---
+ | ...
+BROKEN_QUORUM = NUM_INSTANCES + 1
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+-- [RFC, Snapshot generation] all txns confirmed, then snapshot on master,
+-- expected success.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, Snapshot generation] all txns confirmed, then snapshot on replica,
+-- expected success.
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, Snapshot generation] rolled back operations are not snapshotted
+-- Testcase setup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+box.space.sync:insert{2}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, Snapshot generation] snapshot started on master, then rollback
+-- arrived, expected snapshot abort
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+_ = fiber.create(function()
+    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+    box.space.sync:insert{2}
+end);
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+box.snapshot() -- abort
+ | ---
+ | - error: A rollback for a synchronous transaction is received
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- [RFC, Snapshot generation] snapshot started on replica, then rollback
+-- arrived, expected snapshot abort
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+ | ---
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+-- Testcase body.
+box.space.sync:insert{1}
+ | ---
+ | - [1]
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd("setopt delimiter ';'")
+ | ---
+ | - true
+ | ...
+_ = fiber.create(function()
+    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+    box.space.sync:insert{2}
+end);
+ | ---
+ | ...
+test_run:cmd("setopt delimiter ''");
+ | ---
+ | - true
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.snapshot() -- abort
+ | ---
+ | - error: A rollback for a synchronous transaction is received
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:select{} -- 1
+ | ---
+ | - - [1]
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
+ | ---
+ | ...
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
new file mode 100644
index 000000000..b5990bce7
--- /dev/null
+++ b/test/replication/qsync_snapshots.test.lua
@@ -0,0 +1,132 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+fiber = require('fiber')
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+NUM_INSTANCES = 2
+BROKEN_QUORUM = NUM_INSTANCES + 1
+
+box.schema.user.grant('guest', 'replication')
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+-- [RFC, Snapshot generation] all txns confirmed, then snapshot on master,
+-- expected success.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+box.snapshot()
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+box.space.sync:drop()
+
+-- [RFC, Snapshot generation] all txns confirmed, then snapshot on replica,
+-- expected success.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+box.snapshot()
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, Snapshot generation] rolled back operations are not snapshotted.
+-- Testcase setup.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.1}
+box.space.sync:insert{2}
+box.snapshot()
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, Snapshot generation] snapshot started on master, then rollback
+-- arrived, expected snapshot abort.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('default')
+test_run:cmd("setopt delimiter ';'")
+_ = fiber.create(function()
+    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+    box.space.sync:insert{2}
+end);
+test_run:cmd("setopt delimiter ''");
+box.snapshot() -- abort
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- [RFC, Snapshot generation] snapshot started on replica, then rollback
+-- arrived, expected snapshot abort.
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+-- Testcase body.
+box.space.sync:insert{1}
+box.space.sync:select{} -- 1
+test_run:switch('replica')
+box.space.sync:select{} -- 1
+test_run:switch('default')
+test_run:cmd("setopt delimiter ';'")
+_ = fiber.create(function()
+    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+    box.space.sync:insert{2}
+end);
+test_run:cmd("setopt delimiter ''");
+test_run:switch('replica')
+box.snapshot() -- abort
+box.space.sync:select{} -- 1
+test_run:switch('default')
+box.space.sync:select{} -- 1
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+box.cfg{                                                                        \
+    replication_synchro_quorum = orig_synchro_quorum,                           \
+    replication_synchro_timeout = orig_synchro_timeout,                         \
+}
-- 
2.26.2
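
A note on why the tests above expect `BROKEN_QUORUM` to fail: with `NUM_INSTANCES = 2` and `BROKEN_QUORUM = NUM_INSTANCES + 1`, the cluster can never gather enough acknowledgements, so the insert is expected to end in a rollback after the timeout. The following is only a toy model of that arithmetic in plain Lua, not Tarantool code. The helper `sync_outcome` is hypothetical, and it assumes the master's own write counts toward the quorum, which is consistent with `quorum = NUM_INSTANCES` succeeding in a two-instance cluster.

```lua
-- Toy model, not Tarantool code. Constants mirror the test files.
NUM_INSTANCES = 2
BROKEN_QUORUM = NUM_INSTANCES + 1

-- Hypothetical helper: "confirm" when enough acks were collected,
-- "rollback" otherwise (models the synchro timeout firing).
function sync_outcome(acks, quorum)
    if acks >= quorum then
        return "confirm"
    end
    return "rollback"
end

-- At most NUM_INSTANCES acks can arrive in a two-instance cluster.
print(sync_outcome(NUM_INSTANCES, NUM_INSTANCES))
print(sync_outcome(NUM_INSTANCES, BROKEN_QUORUM))
```

In the real tests the same effect is achieved by passing `replication_synchro_quorum=BROKEN_QUORUM` to `box.cfg`.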


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
@ 2020-07-02 22:46     ` Sergey Bronnikov
  2020-07-02 23:20     ` Vladislav Shpilevoy
  2020-07-06 23:31     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-02 22:46 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

I forgot to fix an unreliable piece of code that made the test flaky.
The changes with the fix are below; the branch contains the updated commit.

diff --git a/test/replication/qsync_advanced.result b/test/replication/qsync_advanced.result
index fa94c8339..53722c3f9 100644
--- a/test/replication/qsync_advanced.result
+++ b/test/replication/qsync_advanced.result
@@ -663,7 +663,7 @@ box.space.sync:drop()
  | ---
  | ...
 
--- check behaviour with failed write to WAL on master (ERRINJ_WAL_IO)
+-- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
 -- Testcase setup.
 test_run:switch('default')
  | ---
@@ -849,15 +849,8 @@ test_run:switch('default')
  | - true
  | ...
 -- Enable synchronous mode.
-s = box.space._space:get(box.space.sync.id)
- | ---
- | ...
-new_s = s:update({{'=', 6, {is_sync=true}}})
- | ---
- | ...
-box.space._space:replace(new_s)
+disable_sync_mode()
  | ---
- | - [523, 1, 'sync', 'vinyl', 0, {'is_sync': true}, []]
  | ...
 -- Space is in sync mode now.
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
@@ -867,41 +860,28 @@ box.space.sync:insert{2} -- success
  | ---
  | - [2]
  | ...
-box.space.sync:insert{3} -- success
- | ---
- | - [3]
- | ...
 box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
  | ---
  | ...
-box.space.sync:insert{4} -- failure
- | ---
- | - error: Quorum collection for a synchronous transaction is timed out
- | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
- | ---
- | ...
-box.space.sync:insert{5} -- success
+box.space.sync:insert{3} -- success
  | ---
- | - [5]
+ | - [3]
  | ...
-box.space.sync:select{} -- 1, 2, 3, 5
+box.space.sync:select{} -- 1, 2, 3
  | ---
  | - - [1]
  |   - [2]
  |   - [3]
- |   - [5]
  | ...
 test_run:cmd('switch replica')
  | ---
  | - true
  | ...
-box.space.sync:select{} -- 1, 2, 3, 5
+box.space.sync:select{} -- 1, 2, 3
  | ---
  | - - [1]
  |   - [2]
  |   - [3]
- |   - [5]
  | ...
 -- Testcase cleanup.
 test_run:switch('default')
diff --git a/test/replication/qsync_advanced.test.lua b/test/replication/qsync_advanced.test.lua
index 270fd494d..9633ceb6c 100644
--- a/test/replication/qsync_advanced.test.lua
+++ b/test/replication/qsync_advanced.test.lua
@@ -307,20 +307,15 @@ test_run:cmd('switch replica')
 box.space.sync:select{} -- 1
 test_run:switch('default')
 -- Enable synchronous mode.
-s = box.space._space:get(box.space.sync.id)
-new_s = s:update({{'=', 6, {is_sync=true}}})
-box.space._space:replace(new_s)
+disable_sync_mode()
 -- Space is in sync mode now.
 box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
 box.space.sync:insert{2} -- success
-box.space.sync:insert{3} -- success
 box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
-box.space.sync:insert{4} -- failure
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
-box.space.sync:insert{5} -- success
-box.space.sync:select{} -- 1, 2, 3, 5
+box.space.sync:insert{3} -- success
+box.space.sync:select{} -- 1, 2, 3
 test_run:cmd('switch replica')
-box.space.sync:select{} -- 1, 2, 3, 5
+box.space.sync:select{} -- 1, 2, 3
 -- Testcase cleanup.
 test_run:switch('default')
 box.space.sync:drop()

On 00:13 Fri 03 Jul , sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> Part of #5055
> ---
>  test/replication/qsync_advanced.result   | 939 +++++++++++++++++++++++
>  test/replication/qsync_advanced.test.lua | 337 ++++++++
>  2 files changed, 1276 insertions(+)
>  create mode 100644 test/replication/qsync_advanced.result
>  create mode 100644 test/replication/qsync_advanced.test.lua


* Re: [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
@ 2020-07-02 22:46     ` Sergey Bronnikov
  2020-07-02 23:20     ` Vladislav Shpilevoy
  2020-07-06 23:31     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-02 22:46 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov, lvasiliev

Unfortunately, the .result file was not consistent with the test source code,
and because of this the test failed. I have updated it to make the test green.
The applied changes are below; the branch contains the updated commit.

diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index db98f87fd..61cb7164b 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -129,7 +129,7 @@ box.space.sync:drop()
  | ---
  | ...
 
--- [RFC, Snapshot generation] rolled back operations are not snapshotted
+-- [RFC, Snapshot generation] rolled back operations are not snapshotted.
 -- Testcase setup.
 test_run:switch('default')
  | ---
@@ -190,7 +190,7 @@ box.space.sync:drop()
  | ...
 
 -- [RFC, Snapshot generation] snapshot started on master, then rollback
--- arrived, expected snapshot abort
+-- arrived, expected snapshot abort.
 test_run:switch('default')
  | ---
  | - true
@@ -257,7 +257,7 @@ box.space.sync:drop()
  | ...
 
 -- [RFC, Snapshot generation] snapshot started on replica, then rollback
--- arrived, expected snapshot abort
+-- arrived, expected snapshot abort.
 test_run:switch('default')
  | ---
  | - true

Sergey

On 00:13 Fri 03 Jul , sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> Part of #5055
> ---
>  test/replication/qsync_snapshots.result   | 362 ++++++++++++++++++++++
>  test/replication/qsync_snapshots.test.lua | 132 ++++++++
>  2 files changed, 494 insertions(+)
>  create mode 100644 test/replication/qsync_snapshots.result
>  create mode 100644 test/replication/qsync_snapshots.test.lua


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
@ 2020-07-02 23:20     ` Vladislav Shpilevoy
  2020-07-06 12:30       ` Sergey Bronnikov
  2020-07-06 23:31     ` Vladislav Shpilevoy
  2 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-02 23:20 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

I didn't review it properly; this is just a local failure:

[001] replication/qsync_advanced.test.lua             memtx           [ fail ]
[001] 
[001] Test failed! Result content mismatch:
[001] --- replication/qsync_advanced.result	Fri Jul  3 01:14:24 2020
[001] +++ replication/qsync_advanced.reject	Fri Jul  3 01:17:49 2020
[001] @@ -538,7 +538,7 @@
[001]  -- Disable synchronous mode.
[001]  disable_sync_mode()
[001]   | ---
[001] - | - error: A rollback for a synchronous transaction is received
[001] + | - error: Quorum collection for a synchronous transaction is timed out
[001]   | ...
[001]  -- Space is in async mode now.
[001]  box.space.sync:insert{3} -- async operation must wait sync one


* Re: [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
@ 2020-07-02 23:20     ` Vladislav Shpilevoy
  2020-07-06 23:31     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-02 23:20 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

I didn't review it properly. Just a local fail:

[002] replication/qsync_snapshots.test.lua            memtx           [ fail ]
[002] 
[002] Test failed! Result content mismatch:
[002] --- replication/qsync_snapshots.result	Fri Jul  3 01:14:24 2020
[002] +++ replication/qsync_snapshots.reject	Fri Jul  3 01:17:11 2020
[002] @@ -233,7 +233,7 @@
[002]   | ...
[002]  box.snapshot() -- abort
[002]   | ---
[002] - | - error: A rollback for a synchronous transaction is received
[002] + | - error: Quorum collection for a synchronous transaction is timed out
[002]   | ...
[002]  box.space.sync:select{} -- 1
[002]   | ---
[002] 


* Re: [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options
  2020-07-02  8:29     ` Serge Petrenko
@ 2020-07-02 23:36       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-02 23:36 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches

Thanks for the review!

>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 871b0d976..0821ea0a3 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -476,6 +476,31 @@ box_check_replication_sync_lag(void)
>>       return lag;
>>   }
>>   +static int
>> +box_check_replication_synchro_quorum(void)
>> +{
>> +    int quorum = cfg_geti("replication_synchro_quorum");
>> +    if (quorum <= 0 || quorum > VCLOCK_MAX) {
> 
> It should be `quorum >= VCLOCK_MAX`, because you can't have VCLOCK_MAX (32)
> instances in a cluster, only 31. Id 0 is used by anonymous replicas.

Indeed, done and squashed.
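
For illustration, the corrected bound can be sketched as a stand-alone
check. This is only a sketch: synchro_quorum_is_valid() and
synchro_quorum_example() are hypothetical names, and VCLOCK_MAX is
hard-coded to the value 32 quoted in the review. The real
box_check_replication_synchro_quorum() reads the option with cfg_geti()
and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, taken from the review comment above. */
enum { VCLOCK_MAX = 32 };

/*
 * Hypothetical helper, not the real box.cc function.
 * Id 0 is reserved, so a cluster has at most VCLOCK_MAX - 1
 * instances, and the quorum can't exceed that number.
 */
bool
synchro_quorum_is_valid(int quorum)
{
	return quorum > 0 && quorum < VCLOCK_MAX;
}

/* Usage example: the boundary values discussed in the review. */
void
synchro_quorum_example(void)
{
	assert(!synchro_quorum_is_valid(0));
	assert(synchro_quorum_is_valid(1));
	assert(synchro_quorum_is_valid(VCLOCK_MAX - 1));
	assert(!synchro_quorum_is_valid(VCLOCK_MAX));
}
```

With the original `quorum > VCLOCK_MAX` condition the last case would
have been accepted, although such a quorum could never be collected.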


* Re: [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write Vladislav Shpilevoy
  2020-07-01 23:55     ` Vladislav Shpilevoy
@ 2020-07-03 12:23     ` Serge Petrenko
  1 sibling, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-03 12:23 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


30.06.2020 02:15, Vladislav Shpilevoy wrote:
> With synchronous replication, the concept of 'commit' no longer
> matches the WAL write event exactly.
>
> Yet the applier relied on the commit event to send periodic
> heartbeats telling the master the replica's new vclock.
>
> The patch makes the applier send heartbeats on any write event,
> even if it was not a commit. For example, when a sync
> transaction's data was written, and the replica needs to send an
> ACK to the master using the heartbeat.
>
> Closes #5100
> ---
>   src/box/applier.cc                            | 25 +++++++-
>   .../sync_replication_sanity.result            | 59 ++++++++++++++++++-
>   .../sync_replication_sanity.test.lua          | 32 +++++++++-
>   3 files changed, 107 insertions(+), 9 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 635a9849c..a9baf0d69 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -755,6 +755,11 @@ applier_txn_rollback_cb(struct trigger *trigger, void *event)
>   {
>   	(void) trigger;
>   	struct txn *txn = (struct txn *) event;
> +	/*
> +	 * Let the txn module free the transaction object. It is
> +	 * not needed for anything else.
> +	 */
> +	txn->fiber = NULL;
>   	/*
>   	 * Synchronous transaction rollback due to receiving a
>   	 * ROLLBACK entry is a normal event and requires no
> @@ -791,6 +796,14 @@ static int
>   applier_txn_commit_cb(struct trigger *trigger, void *event)
>   {
>   	(void) trigger;
> +	struct txn *txn = (struct txn *)event;
> +	assert(txn->fiber != NULL);
> +	assert(strncmp(txn->fiber->name, "applierw", 8) == 0);
> +	/*
> +	 * Let the txn module free the transaction object. It is
> +	 * not needed for anything else.
> +	 */
> +	txn->fiber = NULL;
>   	/* Broadcast the commit event across all appliers. */
>   	trigger_run(&replicaset.applier.on_commit, event);
>   	return 0;
> @@ -802,7 +815,7 @@ applier_txn_commit_cb(struct trigger *trigger, void *event)
>    * Return 0 for success or -1 in case of an error.
>    */
>   static int
> -applier_apply_tx(struct stailq *rows)
> +applier_apply_tx(struct stailq *rows, struct fiber *writer)
>   {
>   	struct xrow_header *first_row = &stailq_first_entry(rows,
>   					struct applier_tx_row, next)->row;
> @@ -894,7 +907,13 @@ applier_apply_tx(struct stailq *rows)
>   
>   	trigger_create(on_commit, applier_txn_commit_cb, NULL, NULL);
>   	txn_on_commit(txn, on_commit);
> -
> +	/*
> +	 * Wakeup the writer fiber after the transaction is
> +	 * completed. To send ACK to the master. In case of async
> +	 * transaction it is the same as commit event. In case of
> +	 * sync it happens after the data is written to WAL.
> +	 */
> +	txn->fiber = writer;
>   	if (txn_commit_async(txn) < 0)
>   		goto fail;
>   
> @@ -1092,7 +1111,7 @@ applier_subscribe(struct applier *applier)
>   		if (stailq_first_entry(&rows, struct applier_tx_row,
>   				       next)->row.lsn == 0)
>   			fiber_wakeup(applier->writer);
> -		else if (applier_apply_tx(&rows) != 0)
> +		else if (applier_apply_tx(&rows, applier->writer) != 0)
>   			diag_raise();
>   
>   		if (ibuf_used(ibuf) == 0)
> diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
> index 4b9823d77..8b37ba6f5 100644
> --- a/test/replication/sync_replication_sanity.result
> +++ b/test/replication/sync_replication_sanity.result
> @@ -90,10 +90,10 @@ box.schema.user.grant('guest', 'replication')
>    | ---
>    | ...
>   -- Set up synchronous replication options.
> -quorum = box.cfg.replication_synchro_quorum
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
>    | ---
>    | ...
> -timeout = box.cfg.replication_synchro_timeout
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
>    | ---
>    | ...
>   box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
> @@ -178,13 +178,63 @@ box.space.sync:select{}
>    |   - [3]
>    | ...
>   
> +--
> +-- gh-5100: replica should send ACKs for sync transactions after
> +-- WAL write immediately, not waiting for replication timeout or
> +-- a CONFIRM.
> +--
> +box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
> + | ---
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +old_timeout = box.cfg.replication_timeout
> + | ---
> + | ...
> +box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
> + | ---
> + | ...
> +-- Commit something non-sync. So as applier writer fiber would
> +-- flush the pending heartbeat and go to sleep with the new huge
> +-- replication timeout.
> +s = box.schema.create_space('test')
> + | ---
> + | ...
> +pk = s:create_index('pk')
> + | ---
> + | ...
> +s:replace{1}
> + | ---
> + | - [1]
> + | ...
> +-- Now commit something sync. It should return immediately even
> +-- though the replication timeout is huge.
> +box.space.sync:replace{4}
> + | ---
> + | - [4]
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{4}
> + | ---
> + | - - [4]
> + | ...
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>    | ---
>    | - true
>    | ...
>   
> -box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
> +box.cfg{                                                                        \
> +    replication_synchro_quorum = old_synchro_quorum,                            \
> +    replication_synchro_timeout = old_synchro_timeout,                          \
> +    replication_timeout = old_timeout,                                          \
> +}
>    | ---
>    | ...
>   test_run:cmd('stop server replica')
> @@ -195,6 +245,9 @@ test_run:cmd('delete server replica')
>    | ---
>    | - true
>    | ...
> +box.space.test:drop()
> + | ---
> + | ...
>   box.space.sync:drop()
>    | ---
>    | ...
> diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
> index 8715a4600..b0326fd4b 100644
> --- a/test/replication/sync_replication_sanity.test.lua
> +++ b/test/replication/sync_replication_sanity.test.lua
> @@ -38,8 +38,8 @@ engine = test_run:get_cfg('engine')
>   
>   box.schema.user.grant('guest', 'replication')
>   -- Set up synchronous replication options.
> -quorum = box.cfg.replication_synchro_quorum
> -timeout = box.cfg.replication_synchro_timeout
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
>   box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
>   
>   test_run:cmd('create server replica with rpl_master=default,\
> @@ -71,11 +71,37 @@ box.space.sync:select{}
>   test_run:cmd('restart server replica')
>   box.space.sync:select{}
>   
> +--
> +-- gh-5100: replica should send ACKs for sync transactions after
> +-- WAL write immediately, not waiting for replication timeout or
> +-- a CONFIRM.
> +--
> +box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
> +test_run:switch('default')
> +old_timeout = box.cfg.replication_timeout
> +box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
> +-- Commit something non-sync. So as applier writer fiber would
> +-- flush the pending heartbeat and go to sleep with the new huge
> +-- replication timeout.
> +s = box.schema.create_space('test')
> +pk = s:create_index('pk')
> +s:replace{1}
> +-- Now commit something sync. It should return immediately even
> +-- though the replication timeout is huge.
> +box.space.sync:replace{4}
> +test_run:switch('replica')
> +box.space.sync:select{4}
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>   
> -box.cfg{replication_synchro_quorum=quorum, replication_synchro_timeout=timeout}
> +box.cfg{                                                                        \
> +    replication_synchro_quorum = old_synchro_quorum,                            \
> +    replication_synchro_timeout = old_synchro_timeout,                          \
> +    replication_timeout = old_timeout,                                          \
> +}
>   test_run:cmd('stop server replica')
>   test_run:cmd('delete server replica')
> +box.space.test:drop()
>   box.space.sync:drop()
>   box.schema.user.revoke('guest', 'replication')

Thanks! LGTM.

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo Vladislav Shpilevoy
  2020-07-01 17:12     ` Sergey Ostanevich
@ 2020-07-03 12:28     ` Serge Petrenko
  1 sibling, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-03 12:28 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


30.06.2020 02:15, Vladislav Shpilevoy wrote:
> When there is an uncommitted synchronous transaction, any attempt
> to commit the next transaction should be suspended, even if it is
> an async transaction.
>
> This restriction comes from the theoretically possible dependency
> of what is written in the async transactions on what was written
> in the previous sync transactions.
>
> For that there is a new txn flag - TXN_WAIT_SYNC. Previously the
> only synchro replication flag was TXN_WAIT_ACK. And now a
> transaction can be sync, but not wait for ACKs.
>
> In particular, if a transaction:
>
> - Is synchronous, then it has TXN_WAIT_SYNC (it is sync), and
>    TXN_WAIT_ACK (need to collect ACKs, or get a CONFIRM);
>
> - Is asynchronous, and the limbo was empty at the moment of
>    commit, then it does not have any of these flags and is
>    committed like earlier;
>
> - Is asynchronous, and the limbo was not empty at the moment of
>    commit. Then it will have only TXN_WAIT_SYNC. So it will be
>    finished right after all the previous sync transactions are
>    done. Note: *without waiting for ACKs* - the transaction is
>    still asynchronous in the sense that it doesn't need to wait
>    for quorum replication.
>
> Follow-up #4845

Thanks for the patch! LGTM.
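
The three cases from the commit message can be sketched as a small
decision function. This is a simplified model, not the code from txn.c:
the WAIT_SYNC and WAIT_ACK bit masks and choose_sync_flags() are
hypothetical stand-ins for the txn flags, and force_async models
TXN_FORCE_ASYNC, which the patch sets for CONFIRM/ROLLBACK writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for TXN_WAIT_SYNC and TXN_WAIT_ACK. */
enum {
	WAIT_SYNC = 1 << 0,
	WAIT_ACK = 1 << 1,
};

/*
 * Hypothetical helper: which flags a transaction gets at commit.
 * touches_sync_space - the transaction changed a synchronous space;
 * limbo_is_empty - there are no unfinished sync transactions;
 * force_async - internal write which must bypass the limbo.
 */
unsigned
choose_sync_flags(bool touches_sync_space, bool limbo_is_empty,
		  bool force_async)
{
	unsigned flags = 0;
	if (force_async)
		return flags;
	if (touches_sync_space)
		flags = WAIT_SYNC | WAIT_ACK;
	else if (!limbo_is_empty)
		flags = WAIT_SYNC;
	/* Invariant from the patch: WAIT_ACK implies WAIT_SYNC. */
	assert((flags & WAIT_ACK) == 0 || (flags & WAIT_SYNC) != 0);
	return flags;
}
```

For example, choose_sync_flags(false, false, false) yields only
WAIT_SYNC: an async transaction queued behind a pending sync one, which
waits for its predecessors but collects no ACKs of its own.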

> ---
>   src/box/applier.cc                            |  8 ++
>   src/box/txn.c                                 | 16 ++--
>   src/box/txn.h                                 |  7 ++
>   src/box/txn_limbo.c                           | 49 +++++++++---
>   .../sync_replication_sanity.result            | 75 +++++++++++++++++++
>   .../sync_replication_sanity.test.lua          | 26 +++++++
>   test/unit/snap_quorum_delay.cc                |  6 +-
>   7 files changed, 172 insertions(+), 15 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 7e63dc544..7e70211b7 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -280,6 +280,14 @@ process_confirm_rollback(struct request *request, bool is_confirm)
>   			 txn_limbo.instance_id);
>   		return -1;
>   	}
> +	assert(txn->n_applier_rows == 0);
> +	/*
> +	 * This is not really a transaction. It just uses txn API
> +	 * to put the data into WAL. And obviously it should not
> +	 * go to the limbo and block on the very same sync
> +	 * transaction which it tries to confirm now.
> +	 */
> +	txn_set_flag(txn, TXN_FORCE_ASYNC);
>   
>   	if (txn_begin_stmt(txn, NULL) != 0)
>   		return -1;
> diff --git a/src/box/txn.c b/src/box/txn.c
> index 37955752a..bc2bb8e11 100644
> --- a/src/box/txn.c
> +++ b/src/box/txn.c
> @@ -442,7 +442,7 @@ txn_complete(struct txn *txn)
>   			engine_rollback(txn->engine, txn);
>   		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
>   			txn_run_rollback_triggers(txn, &txn->on_rollback);
> -	} else if (!txn_has_flag(txn, TXN_WAIT_ACK)) {
> +	} else if (!txn_has_flag(txn, TXN_WAIT_SYNC)) {
>   		/* Commit the transaction. */
>   		if (txn->engine != NULL)
>   			engine_commit(txn->engine, txn);
> @@ -552,8 +552,14 @@ txn_journal_entry_new(struct txn *txn)
>   	 * space can't be synchronous. So if there is at least one
>   	 * synchronous space, the transaction is not local.
>   	 */
> -	if (is_sync && !txn_has_flag(txn, TXN_FORCE_ASYNC))
> -		txn_set_flag(txn, TXN_WAIT_ACK);
> +	if (!txn_has_flag(txn, TXN_FORCE_ASYNC)) {
> +		if (is_sync) {
> +			txn_set_flag(txn, TXN_WAIT_SYNC);
> +			txn_set_flag(txn, TXN_WAIT_ACK);
> +		} else if (!txn_limbo_is_empty(&txn_limbo)) {
> +			txn_set_flag(txn, TXN_WAIT_SYNC);
> +		}
> +	}
>   
>   	assert(remote_row == req->rows + txn->n_applier_rows);
>   	assert(local_row == remote_row + txn->n_new_rows);
> @@ -662,7 +668,7 @@ txn_commit_async(struct txn *txn)
>   		return -1;
>   	}
>   
> -	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
> +	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
>   	struct txn_limbo_entry *limbo_entry;
>   	if (is_sync) {
>   		/*
> @@ -737,7 +743,7 @@ txn_commit(struct txn *txn)
>   		return -1;
>   	}
>   
> -	bool is_sync = txn_has_flag(txn, TXN_WAIT_ACK);
> +	bool is_sync = txn_has_flag(txn, TXN_WAIT_SYNC);
>   	if (is_sync) {
>   		/*
>   		 * Remote rows, if any, come before local rows, so
> diff --git a/src/box/txn.h b/src/box/txn.h
> index c631d7033..c484fcb56 100644
> --- a/src/box/txn.h
> +++ b/src/box/txn.h
> @@ -66,11 +66,18 @@ enum txn_flag {
>   	TXN_CAN_YIELD,
>   	/** on_commit and/or on_rollback list is not empty. */
>   	TXN_HAS_TRIGGERS,
> +	/**
> +	 * A transaction is either synchronous itself and needs to
> +	 * be synced with replicas, or it is async, but is blocked
> +	 * by a not yet finished synchronous transaction.
> +	 */
> +	TXN_WAIT_SYNC,
>   	/**
>   	 * Transaction, touched sync spaces, enters 'waiting for
>   	 * acks' state before commit. In this state it waits until
>   	 * it is replicated onto a quorum of replicas, and only
>   	 * then finishes commit and returns success to a user.
> +	 * TXN_WAIT_SYNC is always set, if TXN_WAIT_ACK is set.
>   	 */
>   	TXN_WAIT_ACK,
>   	/**
> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index fbe4dcecf..bfb404e8e 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -47,7 +47,7 @@ txn_limbo_create(struct txn_limbo *limbo)
>   struct txn_limbo_entry *
>   txn_limbo_append(struct txn_limbo *limbo, uint32_t id, struct txn *txn)
>   {
> -	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
>   	if (id == 0)
>   		id = instance_id;
>   	if (limbo->instance_id != id) {
> @@ -143,7 +143,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>   	struct txn *txn = entry->txn;
>   	assert(entry->lsn > 0);
>   	assert(!txn_has_flag(txn, TXN_IS_DONE));
> -	assert(txn_has_flag(txn, TXN_WAIT_ACK));
> +	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
>   	if (txn_limbo_check_complete(limbo, entry)) {
>   		txn_limbo_remove(limbo, entry);
>   		return 0;
> @@ -160,6 +160,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>   			e->txn->signature = TXN_SIGNATURE_QUORUM_TIMEOUT;
>   			txn_limbo_pop(limbo, e);
>   			txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +			txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>   			txn_complete(e->txn);
>   			if (e == entry)
>   				break;
> @@ -179,6 +180,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
>   	}
>   	txn_limbo_remove(limbo, entry);
>   	txn_clear_flag(txn, TXN_WAIT_ACK);
> +	txn_clear_flag(txn, TXN_WAIT_SYNC);
>   	return 0;
>   }
>   
> @@ -209,6 +211,13 @@ txn_limbo_write_confirm_rollback(struct txn_limbo *limbo,
>   	struct txn *txn = txn_begin();
>   	if (txn == NULL)
>   		return -1;
> +	/*
> +	 * This is not really a transaction. It just uses txn API
> +	 * to put the data into WAL. And obviously it should not
> +	 * go to the limbo and block on the very same sync
> +	 * transaction which it tries to confirm now.
> +	 */
> +	txn_set_flag(txn, TXN_FORCE_ASYNC);
>   
>   	if (txn_begin_stmt(txn, NULL) != 0)
>   		goto rollback;
> @@ -238,11 +247,21 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
>   	assert(limbo->instance_id != REPLICA_ID_NIL);
>   	struct txn_limbo_entry *e, *tmp;
>   	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
> -		if (e->lsn > lsn)
> +		/*
> +		 * Confirm a transaction if
> +		 * - it is a sync transaction covered by the
> +		 *   confirmation LSN;
> +		 * - it is an async transaction, and it is the
> +		 *   last in the queue. So it does not depend on
> +		 *   a not finished sync transaction anymore and
> +		 *   can be confirmed too.
> +		 */
> +		if (e->lsn > lsn && txn_has_flag(e->txn, TXN_WAIT_ACK))
>   			break;
>   		e->is_commit = true;
>   		txn_limbo_remove(limbo, e);
>   		txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>   		/*
>   		 * If  txn_complete_async() was already called,
>   		 * finish tx processing. Otherwise just clear the
> @@ -277,6 +296,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
>   		e->is_rollback = true;
>   		txn_limbo_pop(limbo, e);
>   		txn_clear_flag(e->txn, TXN_WAIT_ACK);
> +		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
>   		if (e->txn->signature >= 0) {
>   			/* Rollback the transaction. */
>   			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
> @@ -307,15 +327,26 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
>   	struct txn_limbo_entry *e;
>   	struct txn_limbo_entry *last_quorum = NULL;
>   	rlist_foreach_entry(e, &limbo->queue, in_queue) {
> -		if (e->lsn <= prev_lsn)
> -			continue;
>   		if (e->lsn > lsn)
>   			break;
> -		if (++e->ack_count >= replication_synchro_quorum) {
> -			e->is_commit = true;
> -			last_quorum = e;
> -		}
> +		if (e->lsn <= prev_lsn)
> +			continue;
>   		assert(e->ack_count <= VCLOCK_MAX);
> +		/*
> +		 * Sync transactions need to collect acks. Async
> +		 * transactions are automatically committed right
> +		 * after all the previous sync transactions are.
> +		 */
> +		if (txn_has_flag(e->txn, TXN_WAIT_ACK)) {
> +			if (++e->ack_count < replication_synchro_quorum)
> +				continue;
> +		} else {
> +			assert(txn_has_flag(e->txn, TXN_WAIT_SYNC));
> +			if (last_quorum == NULL)
> +				continue;
> +		}
> +		e->is_commit = true;
> +		last_quorum = e;
>   	}
>   	if (last_quorum != NULL) {
>   		if (txn_limbo_write_confirm(limbo, last_quorum) != 0) {
> diff --git a/test/replication/sync_replication_sanity.result b/test/replication/sync_replication_sanity.result
> index 8b37ba6f5..f713d4b08 100644
> --- a/test/replication/sync_replication_sanity.result
> +++ b/test/replication/sync_replication_sanity.result
> @@ -224,6 +224,81 @@ box.space.sync:select{4}
>    | - - [4]
>    | ...
>   
> +--
> +-- Async transactions should wait for existing sync transactions
> +-- finish.
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +-- Start 2 fibers, which will execute one right after the other
> +-- in the same event loop iteration.
> +f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
> + | ---
> + | ...
> +f:status()
> + | ---
> + | - dead
> + | ...
> +s:select{5}
> + | ---
> + | - - [5]
> + | ...
> +box.space.sync:select{5}
> + | ---
> + | - - [5]
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.test:select{5}
> + | ---
> + | - - [5]
> + | ...
> +box.space.sync:select{5}
> + | ---
> + | - - [5]
> + | ...
> +-- Ensure sync rollback will affect all pending async transactions
> +-- too.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
> + | ---
> + | ...
> +f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
> + | ---
> + | - error: Quorum collection for a synchronous transaction is timed out
> + | ...
> +f:status()
> + | ---
> + | - dead
> + | ...
> +s:select{6}
> + | ---
> + | - []
> + | ...
> +box.space.sync:select{6}
> + | ---
> + | - []
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.test:select{6}
> + | ---
> + | - []
> + | ...
> +box.space.sync:select{6}
> + | ---
> + | - []
> + | ...
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>    | ---
> diff --git a/test/replication/sync_replication_sanity.test.lua b/test/replication/sync_replication_sanity.test.lua
> index b0326fd4b..f84b6ee19 100644
> --- a/test/replication/sync_replication_sanity.test.lua
> +++ b/test/replication/sync_replication_sanity.test.lua
> @@ -92,6 +92,32 @@ box.space.sync:replace{4}
>   test_run:switch('replica')
>   box.space.sync:select{4}
>   
> +--
> +-- Async transactions should wait for existing sync transactions
> +-- finish.
> +--
> +test_run:switch('default')
> +-- Start 2 fibers, which will execute one right after the other
> +-- in the same event loop iteration.
> +f = fiber.create(box.space.sync.replace, box.space.sync, {5}) s:replace{5}
> +f:status()
> +s:select{5}
> +box.space.sync:select{5}
> +test_run:switch('replica')
> +box.space.test:select{5}
> +box.space.sync:select{5}
> +-- Ensure sync rollback will affect all pending async transactions
> +-- too.
> +test_run:switch('default')
> +box.cfg{replication_synchro_timeout = 0.001, replication_synchro_quorum = 3}
> +f = fiber.create(box.space.sync.replace, box.space.sync, {6}) s:replace{6}
> +f:status()
> +s:select{6}
> +box.space.sync:select{6}
> +test_run:switch('replica')
> +box.space.test:select{6}
> +box.space.sync:select{6}
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>   
> diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
> index 7a200673a..e6cf381bf 100644
> --- a/test/unit/snap_quorum_delay.cc
> +++ b/test/unit/snap_quorum_delay.cc
> @@ -97,8 +97,12 @@ txn_process_func(va_list ap)
>   	enum process_type process_type = (enum process_type)va_arg(ap, int);
>   	struct txn *txn = txn_begin();
>   	txn->fiber = fiber();
> -	/* Set the TXN_WAIT_ACK flag to simulate a sync transaction.*/
> +	/*
> +	 * Set the TXN_WAIT_ACK + SYNC flags to simulate a sync
> +	 * transaction.
> +	 */
>   	txn_set_flag(txn, TXN_WAIT_ACK);
> +	txn_set_flag(txn, TXN_WAIT_SYNC);
>   	/*
>   	 * The true way to push the transaction to limbo is to call
>   	 * txn_commit() for sync transaction. But, if txn_commit()
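
The reworked loop in txn_limbo_ack() can be modelled on a plain array.
This is a sketch under assumptions: struct entry and queue_ack() are
hypothetical, the queue is an array instead of an rlist, and writing
CONFIRM is reduced to returning the index of the last entry that
reached quorum (-1 if none did).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a limbo entry. */
struct entry {
	int64_t lsn;
	/* True for sync transactions (TXN_WAIT_ACK in the patch). */
	bool needs_acks;
	int ack_count;
	bool is_commit;
};

/*
 * A replica acknowledged everything in (prev_lsn, lsn]. Sync entries
 * collect the ack. Async entries are committed only when some earlier
 * entry in the same pass has reached quorum.
 */
int
queue_ack(struct entry *queue, size_t count, int64_t prev_lsn,
	  int64_t lsn, int quorum)
{
	assert(quorum > 0);
	int last_quorum = -1;
	for (size_t i = 0; i < count; i++) {
		struct entry *e = &queue[i];
		if (e->lsn > lsn)
			break;
		if (e->lsn <= prev_lsn)
			continue;
		if (e->needs_acks) {
			if (++e->ack_count < quorum)
				continue;
		} else if (last_quorum < 0) {
			continue;
		}
		e->is_commit = true;
		last_quorum = (int)i;
	}
	return last_quorum;
}
```

With a queue of sync(lsn 1), async(lsn 2), sync(lsn 3) and quorum 2, a
first ack up to lsn 3 commits nothing, and a second ack up to lsn 2
commits the first two entries while the third keeps waiting.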

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1
  2020-06-30 23:00   ` [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1 Vladislav Shpilevoy
@ 2020-07-03 12:32     ` Serge Petrenko
  0 siblings, 0 replies; 68+ messages in thread
From: Serge Petrenko @ 2020-07-03 12:32 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches


01.07.2020 02:00, Vladislav Shpilevoy wrote:
> When synchro quorum is 1, the final commit and confirmation write
> are done by the fiber that created the transaction, right after
> WAL write. This case got special handling in the previous patches,
> and this commit adds a test for that.
>
> Closes #5123


Thanks for the patch! LGTM.

> ---
>   test/replication/qsync_basic.result    |  33 +++++++
>   test/replication/qsync_basic.test.lua  |  12 +++
>   test/replication/qsync_errinj.result   | 114 +++++++++++++++++++++++++
>   test/replication/qsync_errinj.test.lua |  45 ++++++++++
>   test/replication/suite.ini             |   2 +-
>   5 files changed, 205 insertions(+), 1 deletion(-)
>   create mode 100644 test/replication/qsync_errinj.result
>   create mode 100644 test/replication/qsync_errinj.test.lua
>
> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> index f713d4b08..cdecf00e8 100644
> --- a/test/replication/qsync_basic.result
> +++ b/test/replication/qsync_basic.result
> @@ -299,6 +299,39 @@ box.space.sync:select{6}
>    | - []
>    | ...
>   
> +--
> +-- gh-5123: quorum 1 still should write CONFIRM.
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
> + | ---
> + | ...
> +oldlsn = box.info.lsn
> + | ---
> + | ...
> +box.space.sync:replace{7}
> + | ---
> + | - [7]
> + | ...
> +newlsn = box.info.lsn
> + | ---
> + | ...
> +assert(newlsn >= oldlsn + 2)
> + | ---
> + | - true
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.space.sync:select{7}
> + | ---
> + | - - [7]
> + | ...
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>    | ---
> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
> index f84b6ee19..361f22bc3 100644
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -118,6 +118,18 @@ test_run:switch('replica')
>   box.space.test:select{6}
>   box.space.sync:select{6}
>   
> +--
> +-- gh-5123: quorum 1 still should write CONFIRM.
> +--
> +test_run:switch('default')
> +box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
> +oldlsn = box.info.lsn
> +box.space.sync:replace{7}
> +newlsn = box.info.lsn
> +assert(newlsn >= oldlsn + 2)
> +test_run:switch('replica')
> +box.space.sync:select{7}
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>   
> diff --git a/test/replication/qsync_errinj.result b/test/replication/qsync_errinj.result
> new file mode 100644
> index 000000000..1d2945761
> --- /dev/null
> +++ b/test/replication/qsync_errinj.result
> @@ -0,0 +1,114 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
> + | ---
> + | ...
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
> + | ---
> + | ...
> +box.schema.user.grant('guest', 'super')
> + | ---
> + | ...
> +
> +test_run:cmd('create server replica with rpl_master=default,\
> +             script="replication/replica.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +
> +--
> +-- gh-5123: replica WAL fail shouldn't crash with quorum 1.
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
> + | ---
> + | ...
> +box.space.sync:insert{1}
> + | ---
> + | - [1]
> + | ...
> +
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.error.injection.set('ERRINJ_WAL_IO', true)
> + | ---
> + | - ok
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.space.sync:insert{2}
> + | ---
> + | - [2]
> + | ...
> +
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +test_run:wait_upstream(1, {status='stopped'})
> + | ---
> + | - true
> + | ...
> +box.error.injection.set('ERRINJ_WAL_IO', false)
> + | ---
> + | - ok
> + | ...
> +
> +test_run:cmd('restart server replica')
> + |
> +box.space.sync:select{2}
> + | ---
> + | - - [2]
> + | ...
> +
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +
> +box.cfg{                                                                        \
> +    replication_synchro_quorum = old_synchro_quorum,                            \
> +    replication_synchro_timeout = old_synchro_timeout,                          \
> +}
> + | ---
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
> +
> +box.space.sync:drop()
> + | ---
> + | ...
> +box.schema.user.revoke('guest', 'super')
> + | ---
> + | ...
> diff --git a/test/replication/qsync_errinj.test.lua b/test/replication/qsync_errinj.test.lua
> new file mode 100644
> index 000000000..96495ae6c
> --- /dev/null
> +++ b/test/replication/qsync_errinj.test.lua
> @@ -0,0 +1,45 @@
> +test_run = require('test_run').new()
> +engine = test_run:get_cfg('engine')
> +
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
> +box.schema.user.grant('guest', 'super')
> +
> +test_run:cmd('create server replica with rpl_master=default,\
> +             script="replication/replica.lua"')
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> +
> +_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
> +_ = box.space.sync:create_index('pk')
> +
> +--
> +-- gh-5123: replica WAL fail shouldn't crash with quorum 1.
> +--
> +test_run:switch('default')
> +box.cfg{replication_synchro_quorum = 1, replication_synchro_timeout = 5}
> +box.space.sync:insert{1}
> +
> +test_run:switch('replica')
> +box.error.injection.set('ERRINJ_WAL_IO', true)
> +
> +test_run:switch('default')
> +box.space.sync:insert{2}
> +
> +test_run:switch('replica')
> +test_run:wait_upstream(1, {status='stopped'})
> +box.error.injection.set('ERRINJ_WAL_IO', false)
> +
> +test_run:cmd('restart server replica')
> +box.space.sync:select{2}
> +
> +test_run:cmd('switch default')
> +
> +box.cfg{                                                                        \
> +    replication_synchro_quorum = old_synchro_quorum,                            \
> +    replication_synchro_timeout = old_synchro_timeout,                          \
> +}
> +test_run:cmd('stop server replica')
> +test_run:cmd('delete server replica')
> +
> +box.space.sync:drop()
> +box.schema.user.revoke('guest', 'super')
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index 6119a264b..11f8d4e20 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -3,7 +3,7 @@ core = tarantool
>   script =  master.lua
>   description = tarantool/box, replication
>   disabled = consistent.test.lua
> -release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua
> +release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua
>   config = suite.cfg
>   lua_libs = lua/fast_replica.lua lua/rlimit.lua
>   use_unix_sockets = True

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum
  2020-07-02  8:48     ` Serge Petrenko
@ 2020-07-03 21:16       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-03 21:16 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches

Thanks for the review!

(The email was intended to be sent yesterday, but my email client
somehow screwed it up.)

>> diff --git a/src/box/txn.c b/src/box/txn.c
>> index edc1f5180..6cfa98212 100644
>> --- a/src/box/txn.c
>> +++ b/src/box/txn.c
>> @@ -658,7 +701,11 @@ txn_commit(struct txn *txn)
>>           diag_log();
>>           return -1;
>>       }
>> -
>> +    if (is_sync) {
>> +        txn_limbo_assign_lsn(&txn_limbo, limbo_entry,
>> +                     req->rows[req->n_rows - 1]->lsn);
> 
> This assumes that the last tx row is a global one. This'll be true once
> #4928 [1] is fixed. However, the fix is a crutch, either appending a dummy NOP
> statement at the end of tx, or reordering the rows so that the last one is
> global. If we remove the crutch someday, we'll also break LSN assignment here.
> Maybe there's a better way to assign LSN to a synchronous tx?
> 
> Another point. Once async transactions are also added to limbo, we'll
> have fully local transactions in limbo. Local transactions have a separate
> LSN counter, so we probably have to assign them the same LSN as the last
> synchronous transaction waiting in limbo. Looks like this'll work.

Good point.

Use of the last LSN won't work though, because the last sync transaction
in the limbo may still not have an LSN assigned, i.e. it is -1. So we can't
take it, because it does not exist yet. I thought about making
txn_limbo_assign_lsn() do that for all next async transactions, but it looks
like a crutch.

Instead, I make the async transaction appearance more explicit - their LSNs
are always -1 in the limbo and are never changed. See a new email thread
with the solution and some tests.

> [1] https://github.com/tarantool/tarantool/issues/4928

I added a test for sync transactions having a local row last. It is
disabled until 4928 is fixed. It will break when we revert the crutch
in favor of whatever the other solution turns out to be. So we
will notice.


* Re: [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing Vladislav Shpilevoy
@ 2020-07-05 15:29     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-05 15:29 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

> diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
> index ac57fd1bd..680e81d3d 100644
> --- a/src/box/txn_limbo.c
> +++ b/src/box/txn_limbo.c
> @@ -127,33 +141,64 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> -/**
> - * Write a confirmation entry to WAL. After it's written all the
> - * transactions waiting for confirmation may be finished.
> - */
>  static int
> -txn_limbo_write_confirm(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
> +txn_limbo_write_confirm_rollback(struct txn_limbo *limbo,
> +				 struct txn_limbo_entry *entry,
> +				 bool is_confirm)
>  {
>  	struct xrow_header row;
>  	struct request request = {
>  		.header = &row,
>  	};
>  
> -	if (xrow_encode_confirm(&row, limbo->instance_id, entry->lsn) < 0)
> +	int res = 0;
> +	if (is_confirm) {
> +		res = xrow_encode_confirm(&row, limbo->instance_id, entry->lsn);
> +	} else {
> +		/*
> +		 * This entry is the first to be rolled back, so
> +		 * the last "safe" lsn is entry->lsn - 1.
> +		 */
> +		res = xrow_encode_rollback(&row, limbo->instance_id,
> +					   entry->lsn - 1);

Why can't you write the exact lsn + change the check in txn_limbo_read_rollback()
from

    if (e->lsn <= lsn)
        break;

to

    if (e->lsn < lsn)
        break;

Currently the rollback entry contains an LSN which, counter-intuitively,
shouldn't be rolled back.
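
The two conventions can be compared in a small self-contained C model. This
is only a sketch: the array-of-LSNs limbo and both function names are made up
for illustration and are not the txn_limbo API. It assumes the entries are
ordered by ascending LSN and that the reader walks from the newest entry down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: the limbo is an array of LSNs in ascending order.
 * Both functions return how many entries get rolled back.
 */

/* Current scheme: the record carries the last LSN that must be kept. */
size_t
rolled_back_with_safe_lsn(const int64_t *lsns, size_t count, int64_t safe_lsn)
{
	assert(lsns != NULL || count == 0);
	size_t rolled_back = 0;
	/* Walk from the newest entry towards the oldest one. */
	for (size_t i = count; i > 0; i--) {
		if (lsns[i - 1] <= safe_lsn)
			break;
		rolled_back++;
	}
	return rolled_back;
}

/* Proposed scheme: the record carries the first LSN to roll back. */
size_t
rolled_back_with_exact_lsn(const int64_t *lsns, size_t count,
			   int64_t first_bad_lsn)
{
	assert(lsns != NULL || count == 0);
	size_t rolled_back = 0;
	for (size_t i = count; i > 0; i--) {
		if (lsns[i - 1] < first_bad_lsn)
			break;
		rolled_back++;
	}
	return rolled_back;
}
```

With entries 5, 6, 7, 8 and entry 7 being the first to fail, writing 6 under
the `<=` check and writing 7 under the `<` check roll back the same two
entries. Only the meaning of the number stored in the record changes.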


* Re: [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
  2020-06-30 23:00     ` Vladislav Shpilevoy
  2020-07-02  8:48     ` Serge Petrenko
@ 2020-07-05 16:05     ` Vladislav Shpilevoy
  2 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-05 16:05 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Applied this diff:

====================
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 2b4aae477..13cef540f 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -124,10 +124,11 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
 	struct txn *txn = entry->txn;
 	assert(entry->lsn > 0);
-	assert(!txn_has_flag(txn, TXN_IS_DONE));
-	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
 	if (txn_limbo_check_complete(limbo, entry))
 		goto complete;
+
+	assert(!txn_has_flag(txn, TXN_IS_DONE));
+	assert(txn_has_flag(txn, TXN_WAIT_SYNC));
 	bool cancellable = fiber_set_cancellable(false);
 	while (!txn_limbo_entry_is_complete(entry))
 		fiber_yield();

====================

Because if a transaction is already complete, it will have
IS_DONE and won't have WAIT_SYNC.
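
The reordering can be illustrated with a reduced C model. It is a sketch
under assumptions: the struct, the flag values and the function name are
hypothetical stand-ins, not the real struct txn or txn_limbo code. The point
is only that the invariants hold for a still-pending entry, so they are
checked after the early exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the real txn flags. */
enum {
	SKETCH_TXN_IS_DONE = 1 << 0,
	SKETCH_TXN_WAIT_SYNC = 1 << 1,
};

/* Hypothetical stand-in for a limbo entry together with its transaction. */
struct sketch_entry {
	/* The entry was already confirmed or rolled back. */
	bool is_complete;
	/* Flags of the owning transaction. */
	unsigned txn_flags;
};

/* Returns true when the caller still has to wait for the entry. */
bool
sketch_needs_wait(const struct sketch_entry *entry)
{
	/* A completed entry may legitimately have IS_DONE and no WAIT_SYNC. */
	if (entry->is_complete)
		return false;
	/* Only a pending entry must satisfy these invariants. */
	assert((entry->txn_flags & SKETCH_TXN_IS_DONE) == 0);
	assert((entry->txn_flags & SKETCH_TXN_WAIT_SYNC) != 0);
	return true;
}
```

Had the asserts been placed before the early return, an already completed
entry with IS_DONE set would abort in a debug build instead of returning.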


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-02 23:20     ` Vladislav Shpilevoy
@ 2020-07-06 12:30       ` Sergey Bronnikov
  0 siblings, 0 replies; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-06 12:30 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi, Vlad

Thanks for pointing this out.
The test you provided output for uses the error injection
ERRINJ_SYNC_TIMEOUT introduced in a separate patch [1]. The original
idea is to be able to control the sync timeout manually. The
ER_SYNC_QUORUM_TIMEOUT error triggered in the test means that the patch
in [1] is wrong or incomplete. Perhaps we should add the injection in
another place (maybe src/box/txn_limbo.c), not in src/box/replication.cc.
Could you review [1] and provide feedback?

1. https://lists.tarantool.org/pipermail/tarantool-patches/2020-July/018269.html

Sergey

On 01:20 Fri 03 Jul , Vladislav Shpilevoy wrote:
> I didn't review it properly. Just a local fail:
> 
> [001] replication/qsync_advanced.test.lua             memtx           [ fail ]
> [001] 
> [001] Test failed! Result content mismatch:
> [001] --- replication/qsync_advanced.result	Fri Jul  3 01:14:24 2020
> [001] +++ replication/qsync_advanced.reject	Fri Jul  3 01:17:49 2020
> [001] @@ -538,7 +538,7 @@
> [001]  -- Disable synchronous mode.
> [001]  disable_sync_mode()
> [001]   | ---
> [001] - | - error: A rollback for a synchronous transaction is received
> [001] + | - error: Quorum collection for a synchronous transaction is timed out
> [001]   | ...
> [001]  -- Space is in async mode now.
> [001]  box.space.sync:insert{3} -- async operation must wait sync one

-- 
sergeyb@


* Re: [Tarantool-patches] [PATCH] Add new error injection constant ERRINJ_SYNC_TIMEOUT
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (24 preceding siblings ...)
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
@ 2020-07-06 23:31   ` Vladislav Shpilevoy
  2020-07-10  0:50   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
  2020-07-10  7:40   ` Kirill Yukhin
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-06 23:31 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

Hi!

On 06/07/2020 12:32, sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
> 
> Some tests for synchronous replication require creating a special state
> in which replication_synchro_timeout is not exceeded. A greater timeout value
> will make tests longer and increase the probability of flakiness. When the
> constant ERRINJ_SYNC_TIMEOUT is set to true, the operation stays in a state
> where it is neither confirmed nor rolled back.

To be honest, I didn't quite get how ERRINJ_SYNC_TIMEOUT is related to
synchronous replication. The place where you added this injection has nothing
to do with synchronous replication at all. But that is not the main problem.
See below.

> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index ef0e2411d..675a76a7e 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -866,6 +867,9 @@ replicaset_sync(void)
>  	       replicaset.applier.loading >= quorum) {
>  		if (fiber_cond_wait_deadline(&replicaset.applier.cond,
>  					     deadline) != 0)
> +			ERROR_INJECT(ERRINJ_SYNC_TIMEOUT, {
> +				continue;
> +			});

The point of the errinj module is to simulate real errors or just
rare execution paths. This particular path is impossible in
principle, which makes this errinj useless. That is, in theory it
may help to test something, but that something won't be connected
to reality.

What problem are you trying to solve here? To turn off acks from replication?
Take a look at ERRINJ_APPLIER_SLOW_ACK. Or just increase
replication_timeout. Or enable ERRINJ_WAL_DELAY on the replica: it
won't write anything to WAL and won't send acks.


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
  2020-07-02 23:20     ` Vladislav Shpilevoy
@ 2020-07-06 23:31     ` Vladislav Shpilevoy
  2020-07-07 12:12       ` Sergey Bronnikov
  2 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-06 23:31 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

The commit has already been changed by other commits on top, so below I
copy-paste the test as it is at the very top of the branch.

See 16 comments below.

> env = require('test_run')
> test_run = env.new()
> engine = test_run:get_cfg('engine')
> fiber = require('fiber')
> 
> orig_synchro_quorum = box.cfg.replication_synchro_quorum
> orig_synchro_timeout = box.cfg.replication_synchro_timeout
> 
> NUM_INSTANCES = 2
> BROKEN_QUORUM = NUM_INSTANCES + 1
> 
> test_run:cmd("setopt delimiter ';'")
> disable_sync_mode = function()
>     local s = box.space._space:get(box.space.sync.id)
>     local new_s = s:update({{'=', 6, {is_sync=false}}})
>     box.space._space:replace(new_s)

1. I'm not sure it makes sense right now to test switching a space from
asynchronous to synchronous and back, because it is not entirely clear
how this should work. Does this DDL transaction need a quorum? If
yes, how to request it? _space is not synchronous. And does it need to be
requested at all, or will the very first transaction on this space already
"push through" its synchronicity? These questions need to be answered first,
and only then tested, IMHO.

> end;
> test_run:cmd("setopt delimiter ''");
> 
> box.schema.user.grant('guest', 'replication')
> 
> -- Setup an async cluster with two instances.

2. Strictly speaking, we have no notion of an asynchronous or synchronous
cluster. It is spaces that can be synchronous or not.

> test_run:cmd('create server replica with rpl_master=default,\
>                                          script="replication/replica.lua"')
> test_run:cmd('start server replica with wait=True, wait_load=True')
> 
> -- Successful write.
> -- Testcase setup.

3. This test clearly doesn't look advanced, and it is already well covered by
the basic tests. Besides, a timeout of 0.1 may not be enough. The test may hang
a bit under load, and everything will fall into a horribly huge diff with a
pile of errors. If you use timeouts, try to do it this way: if you want the
timeout not to expire, make it huge, in the thousands; if you want it to
expire, set it to zero or to a very, very small number (0.0001, for example) +
you can add fiber.sleep(<that timeout>) so that all the deadlines in the code
are missed for sure.

The same applies to the test below (Unsuccessful write).

> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1} -- success
> test_run:cmd('switch replica')
> box.space.sync:select{} -- 1
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Unsuccessfull write.

4. Unsuccessfull -> Unsuccessful.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> test_run:switch('replica')
> box.space.sync:select{} -- none
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Updated replication_synchro_quorum doesn't affect existed tx.

5. Interestingly, when I made it start affecting existing transactions, this
test passed as well. What is its usefulness then? Besides, it is now
covered in qsync_basic.test.lua.

> -- Testcase setup.
> test_run:switch('default')
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.0001}
> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', true)
> test_run:cmd("setopt delimiter ';'")
> _ = fiber.create(function()
>     box.space.sync:insert{1}
> end);
> test_run:cmd("setopt delimiter ''");
> box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> box.space.sync:select{} -- none
> test_run:switch('replica')
> box.space.sync:select{} -- none
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- [RFC, quorum commit] attempt to write multiple transactions, expected the
> -- same order as on client in case of achieved quorum.
> -- Testcase setup.

6. Again, all these things are covered in basic. Why the duplication?

> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:insert{2}
> box.space.sync:insert{3}
> box.space.sync:select{} -- 1, 2, 3
> test_run:switch('replica')
> box.space.sync:select{} -- 1, 2, 3
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Synchro timeout is not bigger than replication_synchro_timeout value.
> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> start = os.time()
> box.space.sync:insert{1}
> (os.time() - start) == box.cfg.replication_synchro_timeout -- true

7. A very bad idea. If the process hangs here for a short while, this check
will fail. There must be no tests that rely on the process running
steadily.
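
One way to make such a duration check independent of scheduling is to assert
only a lower bound. The C sketch below is an assumption added for
illustration, not something proposed in this thread; both function names are
hypothetical, and timestamps are plain seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Fragile: passes only if the wait took exactly `timeout` seconds. */
bool
waited_exactly(double start, double end, double timeout)
{
	assert(end >= start);
	return end - start == timeout;
}

/* Tolerates stalls: passes if the wait took at least `timeout` seconds. */
bool
waited_at_least(double start, double end, double timeout)
{
	assert(end >= start);
	return end - start >= timeout;
}
```

If the process stalls for one extra second, the exact comparison fails while
the lower-bound comparison still passes, and it still catches a wait that
returned too early.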

> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- replication_synchro_quorum
> test_run:switch('default')
> INT_MIN = -2147483648
> INT_MAX = 2147483648
> box.cfg{replication_synchro_quorum=INT_MAX} -- error
> box.cfg.replication_synchro_quorum -- old value
> box.cfg{replication_synchro_quorum=INT_MIN} -- error
> box.cfg.replication_synchro_quorum -- old value

8. These are clearly not advanced tests either. These are the most basic checks.

> -- replication_synchro_timeout
> test_run:switch('default')
> DOUBLE_MAX = 9007199254740992
> box.cfg{replication_synchro_timeout=DOUBLE_MAX}
> box.cfg.replication_synchro_timeout -- DOUBLE_MAX
> box.cfg{replication_synchro_timeout=DOUBLE_MAX+1}
> box.cfg.replication_synchro_timeout -- DOUBLE_MAX
> box.cfg{replication_synchro_timeout=-1} -- error
> box.cfg.replication_synchro_timeout -- old value
> box.cfg{replication_synchro_timeout=0} -- error
> box.cfg.replication_synchro_timeout -- old value
> 
> -- TX is in synchronous replication.

9. I didn't understand the comment. And I didn't understand how this test
differs from "[RFC, quorum commit]" above.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.begin() box.space.sync:insert({1}) box.commit()
> box.begin() box.space.sync:insert({2}) box.commit()
> -- Testcase cleanup.
> box.space.sync:drop()
> 
> -- [RFC, summary] switch sync replicas into async ones, expected success and
> -- data consistency on a leader and replicas.

10. This is probably the only test so far that could be kept here, that is,
an 'advanced' one. But the comment is wrong: there are no synchronous replicas.
There are synchronous transactions, which are determined by synchronous spaces.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:select{} -- 1
> test_run:switch('replica')
> box.space.sync:select{} -- 1
> test_run:switch('default')
> -- Disable synchronous mode.
> disable_sync_mode()
> -- Space is in async mode now.
> box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> box.space.sync:insert{2} -- success
> box.space.sync:insert{3} -- success
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
> box.space.sync:insert{4} -- success
> box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> box.space.sync:insert{5} -- success
> box.space.sync:select{} -- 1, 2, 3, 4, 5
> test_run:cmd('switch replica')
> box.space.sync:select{} -- 1, 2, 3, 4, 5
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- [RFC, Synchronous replication enabling] "As soon as last operation of
> -- synchronous transaction appeared in leader's WAL, it will cause all
> -- following transactions - no matter if they are synchronous or not - wait for
> -- the quorum. In case quorum is not achieved the 'rollback' operation will
> -- cause rollback of all transactions after the synchronous one."

11. This is already tested in qsync_errinj.test.lua.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:select{} -- 1
> test_run:switch('replica')
> box.space.sync:select{} -- 1
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', true)

12. If a test uses errinj, it must not run in the release build.
See release_disabled in the suite.ini files.

> test_run:cmd("setopt delimiter ';'")
> _ = fiber.create(function()
>     box.space.sync:insert{2}
> end);
> test_run:cmd("setopt delimiter ''");
> -- Disable synchronous mode.
> disable_sync_mode()
> -- Space is in async mode now.
> box.space.sync:insert{3} -- async operation must wait sync one
> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> box.space.sync:select{} -- 1
> test_run:cmd('switch replica')
> box.space.sync:select{} -- 1
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Warn user when setting `replication_synchro_quorum` to a value
> -- greater than number of instances in a cluster, see gh-5122.
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning

13. This test doesn't seem to check anything at all. The warning is not
written now, and the test passes.

> -- [RFC, summary] switch from leader to replica and vice versa, expected
> -- success and data consistency on a leader and replicas (gh-5124).

14. Now this is a fine test for advanced. Serge P. added an API to clear the
limbo on switching. It needs to be tested too; right now there is not a single
test for it.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:select{} -- 1
> test_run:switch('replica')
> box.space.sync:select{} -- 1
> box.cfg{read_only=false} -- promote replica to master
> test_run:switch('default')
> box.cfg{read_only=true} -- demote master to replica
> test_run:switch('replica')
> box.space.sync:insert{2}
> box.space.sync:select{} -- 1, 2
> test_run:switch('default')
> box.space.sync:select{} -- 1, 2
> -- Revert cluster configuration.
> test_run:switch('default')
> box.cfg{read_only=false}
> test_run:switch('replica')
> box.cfg{read_only=true}
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:select{} -- 1
> box.error.injection.set('ERRINJ_WAL_IO', true)
> box.space.sync:insert{2}
> box.error.injection.set('ERRINJ_WAL_IO', false)
> box.space.sync:select{} -- 1
> test_run:switch('replica')
> box.space.sync:select{} -- 1
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- [RFC, quorum commit] check behaviour with failure answer from a replica
> -- (ERRINJ_WAL_SYNC) during write, expected disconnect from the replication

15. ERRINJ_WAL_SYNC is not used here.

> -- (gh-5123, set replication_synchro_quorum to 1).
> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> box.space.sync:insert{1}
> box.space.sync:select{} -- 1
> test_run:switch('replica')
> box.error.injection.set('ERRINJ_WAL_IO', true)
> test_run:switch('default')
> box.space.sync:insert{2}
> test_run:switch('replica')
> box.error.injection.set('ERRINJ_WAL_IO', false)
> box.space.sync:select{} -- 1
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Teardown.
> test_run:cmd('switch default')
> test_run:cmd('stop server replica')
> test_run:cmd('delete server replica')
> test_run:cleanup_cluster()
> box.schema.user.revoke('guest', 'replication')
> box.cfg{                                                                        \
>     replication_synchro_quorum = orig_synchro_quorum,                           \
>     replication_synchro_timeout = orig_synchro_timeout,                         \
> }
> 
> -- Setup an async cluster.
> box.schema.user.grant('guest', 'replication')
> test_run:cmd('create server replica with rpl_master=default,\
>                                          script="replication/replica.lua"')
> test_run:cmd('start server replica with wait=True, wait_load=True')

16. Why recreate everything?

> -- [RFC, summary] switch async replica into sync one, expected
> -- success and data consistency on a leader and replica.
> -- Testcase setup.
> _ = box.schema.space.create('sync', {engine=engine})
> _ = box.space.sync:create_index('pk')
> box.space.sync:insert{1} -- success
> test_run:cmd('switch replica')
> box.space.sync:select{} -- 1
> test_run:switch('default')
> -- Enable synchronous mode.
> disable_sync_mode()
> -- Space is in sync mode now.
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> box.space.sync:insert{2} -- success
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> box.space.sync:insert{3} -- success
> box.space.sync:select{} -- 1, 2, 3
> test_run:cmd('switch replica')
> box.space.sync:select{} -- 1, 2, 3
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Teardown.
> test_run:cmd('switch default')
> test_run:cmd('stop server replica')
> test_run:cmd('delete server replica')
> test_run:cleanup_cluster()
> box.schema.user.revoke('guest', 'replication')
> box.cfg{                                                                        \
>     replication_synchro_quorum = orig_synchro_quorum,                           \
>     replication_synchro_timeout = orig_synchro_timeout,                         \
> }


* Re: [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica sergeyb
@ 2020-07-06 23:31     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-06 23:31 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

> env = require('test_run')
> test_run = env.new()
> engine = test_run:get_cfg('engine')
> 
> orig_synchro_quorum = box.cfg.replication_synchro_quorum
> orig_synchro_timeout = box.cfg.replication_synchro_timeout
> 
> NUM_INSTANCES = 2
> BROKEN_QUORUM = NUM_INSTANCES + 1
> 
> box.schema.user.grant('guest', 'replication')
> 
> -- Setup a cluster with anonymous replica.
> test_run:cmd('create server replica_anon with rpl_master=default, script="replication/anon1.lua"')
> test_run:cmd('start server replica_anon')
> test_run:cmd('switch replica_anon')
> 
> -- [RFC, Asynchronous replication] successful transaction applied on async
> -- replica.

I don't quite get it. How does this pass? I thought we decided that anonymous
replicas do not participate in the quorum. And here they obviously do.

> -- Testcase setup.
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> test_run:switch('default')
> box.space.sync:insert{1} -- success
> box.space.sync:insert{2} -- success
> box.space.sync:insert{3} -- success
> test_run:cmd('switch replica_anon')
> box.space.sync:select{} -- 1, 2, 3
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- [RFC, Asynchronous replication] failed transaction rolled back on async
> -- replica.
> -- Testcase setup.
> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> _ = box.space.sync:create_index('pk')
> -- Testcase body.
> test_run:switch('default')
> box.space.sync:insert{1} -- failure
> test_run:cmd('switch replica_anon')
> box.space.sync:select{} -- none
> test_run:switch('default')
> box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> box.space.sync:insert{1} -- success
> test_run:cmd('switch replica_anon')
> box.space.sync:select{} -- 1
> -- Testcase cleanup.
> test_run:switch('default')
> box.space.sync:drop()
> 
> -- Teardown.
> test_run:switch('default')
> test_run:cmd('stop server replica_anon')
> test_run:cmd('delete server replica_anon')
> box.schema.user.revoke('guest', 'replication')
> box.cfg{                                                                        \
>     replication_synchro_quorum = orig_synchro_quorum,                           \
>     replication_synchro_timeout = orig_synchro_timeout,                         \
> }
> test_run:cleanup_cluster()


* Re: [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots
  2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
  2020-07-02 22:46     ` Sergey Bronnikov
  2020-07-02 23:20     ` Vladislav Shpilevoy
@ 2020-07-06 23:31     ` Vladislav Shpilevoy
  2020-07-07 16:00       ` Sergey Bronnikov
  2 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-06 23:31 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, sergepetrenko, gorcunov, lvasiliev

The test keeps failing regularly on the latest version of the branch.

[001] replication/qsync_snapshots.test.lua            memtx           [ fail ]
[001] 
[001] Test failed! Result content mismatch:
[001] --- replication/qsync_snapshots.result	Tue Jul  7 00:34:34 2020
[001] +++ replication/qsync_snapshots.reject	Tue Jul  7 01:28:53 2020
[001] @@ -233,7 +233,7 @@
[001]   | ...
[001]  box.snapshot() -- abort
[001]   | ---
[001] - | - error: A rollback for a synchronous transaction is received
[001] + | - error: Quorum collection for a synchronous transaction is timed out
[001]   | ...
[001]  box.space.sync:select{} -- 1
[001]   | ---
[001] 


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-06 23:31     ` Vladislav Shpilevoy
@ 2020-07-07 12:12       ` Sergey Bronnikov
  2020-07-07 20:57         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-07 12:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 01:31 Tue 07 Jul , Vladislav Shpilevoy wrote:
> The commit has already been changed by other commits on top, so below I
> copy-paste the test as it is at the very top of the branch.
> 
> See 16 comments below.
> 
> > env = require('test_run')
> > test_run = env.new()
> > engine = test_run:get_cfg('engine')
> > fiber = require('fiber')
> > 
> > orig_synchro_quorum = box.cfg.replication_synchro_quorum
> > orig_synchro_timeout = box.cfg.replication_synchro_timeout
> > 
> > NUM_INSTANCES = 2
> > BROKEN_QUORUM = NUM_INSTANCES + 1
> > 
> > test_run:cmd("setopt delimiter ';'")
> > disable_sync_mode = function()
> >     local s = box.space._space:get(box.space.sync.id)
> >     local new_s = s:update({{'=', 6, {is_sync=false}}})
> >     box.space._space:replace(new_s)
> 
> 1. I am not sure it makes sense to test switching a space from
> asynchronous to synchronous and back right now, because it is not
> entirely clear how that should work. Does this DDL transaction need a
> quorum? If so, how is it requested? _space is not synchronous. And does
> it need to be requested at all, or does the very first transaction on
> this space already "push through" its synchronicity? These questions
> need answers first, and only then should this be tested, IMHO.

We have an RFC that we discussed for a long time, and if there are
questions about the design, the RFC should be refined rather than the
tests cut out. I am leaving this question open for now.

> > end;
> > test_run:cmd("setopt delimiter ''");
> > 
> > box.schema.user.grant('guest', 'replication')
> > 
> > -- Setup an async cluster with two instances.
> 
> 2. Strictly speaking, we have no notion of an asynchronous or synchronous
> cluster. It is spaces that can be synchronous or not.

Yes, exactly. I removed the cluster type from the comment.

--- Setup an async cluster with two instances.
+-- Setup an cluster with two instances.

> > test_run:cmd('create server replica with rpl_master=default,\
> >                                          script="replication/replica.lua"')
> > test_run:cmd('start server replica with wait=True, wait_load=True')
> > 
> > -- Successful write.
> > -- Testcase setup.
> 
> 3. This test clearly does not look advanced, and it is already well covered
> by the basic tests. Besides, a timeout of 0.1 may not be enough. The test
> may stall briefly under load, and everything will blow up into a terribly
> huge diff with a pile of errors. If you use timeouts, try to do it this
> way: if you want the timeout not to expire, make it huge, in the
> thousands; if you want it to expire, set it to zero or to a very, very
> small number (0.0001, for example). You can also add
> fiber.sleep(<that timeout>) so that all deadlines in the code are
> certainly missed.

When I started writing the tests I first needed smoke tests, so that
before running the more complex ones I could be sure the basic scenario
works.

> The same applies to the test below (Unsuccessful write).

Removed both.

> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1} -- success
> > test_run:cmd('switch replica')
> > box.space.sync:select{} -- 1
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Unsuccessfull write.
> 
> 4. Unsuccessfull -> Unsuccessful.
> 

Yes, but it is no longer relevant because I removed these tests.

> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > test_run:switch('replica')
> > box.space.sync:select{} -- none
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Updated replication_synchro_quorum doesn't affect existed tx.
> 
> 5. Interestingly, when I made it start to affect existing transactions,
> this test still passed. What is its usefulness then? Besides, it is now
> covered in qsync_basic.test.lua.

The usefulness of tests is not only in catching bugs right now, but also in
pinning down the desired behavior, so that they fire when the behavior
changes during a refactoring or a redesign of the feature. But I removed the
test, because I don't know how to reliably create the state it needs.

> > -- Testcase setup.
> > test_run:switch('default')
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.0001}
> > box.error.injection.set('ERRINJ_SYNC_TIMEOUT', true)
> > test_run:cmd("setopt delimiter ';'")
> > _ = fiber.create(function()
> >     box.space.sync:insert{1}
> > end);
> > test_run:cmd("setopt delimiter ''");
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> > box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> > box.space.sync:select{} -- none
> > test_run:switch('replica')
> > box.space.sync:select{} -- none
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- [RFC, quorum commit] attempt to write multiple transactions, expected the
> > -- same order as on client in case of achieved quorum.
> > -- Testcase setup.
> 
> 6. Again, all of this is covered in basic. Why the duplication?

A good rhetorical question. Before implementing synchronous replication
we worked through the RFC, and I described the tests I planned to write
based on it. I sent the test plan for review, and it was available before
I started writing the tests. While I was writing them, I could not always
keep track of which tests were being added to the branch. And those who
pushed tests to the branch apparently had not seen my test plan.
That is how the duplication came about.

> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:insert{2}
> > box.space.sync:insert{3}
> > box.space.sync:select{} -- 1, 2, 3
> > test_run:switch('replica')
> > box.space.sync:select{} -- 1, 2, 3
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Synchro timeout is not bigger than replication_synchro_timeout value.
> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > start = os.time()
> > box.space.sync:insert{1}
> > (os.time() - start) == box.cfg.replication_synchro_timeout -- true
>
> 7. A very bad idea. If the process stalls here briefly, this check
> will fail. There must be no tests that rely on the process running
> steadily.

Are you suggesting not to check this, or are there more reliable ways to
verify that the timeout is exactly the value it was set to?

> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- replication_synchro_quorum
> > test_run:switch('default')
> > INT_MIN = -2147483648
> > INT_MAX = 2147483648
> > box.cfg{replication_synchro_quorum=INT_MAX} -- error
> > box.cfg.replication_synchro_quorum -- old value
> > box.cfg{replication_synchro_quorum=INT_MIN} -- error
> > box.cfg.replication_synchro_quorum -- old value
> 
> 8. These are clearly not advanced tests either. These are the most basic checks.

I originally put the tests in a separate file so that it would be easier
to change them in the shared branch, without merges, rebases and so on.
The tests were named advanced because they were supposed to cover the
high-level requirements from the RFC. I can move these tests to
qsync_basic if there are no objections to the tests themselves.

> > -- replication_synchro_timeout
> > test_run:switch('default')
> > DOUBLE_MAX = 9007199254740992
> > box.cfg{replication_synchro_timeout=DOUBLE_MAX}
> > box.cfg.replication_synchro_timeout -- DOUBLE_MAX
> > box.cfg{replication_synchro_timeout=DOUBLE_MAX+1}
> > box.cfg.replication_synchro_timeout -- DOUBLE_MAX
> > box.cfg{replication_synchro_timeout=-1} -- error
> > box.cfg.replication_synchro_timeout -- old value
> > box.cfg{replication_synchro_timeout=0} -- error
> > box.cfg.replication_synchro_timeout -- old value
> > 
> > -- TX is in synchronous replication.
> 
> 9. I don't understand the comment. And I don't understand how this test
> differs from "[RFC, quorum commit]" above.

I think I wrote this test first and kept it as a simple check that
transactions work. Now, yes, there is duplication. I will remove it.

> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.begin() box.space.sync:insert({1}) box.commit()
> > box.begin() box.space.sync:insert({2}) box.commit()
> > -- Testcase cleanup.
> > box.space.sync:drop()
> > 
> > -- [RFC, summary] switch sync replicas into async ones, expected success and
> > -- data consistency on a leader and replicas.
> 
> 10. This is probably the only test so far that could be kept here, that
> is, an 'advanced' one. But the comment is wrong: there are no synchronous
> replicas. There are synchronous transactions, which are determined by
> synchronous spaces.

RFC: "ability to switch async replicas into sync ones and vice versa"
                     ^^^^^^^^^^^^^^^^^^^
I will fix the comment in the test. Also, as I understood, you had
objections to how we switch synchronous replication off so that it
becomes asynchronous. Or is writing to the system space OK?

> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:select{} -- 1
> > test_run:switch('replica')
> > box.space.sync:select{} -- 1
> > test_run:switch('default')
> > -- Disable synchronous mode.
> > disable_sync_mode()
> > -- Space is in async mode now.
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> > box.space.sync:insert{2} -- success
> > box.space.sync:insert{3} -- success
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
> > box.space.sync:insert{4} -- success
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES}
> > box.space.sync:insert{5} -- success
> > box.space.sync:select{} -- 1, 2, 3, 4, 5
> > test_run:cmd('switch replica')
> > box.space.sync:select{} -- 1, 2, 3, 4, 5
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> >
> > -- [RFC, Synchronous replication enabling] "As soon as last operation of
> > -- synchronous transaction appeared in leader's WAL, it will cause all
> > -- following transactions - no matter if they are synchronous or not - wait for
> > -- the quorum. In case quorum is not achieved the 'rollback' operation will
> > -- cause rollback of all transactions after the synchronous one."
> 
> 11. This is already tested in qsync_errinj.test.lua.

OK, removed. Especially since with ERRINJ_SYNC_TIMEOUT I did not manage
to do what I wanted.

> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:select{} -- 1
> > test_run:switch('replica')
> > box.space.sync:select{} -- 1
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> > box.error.injection.set('ERRINJ_SYNC_TIMEOUT', true)
> 
> 12. If a test uses errinj, it must not run in a release build.
> See release_disabled in the suite.ini files.

Yes, I have already realized that, and the test was split into two parts,
but I did not add that to the shared branch. All changes from this review
will go into the split tests.

> > test_run:cmd("setopt delimiter ';'")
> > _ = fiber.create(function()
> >     box.space.sync:insert{2}
> > end);
> > test_run:cmd("setopt delimiter ''");
> > -- Disable synchronous mode.
> > disable_sync_mode()
> > -- Space is in async mode now.
> > box.space.sync:insert{3} -- async operation must wait sync one
> > box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> > box.space.sync:select{} -- 1
> > test_run:cmd('switch replica')
> > box.space.sync:select{} -- 1
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Warn user when setting `replication_synchro_quorum` to a value
> > -- greater than number of instances in a cluster, see gh-5122.
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
> 
> 13. This test does not seem to check anything at all. The warning is not
> printed right now, and the test passes.

The usual process is this: if a test fails, then while the issue is open
an XFAIL is added, and when the behavior changes the XFAIL turns into
XPASS, which signals that the XFAIL should be removed. We have no such
mechanism, so I added the test for the future: when the warning is added,
the test will break and the result file will be updated. I think that is
perfectly fine.

> > -- [RFC, summary] switch from leader to replica and vice versa, expected
> > -- success and data consistency on a leader and replicas (gh-5124).
> 
> 14. Now this is a proper test for advanced. Serge P. added an API to clear
> the limbo on switchover. It needs to be tested too; right now there is not
> a single test for it.

We discussed this; a test is needed for the patch
https://github.com/tarantool/tarantool/commit/9b1ddeb69a4944c6e2701e5d6fbdddb586f94eb2#diff-25ed8a97a328d59f5830ec4f4e6e3ea3R94
I will write it.

> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:select{} -- 1
> > test_run:switch('replica')
> > box.space.sync:select{} -- 1
> > box.cfg{read_only=false} -- promote replica to master
> > test_run:switch('default')
> > box.cfg{read_only=true} -- demote master to replica
> > test_run:switch('replica')
> > box.space.sync:insert{2}
> > box.space.sync:select{} -- 1, 2
> > test_run:switch('default')
> > box.space.sync:select{} -- 1, 2
> > -- Revert cluster configuration.
> > test_run:switch('default')
> > box.cfg{read_only=false}
> > test_run:switch('replica')
> > box.cfg{read_only=true}
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Check behaviour with failed write to WAL on master (ERRINJ_WAL_IO).
> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:select{} -- 1
> > box.error.injection.set('ERRINJ_WAL_IO', true)
> > box.space.sync:insert{2}
> > box.error.injection.set('ERRINJ_WAL_IO', false)
> > box.space.sync:select{} -- 1
> > test_run:switch('replica')
> > box.space.sync:select{} -- 1
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- [RFC, quorum commit] check behaviour with failure answer from a replica
> > -- (ERRINJ_WAL_SYNC) during write, expected disconnect from the replication
> 
> 15. ERRINJ_WAL_SYNC is not used here.

The comment and the test were inconsistent; I removed the ERRINJ name from
the comment.

> > -- (gh-5123, set replication_synchro_quorum to 1).
> > -- Testcase setup.
> > test_run:switch('default')
> > box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.1}
> > _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > _ = box.space.sync:create_index('pk')
> > -- Testcase body.
> > box.space.sync:insert{1}
> > box.space.sync:select{} -- 1
> > test_run:switch('replica')
> > box.error.injection.set('ERRINJ_WAL_IO', true)
> > test_run:switch('default')
> > box.space.sync:insert{2}
> > test_run:switch('replica')
> > box.error.injection.set('ERRINJ_WAL_IO', false)
> > box.space.sync:select{} -- 1
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Teardown.
> > test_run:cmd('switch default')
> > test_run:cmd('stop server replica')
> > test_run:cmd('delete server replica')
> > test_run:cleanup_cluster()
> > box.schema.user.revoke('guest', 'replication')
> > box.cfg{                                                                        \
> >     replication_synchro_quorum = orig_synchro_quorum,                           \
> >     replication_synchro_timeout = orig_synchro_timeout,                         \
> > }
> > 
> > -- Setup an async cluster.
> > box.schema.user.grant('guest', 'replication')
> > test_run:cmd('create server replica with rpl_master=default,\
> >                                          script="replication/replica.lua"')
> > test_run:cmd('start server replica with wait=True, wait_load=True')
> 
> 16. Why recreate everything?

I think that when I wanted to test the async -> sync conversion, I decided
to set up the test environment anew. There is no need for that now.

> > -- [RFC, summary] switch async replica into sync one, expected
> > -- success and data consistency on a leader and replica.
> > -- Testcase setup.
> > _ = box.schema.space.create('sync', {engine=engine})
> > _ = box.space.sync:create_index('pk')
> > box.space.sync:insert{1} -- success
> > test_run:cmd('switch replica')
> > box.space.sync:select{} -- 1
> > test_run:switch('default')
> > -- Enable synchronous mode.
> > disable_sync_mode()
> > -- Space is in sync mode now.
> > box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> > box.space.sync:insert{2} -- success
> > box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=0.1}
> > box.space.sync:insert{3} -- success
> > box.space.sync:select{} -- 1, 2, 3
> > test_run:cmd('switch replica')
> > box.space.sync:select{} -- 1, 2, 3
> > -- Testcase cleanup.
> > test_run:switch('default')
> > box.space.sync:drop()
> > 
> > -- Teardown.
> > test_run:cmd('switch default')
> > test_run:cmd('stop server replica')
> > test_run:cmd('delete server replica')
> > test_run:cleanup_cluster()
> > box.schema.user.revoke('guest', 'replication')
> > box.cfg{                                                                        \
> >     replication_synchro_quorum = orig_synchro_quorum,                           \
> >     replication_synchro_timeout = orig_synchro_timeout,                         \
> > }

-- 
sergeyb@

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots
  2020-07-06 23:31     ` Vladislav Shpilevoy
@ 2020-07-07 16:00       ` Sergey Bronnikov
  0 siblings, 0 replies; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-07 16:00 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 01:31 Tue 07 Jul , Vladislav Shpilevoy wrote:
> The test keeps failing regularly on the latest version of the branch.

Filed a ticket: https://github.com/tarantool/tarantool/issues/5150
For some reason the quorum times out when a snapshot is taken on the replica.

> [001] replication/qsync_snapshots.test.lua            memtx           [ fail ]
> [001] 
> [001] Test failed! Result content mismatch:
> [001] --- replication/qsync_snapshots.result	Tue Jul  7 00:34:34 2020
> [001] +++ replication/qsync_snapshots.reject	Tue Jul  7 01:28:53 2020
> [001] @@ -233,7 +233,7 @@
> [001]   | ...
> [001]  box.snapshot() -- abort
> [001]   | ---
> [001] - | - error: A rollback for a synchronous transaction is received
> [001] + | - error: Quorum collection for a synchronous transaction is timed out
> [001]   | ...
> [001]  box.space.sync:select{} -- 1
> [001]   | ---
> [001] 

-- 
sergeyb@

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-07 12:12       ` Sergey Bronnikov
@ 2020-07-07 20:57         ` Vladislav Shpilevoy
  2020-07-08 12:07           ` Sergey Bronnikov
  0 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-07 20:57 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

>>> test_run:switch('default')
>>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> _ = box.space.sync:create_index('pk')
>>> -- Testcase body.
>>> box.space.sync:insert{1}
>>> box.space.sync:insert{2}
>>> box.space.sync:insert{3}
>>> box.space.sync:select{} -- 1, 2, 3
>>> test_run:switch('replica')
>>> box.space.sync:select{} -- 1, 2, 3
>>> -- Testcase cleanup.
>>> test_run:switch('default')
>>> box.space.sync:drop()
>>>
>>> -- Synchro timeout is not bigger than replication_synchro_timeout value.
>>> -- Testcase setup.
>>> test_run:switch('default')
>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> _ = box.space.sync:create_index('pk')
>>> -- Testcase body.
>>> start = os.time()
>>> box.space.sync:insert{1}
>>> (os.time() - start) == box.cfg.replication_synchro_timeout -- true
>>
>> 7. A very bad idea. If the process stalls here briefly, this check
>> will fail. There must be no tests that rely on the process running
>> steadily.
> 
> Are you suggesting not to check this, or are there more reliable ways to
> verify that the timeout is exactly the value it was set to?

If you need to check that the timeout has expired, then check that the
elapsed time is >= timeout, but definitely not == timeout. The latter is
very unreliable.
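
A minimal sketch of the lower-bound check described above, in plain Lua.
The helpers elapsed_at_least and busy_wait are hypothetical names used only
for illustration; they are not part of test-run or Tarantool. os.clock is
used to keep the sketch self-contained, while a real Tarantool test would
probably take a monotonic clock such as fiber.clock() instead.

```lua
-- Hypothetical helper: run fn() and report whether it took at least
-- `timeout` seconds according to `clock` (os.clock by default).
local function elapsed_at_least(fn, timeout, clock)
    clock = clock or os.clock
    local start = clock()
    fn()
    return (clock() - start) >= timeout
end

-- Stand-in for an operation that blocks until its timeout fires.
-- It spins on CPU time, so os.clock() advances while it waits.
local function busy_wait(seconds)
    local t0 = os.clock()
    while os.clock() - t0 < seconds do end
end

local timeout = 0.01
-- The lower bound still holds if the process stalls and the wait takes
-- longer than requested; an exact == comparison would not.
print(elapsed_at_least(function() busy_wait(timeout) end, timeout))
```

The comparison only claims that the wait was not shorter than the timeout,
so a stall under load can make the test slower but cannot make it fail.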

>>> -- Testcase cleanup.
>>> test_run:switch('default')
>>> box.space.sync:drop()
>>>
>>> -- replication_synchro_quorum
>>> test_run:switch('default')
>>> INT_MIN = -2147483648
>>> INT_MAX = 2147483648
>>> box.cfg{replication_synchro_quorum=INT_MAX} -- error
>>> box.cfg.replication_synchro_quorum -- old value
>>> box.cfg{replication_synchro_quorum=INT_MIN} -- error
>>> box.cfg.replication_synchro_quorum -- old value
>>
>> 8. These are clearly not advanced tests either. These are the most basic checks.
> 
> I originally put the tests in a separate file so that it would be easier
> to change them in the shared branch, without merges, rebases and so on.
> The tests were named advanced because they were supposed to cover the
> high-level requirements from the RFC. I can move these tests to
> qsync_basic if there are no objections to the tests themselves.

Yes, better in basic.

>>> -- Testcase setup.
>>> test_run:switch('default')
>>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> _ = box.space.sync:create_index('pk')
>>> -- Testcase body.
>>> box.begin() box.space.sync:insert({1}) box.commit()
>>> box.begin() box.space.sync:insert({2}) box.commit()
>>> -- Testcase cleanup.
>>> box.space.sync:drop()
>>>
>>> -- [RFC, summary] switch sync replicas into async ones, expected success and
>>> -- data consistency on a leader and replicas.
>>
>> 10. This is probably the only test so far that could be kept here, that
>> is, an 'advanced' one. But the comment is wrong: there are no synchronous
>> replicas. There are synchronous transactions, which are determined by
>> synchronous spaces.
> 
> RFC: "ability to switch async replicas into sync ones and vice versa"
>                      ^^^^^^^^^^^^^^^^^^^
> I will fix the comment in the test. Also, as I understood, you had
> objections to how we switch synchronous replication off so that it
> becomes asynchronous. Or is writing to the system space OK?

OK for now. There will be a proper interface later. Right now nothing
about a space except its format can be changed properly. This problem
needs to be solved in the general case.

>>> test_run:cmd("setopt delimiter ';'")
>>> _ = fiber.create(function()
>>>     box.space.sync:insert{2}
>>> end);
>>> test_run:cmd("setopt delimiter ''");
>>> -- Disable synchronous mode.
>>> disable_sync_mode()
>>> -- Space is in async mode now.
>>> box.space.sync:insert{3} -- async operation must wait sync one
>>> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
>>> box.space.sync:select{} -- 1
>>> test_run:cmd('switch replica')
>>> box.space.sync:select{} -- 1
>>> -- Testcase cleanup.
>>> test_run:switch('default')
>>> box.space.sync:drop()
>>>
>>> -- Warn user when setting `replication_synchro_quorum` to a value
>>> -- greater than number of instances in a cluster, see gh-5122.
>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
>>
>> 13. This test does not seem to check anything at all. The warning is not
>> printed right now, and the test passes.
> 
> The usual process is this: if a test fails, then while the issue is open
> an XFAIL is added, and when the behavior changes the XFAIL turns into
> XPASS, which signals that the XFAIL should be removed. We have no such
> mechanism, so I added the test for the future: when the warning is added,
> the test will break and the result file will be updated. I think that is
> perfectly fine.

That would be OK if the comment said the test is not valid yet. But more
importantly, the test will still pass even when the warning is added,
because the warning will go to the log and will not appear in the test
output. So the test will pass although it should not.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery
  2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery Vladislav Shpilevoy
  2020-07-02  8:52     ` Serge Petrenko
@ 2020-07-08 11:43     ` Leonid Vasiliev
  1 sibling, 0 replies; 68+ messages in thread
From: Leonid Vasiliev @ 2020-07-08 11:43 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, sergepetrenko

Add [tosquash] snapshot_delay: remove unnecessary includes

diff --git a/src/box/gc.c b/src/box/gc.c
index 170c0a9..d50a64d 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -57,8 +57,6 @@
  #include "engine.h"            /* engine_collect_garbage() */
  #include "wal.h"               /* wal_collect_garbage() */
  #include "checkpoint_schedule.h"
-#include "trigger.h"
-#include "txn.h"
  #include "txn_limbo.h"

  struct gc_state gc;

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-07 20:57         ` Vladislav Shpilevoy
@ 2020-07-08 12:07           ` Sergey Bronnikov
  2020-07-08 22:13             ` Vladislav Shpilevoy
  0 siblings, 1 reply; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-08 12:07 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 22:57 Tue 07 Jul , Vladislav Shpilevoy wrote:
> >>> test_run:switch('default')
> >>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> >>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> >>> _ = box.space.sync:create_index('pk')
> >>> -- Testcase body.
> >>> box.space.sync:insert{1}
> >>> box.space.sync:insert{2}
> >>> box.space.sync:insert{3}
> >>> box.space.sync:select{} -- 1, 2, 3
> >>> test_run:switch('replica')
> >>> box.space.sync:select{} -- 1, 2, 3
> >>> -- Testcase cleanup.
> >>> test_run:switch('default')
> >>> box.space.sync:drop()
> >>>
> >>> -- Synchro timeout is not bigger than replication_synchro_timeout value.
> >>> -- Testcase setup.
> >>> test_run:switch('default')
> >>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
> >>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> >>> _ = box.space.sync:create_index('pk')
> >>> -- Testcase body.
> >>> start = os.time()
> >>> box.space.sync:insert{1}
> >>> (os.time() - start) == box.cfg.replication_synchro_timeout -- true
> >>
> >> 7. A very bad idea. If the process stalls here briefly, this check
> >> will fail. There must be no tests that rely on the process running
> >> steadily.
> > 
> > Are you suggesting not to check this, or are there more reliable ways to
> > verify that the timeout is exactly the value it was set to?
> 
> If you need to check that the timeout has expired, then check that the
> elapsed time is >= timeout, but definitely not == timeout. The latter is
> very unreliable.
> 

I don't like this check, because the test is supposed to verify that the
"timeout [is] not bigger than replication_synchro_timeout value".
I did it this way:

box.space.sync:insert{1}
-(os.time() - start) == box.cfg.replication_synchro_timeout -- true
+-- We assume that the process may freeze and the timeout will be slightly
+-- larger than the set value.
+POSSIBLE_ERROR = 2
+(os.time() - start) < box.cfg.replication_synchro_timeout + POSSIBLE_ERROR -- true
 -- Testcase cleanup.

> >>> -- Testcase cleanup.
> >>> test_run:switch('default')
> >>> box.space.sync:drop()
> >>>
> >>> -- replication_synchro_quorum
> >>> test_run:switch('default')
> >>> INT_MIN = -2147483648
> >>> INT_MAX = 2147483648
> >>> box.cfg{replication_synchro_quorum=INT_MAX} -- error
> >>> box.cfg.replication_synchro_quorum -- old value
> >>> box.cfg{replication_synchro_quorum=INT_MIN} -- error
> >>> box.cfg.replication_synchro_quorum -- old value
> >>
> >> 8. These are clearly not advanced tests either. These are the most basic checks.
> > 
> > I originally put the tests in a separate file so that it would be easier
> > to change them in the shared branch, without merges, rebases and so on.
> > The tests were named advanced because they were supposed to cover the
> > high-level requirements from the RFC. I can move these tests to
> > qsync_basic if there are no objections to the tests themselves.
> 
> Yes, better in basic.

Moved.

> >>> -- Testcase setup.
> >>> test_run:switch('default')
> >>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> >>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> >>> _ = box.space.sync:create_index('pk')
> >>> -- Testcase body.
> >>> box.begin() box.space.sync:insert({1}) box.commit()
> >>> box.begin() box.space.sync:insert({2}) box.commit()
> >>> -- Testcase cleanup.
> >>> box.space.sync:drop()
> >>>
> >>> -- [RFC, summary] switch sync replicas into async ones, expected success and
> >>> -- data consistency on a leader and replicas.
> >>
> >> 10. This is probably the only test so far that could be kept here, that
> >> is, an 'advanced' one. But the comment is wrong: there are no synchronous
> >> replicas. There are synchronous transactions, which are determined by
> >> synchronous spaces.
> > 
> > RFC: "ability to switch async replicas into sync ones and vice versa"
> >                      ^^^^^^^^^^^^^^^^^^^
> > I will fix the comment in the test. Also, as I understood, you had
> > objections to how we switch synchronous replication off so that it
> > becomes asynchronous. Or is writing to the system space OK?
> 
> OK for now. There will be a proper interface later. Right now nothing
> about a space except its format can be changed properly. This problem
> needs to be solved in the general case.

Filed a ticket for this: https://github.com/tarantool/tarantool/issues/5155

> >>> test_run:cmd("setopt delimiter ';'")
> >>> _ = fiber.create(function()
> >>>     box.space.sync:insert{2}
> >>> end);
> >>> test_run:cmd("setopt delimiter ''");
> >>> -- Disable synchronous mode.
> >>> disable_sync_mode()
> >>> -- Space is in async mode now.
> >>> box.space.sync:insert{3} -- async operation must wait sync one
> >>> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> >>> box.space.sync:select{} -- 1
> >>> test_run:cmd('switch replica')
> >>> box.space.sync:select{} -- 1
> >>> -- Testcase cleanup.
> >>> test_run:switch('default')
> >>> box.space.sync:drop()
> >>>
> >>> -- Warn user when setting `replication_synchro_quorum` to a value
> >>> -- greater than number of instances in a cluster, see gh-5122.
> >>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
> >>
> >> 13. This test does not seem to check anything at all. The warning is not
> >> printed now, and the test passes.
> > 
> > The usual process is this: if a test fails, then while the issue is still
> > open, XFAIL is added, and when the behaviour changes, XFAIL turns into
> > XPASS so that XFAIL gets removed.  We have no such mechanism, so I added
> > the test for the future: when the warning is added, the test will break and
> > the result file will be updated. I think this is quite OK.
> 
> That would be OK if the comment said the test is not valid yet. But what is
> even more important: the test will still pass even when the warning is added,
> because it will go to the log and will not be in the test output. So the test
> will pass, although it should not.

We have automated tests with a binary PASS or FAIL status, and a person
usually looks at the test run result, not at the comments in the
source. So this does not change much. But I updated the comment:

 -- greater than number of instances in a cluster, see gh-5122.
 -box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
 +box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- expected warning, to be add in gh-5122


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-08 12:07           ` Sergey Bronnikov
@ 2020-07-08 22:13             ` Vladislav Shpilevoy
  2020-07-09  9:39               ` Sergey Bronnikov
  0 siblings, 1 reply; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-08 22:13 UTC (permalink / raw)
  To: Sergey Bronnikov; +Cc: tarantool-patches

On 08/07/2020 14:07, Sergey Bronnikov wrote:
> On 22:57 Tue 07 Jul , Vladislav Shpilevoy wrote:
>>>>> test_run:switch('default')
>>>>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
>>>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>>>> _ = box.space.sync:create_index('pk')
>>>>> -- Testcase body.
>>>>> box.space.sync:insert{1}
>>>>> box.space.sync:insert{2}
>>>>> box.space.sync:insert{3}
>>>>> box.space.sync:select{} -- 1, 2, 3
>>>>> test_run:switch('replica')
>>>>> box.space.sync:select{} -- 1, 2, 3
>>>>> -- Testcase cleanup.
>>>>> test_run:switch('default')
>>>>> box.space.sync:drop()
>>>>>
>>>>> -- Synchro timeout is not bigger than replication_synchro_timeout value.
>>>>> -- Testcase setup.
>>>>> test_run:switch('default')
>>>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
>>>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>>>> _ = box.space.sync:create_index('pk')
>>>>> -- Testcase body.
>>>>> start = os.time()
>>>>> box.space.sync:insert{1}
>>>>> (os.time() - start) == box.cfg.replication_synchro_timeout -- true
>>>>
>>>> 7. A very bad idea. If the process hangs here for a short while, this check
>>>> will fail. There must be no tests that rely on the process
>>>> running steadily.
>>>
>>> Are you suggesting not to check it, or are there more reliable ways to check
>>> that the timeout is exactly the value it was set to?
>>
>> If you need to check that the timeout has fired, then you need to check
>> that the elapsed time >= timeout, but definitely not == timeout. The latter
>> is very unreliable.
>>
> 
> I do not like this check, because the test is supposed to check that "timeout
> not bigger than replication_synchro_timeout value".
> I did it this way:
> 
> box.space.sync:insert{1}
> -(os.time() - start) == box.cfg.replication_synchro_timeout -- true
> +-- We assume that the process may freeze and the timeout will be slightly
> +-- larger than the set value.
> +POSSIBLE_ERROR = 2
> +(os.time() - start) < box.cfg.replication_synchro_timeout + POSSIBLE_ERROR -- true
>  -- Testcase cleanup.

First, your test actually checks that the timeout fires, because you are
trying to write with BROKEN_QUORUM. So the check appears to be wrong.

Second, even if there were a quorum, the hack of adding a few seconds
gives no guarantees either, which means the test becomes flaky. Please do not
do that.
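
The lower-bound argument can be sketched in C. This is only an illustration, not code from the patch or from test-run: the helper names `sketch_monotonic_sec` and `sketch_timeout_has_fired` are hypothetical. The idea is that a stalled process can only make the measured interval longer, so `>=` is the only comparison that cannot fail spuriously.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: monotonic time in seconds, not affected by
 * wall clock adjustments. */
static double
sketch_monotonic_sec(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical check: a timeout is considered fired when at least
 * 'timeout' seconds passed between 'start' and 'now'. There is no
 * upper bound on purpose: scheduling delays make the interval
 * longer, never shorter.
 */
static bool
sketch_timeout_has_fired(double start, double now, double timeout)
{
	assert(now >= start);
	return now - start >= timeout;
}
```

A test would take `start` from `sketch_monotonic_sec()` before the blocking call, take `now` after it, and assert only `sketch_timeout_has_fired(start, now, timeout)`.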

>>>>> test_run:cmd("setopt delimiter ';'")
>>>>> _ = fiber.create(function()
>>>>>     box.space.sync:insert{2}
>>>>> end);
>>>>> test_run:cmd("setopt delimiter ''");
>>>>> -- Disable synchronous mode.
>>>>> disable_sync_mode()
>>>>> -- Space is in async mode now.
>>>>> box.space.sync:insert{3} -- async operation must wait sync one
>>>>> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
>>>>> box.space.sync:select{} -- 1
>>>>> test_run:cmd('switch replica')
>>>>> box.space.sync:select{} -- 1
>>>>> -- Testcase cleanup.
>>>>> test_run:switch('default')
>>>>> box.space.sync:drop()
>>>>>
>>>>> -- Warn user when setting `replication_synchro_quorum` to a value
>>>>> -- greater than number of instances in a cluster, see gh-5122.
>>>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
>>>>
>>>> 13. This test does not seem to check anything at all. The warning is not
>>>> printed now, and the test passes.
>>>
>>> The usual process is this: if a test fails, then while the issue is still
>>> open, XFAIL is added, and when the behaviour changes, XFAIL turns into
>>> XPASS so that XFAIL gets removed.  We have no such mechanism, so I added
>>> the test for the future: when the warning is added, the test will break and
>>> the result file will be updated. I think this is quite OK.
>>
>> That would be OK if the comment said the test is not valid yet. But what is
>> even more important: the test will still pass even when the warning is added,
>> because it will go to the log and will not be in the test output. So the test
>> will pass, although it should not.
> 
> We have automated tests with a binary PASS or FAIL status, and a person
> usually looks at the test run result, not at the comments in the
> source. So this does not change much. But I updated the comment:
> 
>  -- greater than number of instances in a cluster, see gh-5122.
>  -box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
>  +box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- expected warning, to be add in gh-5122

It seems you did not read what I wrote :(. Printing the warning will not change
anything at all. Warnings are logs, they do not go into the diff. They will be
in the log file, not in the test output. And this test will pass anyway.


* Re: [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication
  2020-07-08 22:13             ` Vladislav Shpilevoy
@ 2020-07-09  9:39               ` Sergey Bronnikov
  0 siblings, 0 replies; 68+ messages in thread
From: Sergey Bronnikov @ 2020-07-09  9:39 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On 00:13 Thu 09 Jul , Vladislav Shpilevoy wrote:
> On 08/07/2020 14:07, Sergey Bronnikov wrote:
> > On 22:57 Tue 07 Jul , Vladislav Shpilevoy wrote:
> >>>>> test_run:switch('default')
> >>>>> box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
> >>>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> >>>>> _ = box.space.sync:create_index('pk')
> >>>>> -- Testcase body.
> >>>>> box.space.sync:insert{1}
> >>>>> box.space.sync:insert{2}
> >>>>> box.space.sync:insert{3}
> >>>>> box.space.sync:select{} -- 1, 2, 3
> >>>>> test_run:switch('replica')
> >>>>> box.space.sync:select{} -- 1, 2, 3
> >>>>> -- Testcase cleanup.
> >>>>> test_run:switch('default')
> >>>>> box.space.sync:drop()
> >>>>>
> >>>>> -- Synchro timeout is not bigger than replication_synchro_timeout value.
> >>>>> -- Testcase setup.
> >>>>> test_run:switch('default')
> >>>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
> >>>>> _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> >>>>> _ = box.space.sync:create_index('pk')
> >>>>> -- Testcase body.
> >>>>> start = os.time()
> >>>>> box.space.sync:insert{1}
> >>>>> (os.time() - start) == box.cfg.replication_synchro_timeout -- true
> >>>>
> >>>> 7. A very bad idea. If the process hangs here for a short while, this check
> >>>> will fail. There must be no tests that rely on the process
> >>>> running steadily.
> >>>
> >>> Are you suggesting not to check it, or are there more reliable ways to check
> >>> that the timeout is exactly the value it was set to?
> >>
> >> If you need to check that the timeout has fired, then you need to check
> >> that the elapsed time >= timeout, but definitely not == timeout. The latter
> >> is very unreliable.
> >>
> > 
> > I do not like this check, because the test is supposed to check that "timeout
> > not bigger than replication_synchro_timeout value".
> > I did it this way:
> > 
> > box.space.sync:insert{1}
> > -(os.time() - start) == box.cfg.replication_synchro_timeout -- true
> > +-- We assume that the process may freeze and the timeout will be slightly
> > +-- larger than the set value.
> > +POSSIBLE_ERROR = 2
> > +(os.time() - start) < box.cfg.replication_synchro_timeout + POSSIBLE_ERROR -- true
> >  -- Testcase cleanup.
> 
> First, your test actually checks that the timeout fires, because you are
> trying to write with BROKEN_QUORUM. So the check appears to be wrong.
> 
> Second, even if there were a quorum, the hack of adding a few seconds
> gives no guarantees either, which means the test becomes flaky. Please do not
> do that.

Changed the check to verify that the timeout is not less than the specified value:

--- Synchro timeout is not bigger than replication_synchro_timeout value.
+-- Synchro timeout is not less than replication_synchro_timeout value.
 -- Testcase setup.
 test_run:switch('default')
 box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=orig_synchro_timeout}
@@ -51,10 +52,7 @@ _ = box.space.sync:create_index('pk')
 -- Testcase body.
 start = os.time()
 box.space.sync:insert{1}
--- We assume that the process may freeze and the timeout will be slightly
--- larger than the set value.
-POSSIBLE_ERROR = 2
-(os.time() - start) < box.cfg.replication_synchro_timeout + POSSIBLE_ERROR -- true
+(os.time() - start) >= box.cfg.replication_synchro_timeout -- true
 -- Testcase cleanup.

> >>>>> test_run:cmd("setopt delimiter ';'")
> >>>>> _ = fiber.create(function()
> >>>>>     box.space.sync:insert{2}
> >>>>> end);
> >>>>> test_run:cmd("setopt delimiter ''");
> >>>>> -- Disable synchronous mode.
> >>>>> disable_sync_mode()
> >>>>> -- Space is in async mode now.
> >>>>> box.space.sync:insert{3} -- async operation must wait sync one
> >>>>> box.error.injection.set('ERRINJ_SYNC_TIMEOUT', false)
> >>>>> box.space.sync:select{} -- 1
> >>>>> test_run:cmd('switch replica')
> >>>>> box.space.sync:select{} -- 1
> >>>>> -- Testcase cleanup.
> >>>>> test_run:switch('default')
> >>>>> box.space.sync:drop()
> >>>>>
> >>>>> -- Warn user when setting `replication_synchro_quorum` to a value
> >>>>> -- greater than number of instances in a cluster, see gh-5122.
> >>>>> box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
> >>>>
> >>>> 13. This test does not seem to check anything at all. The warning is not
> >>>> printed now, and the test passes.
> >>>
> >>> The usual process is this: if a test fails, then while the issue is still
> >>> open, XFAIL is added, and when the behaviour changes, XFAIL turns into
> >>> XPASS so that XFAIL gets removed.  We have no such mechanism, so I added
> >>> the test for the future: when the warning is added, the test will break and
> >>> the result file will be updated. I think this is quite OK.
> >>
> >> That would be OK if the comment said the test is not valid yet. But what is
> >> even more important: the test will still pass even when the warning is added,
> >> because it will go to the log and will not be in the test output. So the test
> >> will pass, although it should not.
> > 
> > We have automated tests with a binary PASS or FAIL status, and a person
> > usually looks at the test run result, not at the comments in the
> > source. So this does not change much. But I updated the comment:
> > 
> >  -- greater than number of instances in a cluster, see gh-5122.
> >  -box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- warning
> >  +box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- expected warning, to be add in gh-5122
> 
> It seems you did not read what I wrote :(. Printing the warning will not change
> anything at all. Warnings are logs, they do not go into the diff. They will be
> in the log file, not in the test output. And this test will pass anyway.

Yes, I misunderstood you.
Then it can be done like this, but grep_log() will have to be uncommented later:

--- greater than number of instances in a cluster, see gh-5122.
-box.cfg{replication_synchro_quorum=BROKEN_QUORUM} -- expected warning, to be add in gh-5122
+-- should be greater than number of instances in a cluster, see gh-5122.
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
+-- expected warning, to be add in gh-5122
+--while test_run:grep_log('default', 'warning: .*') == nil do fiber.sleep(0.01) end
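
The distinction discussed above (a warning lands in the log file, not in the compared test output) can be illustrated with a small C sketch. It is not how test-run's grep_log() is implemented. `sketch_log_contains` is a hypothetical helper that does a plain substring search over a log file, whereas the commented-out call above passes a pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if some line of the file at
 * 'path' contains 'needle'. Plain substring match, no patterns.
 * Lines longer than the buffer are scanned in chunks, so a match
 * split across a chunk border can be missed.
 */
static bool
sketch_log_contains(const char *path, const char *needle)
{
	FILE *f = fopen(path, "r");
	if (f == NULL)
		return false;
	char line[4096];
	bool found = false;
	while (!found && fgets(line, sizeof(line), f) != NULL)
		found = strstr(line, needle) != NULL;
	fclose(f);
	return found;
}
```

A check like this has to be an explicit step of the test. Comparing the printed output with the result file never looks at the log.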

-- 
sergeyb@


* Re: [Tarantool-patches] [PATCH v2 00/19] Sync replication
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (25 preceding siblings ...)
  2020-07-06 23:31   ` [Tarantool-patches] [PATCH] Add new error injection constant ERRINJ_SYNC_TIMEOUT Vladislav Shpilevoy
@ 2020-07-10  0:50   ` Vladislav Shpilevoy
  2020-07-10  7:40   ` Kirill Yukhin
  27 siblings, 0 replies; 68+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-10  0:50 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Here is a pack of final fixes before the branch goes to master.
There are lots of them, but I tried to explain them individually. There is
no point in making them separate commits since they are all squashed
into the older commits anyway.

================================================================================
diff --git a/src/box/txn.c b/src/box/txn.c
index ffc2ac6a5..a2df23833 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -749,7 +749,8 @@ txn_commit_async(struct txn *txn)
 
 		if (txn_has_flag(txn, TXN_WAIT_ACK)) {
 			int64_t lsn = req->rows[txn->n_applier_rows - 1]->lsn;
-			txn_limbo_assign_lsn(&txn_limbo, limbo_entry, lsn);
+			txn_limbo_assign_remote_lsn(&txn_limbo, limbo_entry,
+						    lsn);
 		}
 
 		/*
@@ -836,7 +837,8 @@ txn_commit(struct txn *txn)
 	if (is_sync) {
 		if (txn_has_flag(txn, TXN_WAIT_ACK)) {
 			int64_t lsn = req->rows[req->n_rows - 1]->lsn;
-			txn_limbo_assign_lsn(&txn_limbo, limbo_entry, lsn);
+			txn_limbo_assign_local_lsn(&txn_limbo, limbo_entry,
+						   lsn);
 			/* Local WAL write is a first 'ACK'. */
 			txn_limbo_ack(&txn_limbo, txn_limbo.instance_id, lsn);
 		}
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 71a47802a..e28432bfd 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -87,8 +87,7 @@ static inline void
 txn_limbo_remove(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
 	assert(!rlist_empty(&entry->in_queue));
-	assert(rlist_first_entry(&limbo->queue, struct txn_limbo_entry,
-				 in_queue) == entry);
+	assert(txn_limbo_first_entry(limbo) == entry);
 	(void) limbo;
 	rlist_del_entry(entry, in_queue);
 }
@@ -97,8 +96,7 @@ static inline void
 txn_limbo_pop(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
 	assert(!rlist_empty(&entry->in_queue));
-	assert(rlist_last_entry(&limbo->queue, struct txn_limbo_entry,
-				in_queue) == entry);
+	assert(txn_limbo_last_entry(limbo) == entry);
 	assert(entry->is_rollback);
 	(void) limbo;
 	rlist_del_entry(entry, in_queue);
@@ -119,10 +117,11 @@ txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 }
 
 void
-txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
-		     int64_t lsn)
+txn_limbo_assign_remote_lsn(struct txn_limbo *limbo,
+			    struct txn_limbo_entry *entry, int64_t lsn)
 {
 	assert(limbo->instance_id != REPLICA_ID_NIL);
+	assert(limbo->instance_id != instance_id);
 	assert(entry->lsn == -1);
 	assert(lsn > 0);
 	assert(txn_has_flag(entry->txn, TXN_WAIT_ACK));
@@ -130,27 +129,30 @@ txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
 	entry->lsn = lsn;
 }
 
-static bool
-txn_limbo_check_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
+void
+txn_limbo_assign_local_lsn(struct txn_limbo *limbo,
+			   struct txn_limbo_entry *entry, int64_t lsn)
 {
-	if (txn_limbo_entry_is_complete(entry))
-		return true;
+	assert(limbo->instance_id != REPLICA_ID_NIL);
+	assert(limbo->instance_id == instance_id);
+	assert(entry->lsn == -1);
+	assert(lsn > 0);
+	assert(txn_has_flag(entry->txn, TXN_WAIT_ACK));
+	(void) limbo;
+	entry->lsn = lsn;
 	/*
-	 * Async transaction can't complete itself. It is always
-	 * completed by a previous sync transaction.
+	 * The entry just got its LSN after a WAL write. It could
+	 * happen that this LSN was already ACKed by some
+	 * replicas. Update the ACK counter to take them into
+	 * account.
 	 */
-	if (!txn_has_flag(entry->txn, TXN_WAIT_ACK))
-		return false;
 	struct vclock_iterator iter;
 	vclock_iterator_init(&iter, &limbo->vclock);
 	int ack_count = 0;
-	int64_t lsn = entry->lsn;
 	vclock_foreach(&iter, vc)
 		ack_count += vc.lsn >= lsn;
 	assert(ack_count >= entry->ack_count);
 	entry->ack_count = ack_count;
-	entry->is_commit = ack_count >= replication_synchro_quorum;
-	return entry->is_commit;
================================================================================

The changes above are motivated by a bug I found during stress
testing (running the same test in parallel in test-run in 10-15
processes).

The bug was a crash that happened when a transaction was replicated
and ACKed before the WAL thread responded ok to the TX thread. In that
case CONFIRM wasn't written at all.
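
The ACK recount that txn_limbo_assign_local_lsn() performs in the diff above can be sketched in isolation. This is a simplified illustration, not the real code. `struct sketch_vclock` is a hypothetical fixed-size stand-in for the limbo vclock, and `sketch_count_acks` mirrors only the `vclock_foreach` loop from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_VCLOCK_MAX = 32 };

/*
 * Hypothetical stand-in for a vclock: lsn[i] is the highest LSN
 * acknowledged by replica i, 0 if it acknowledged nothing yet.
 */
struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

/*
 * Count the replicas that have already acknowledged 'lsn'. This
 * is what has to be recomputed once a local WAL write finally
 * assigns the LSN, because ACKs may have arrived earlier.
 */
static int
sketch_count_acks(const struct sketch_vclock *vclock, int64_t lsn)
{
	assert(lsn > 0);
	int ack_count = 0;
	for (int i = 0; i < SKETCH_VCLOCK_MAX; i++)
		ack_count += vclock->lsn[i] >= lsn;
	return ack_count;
}
```

If the count were taken only when ACKs arrive, an entry whose LSN was still -1 at that moment would never see them. That is the scenario the bug description refers to.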

================================================================================
 }
 
 static int
@@ -161,7 +163,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 {
 	struct txn *txn = entry->txn;
 	assert(entry->lsn > 0 || !txn_has_flag(entry->txn, TXN_WAIT_ACK));
-	if (txn_limbo_check_complete(limbo, entry))
+	if (txn_limbo_entry_is_complete(entry))
 		goto complete;
 
 	assert(!txn_has_flag(txn, TXN_IS_DONE));
@@ -226,7 +229,26 @@ complete:
 		diag_set(ClientError, ER_SYNC_ROLLBACK);
 		return -1;
 	}
-	txn_limbo_remove(limbo, entry);
+	/*
+	 * The entry might be not the first in the limbo. It
+	 * happens when there was a sync transaction and async
+	 * transaction. The sync and async went to WAL. After sync
+	 * WAL write is done, it may be already ACKed by the
+	 * needed replica count. Now it marks self as committed
+	 * and does the same for the next async txn. Then it
+	 * starts writing CONFIRM. During that the async
+	 * transaction finishes its WAL write, sees it is
+	 * committed and ends up here. Not being the first
+	 * transaction in the limbo.
+	 */
+	while (!rlist_empty(&entry->in_queue) &&
+	       txn_limbo_first_entry(limbo) != entry) {
+		bool cancellable = fiber_set_cancellable(false);
+		fiber_yield();
+		fiber_set_cancellable(cancellable);
+	}
+	if (!rlist_empty(&entry->in_queue))
+		txn_limbo_remove(limbo, entry);
 	txn_clear_flag(txn, TXN_WAIT_SYNC);
 	txn_clear_flag(txn, TXN_WAIT_ACK);
 	return 0;
@@ -257,7 +279,7 @@ txn_limbo_write_confirm_rollback(struct txn_limbo *limbo, int64_t lsn,
 		 * the last "safe" lsn is lsn - 1.
 		 */
 		res = xrow_encode_rollback(&row, &txn->region,
-					   limbo->instance_id, lsn - 1);
+					   limbo->instance_id, lsn);
================================================================================

I asked Sergey to do that, but he temporarily left. There was no
bug, just an inconsistency. For CONFIRM we use an inclusive LSN: it commits
all <= LSN. But for ROLLBACK we used an exclusive LSN: it rolled back all > LSN.
This is strange, so I made the ROLLBACK LSN inclusive too. This is one step
towards https://github.com/tarantool/tarantool/issues/5151.
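
The resulting convention can be written down as two predicates. This is only a sketch of the semantics described above. The function names are hypothetical and do not exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CONFIRM(lsn) commits every waiting entry with entry_lsn <= lsn. */
static bool
sketch_is_confirmed(int64_t entry_lsn, int64_t confirm_lsn)
{
	return entry_lsn <= confirm_lsn;
}

/*
 * ROLLBACK(lsn), after this change, rolls back every waiting
 * entry with entry_lsn >= lsn, the given LSN included.
 */
static bool
sketch_is_rolled_back(int64_t entry_lsn, int64_t rollback_lsn)
{
	return entry_lsn >= rollback_lsn;
}
```

With both bounds inclusive, the writer passes the LSN of the first transaction to roll back as is, without the former `lsn - 1` adjustment.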

================================================================================
 	}
 	if (res == -1)
 		goto rollback;
@@ -342,7 +364,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 	rlist_foreach_entry_reverse(e, &limbo->queue, in_queue) {
 		if (!txn_has_flag(e->txn, TXN_WAIT_ACK))
 			continue;
-		if (e->lsn <= lsn)
+		if (e->lsn < lsn)
 			break;
 		last_rollback = e;
 	}
@@ -542,7 +564,7 @@ txn_limbo_force_empty(struct txn_limbo *limbo, int64_t confirm_lsn)
 	}
 	if (rollback != NULL) {
 		txn_limbo_write_rollback(limbo, rollback->lsn);
-		txn_limbo_read_rollback(limbo, rollback->lsn - 1);
+		txn_limbo_read_rollback(limbo, rollback->lsn);
 	}
 }
 
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index 1ee416231..88614d4a6 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -158,13 +158,21 @@ void
 txn_limbo_abort(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
 /**
- * Assign local LSN to the limbo entry. That happens when the
- * transaction is added to the limbo, writes to WAL, and gets an
- * LSN.
+ * Assign a remote LSN to a limbo entry. That happens when a
+ * remote transaction is added to the limbo and starts waiting for
+ * a confirm.
  */
 void
-txn_limbo_assign_lsn(struct txn_limbo *limbo, struct txn_limbo_entry *entry,
-		     int64_t lsn);
+txn_limbo_assign_remote_lsn(struct txn_limbo *limbo,
+			    struct txn_limbo_entry *entry, int64_t lsn);
+
+/**
+ * Assign a local LSN to a limbo entry. That happens when a local
+ * transaction is written to WAL.
+ */
+void
+txn_limbo_assign_local_lsn(struct txn_limbo *limbo,
+			   struct txn_limbo_entry *entry, int64_t lsn);
 
 /**
  * Ack all transactions up to the given LSN on behalf of the
diff --git a/src/box/xrow.h b/src/box/xrow.h
index 7e6a4aceb..b325213e6 100644
--- a/src/box/xrow.h
+++ b/src/box/xrow.h
@@ -246,7 +246,7 @@ xrow_decode_confirm(struct xrow_header *row, uint32_t *replica_id, int64_t *lsn)
  * @param row xrow header.
  * @param region Region to use to encode the rollback body.
  * @param replica_id master's instance id.
- * @param lsn lsn to rollback to.
+ * @param lsn lsn to rollback from, including it.
  * @retval -1  on error.
  * @retval 0 success.
  */
diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index 6d1624798..6b55a0e5e 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -199,7 +199,7 @@ box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
 -- Commit something non-sync. So as applier writer fiber would
 -- flush the pending heartbeat and go to sleep with the new huge
 -- replication timeout.
-s = box.schema.create_space('test')
+s = box.schema.create_space('test', {engine = engine})
  | ---
  | ...
 pk = s:create_index('pk')
@@ -309,7 +309,7 @@ test_run:switch('default')
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 2}
  | ---
  | ...
-_ = box.schema.create_space('locallocal', {is_local = true})
+_ = box.schema.create_space('locallocal', {is_local = true, engine = engine})
  | ---
  | ...
 _ = _:create_index('pk')
@@ -551,6 +551,9 @@ test_run:switch('default')
  | ---
  | - true
  | ...
+box.cfg{replication_synchro_timeout = 1000}
================================================================================

The timeout from the previous testcase, < 1 second, was used here. It was
flaky.

================================================================================
+ | ---
+ | ...
 ok, err = nil
  | ---
  | ...
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 384b3593c..dcd1d6c76 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -83,7 +83,7 @@ box.cfg{replication_timeout = 1000, replication_synchro_timeout = 1000}
 -- Commit something non-sync. So as applier writer fiber would
 -- flush the pending heartbeat and go to sleep with the new huge
 -- replication timeout.
-s = box.schema.create_space('test')
+s = box.schema.create_space('test', {engine = engine})
 pk = s:create_index('pk')
 s:replace{1}
 -- Now commit something sync. It should return immediately even
@@ -123,7 +123,7 @@ box.space.sync:select{6}
 --
 test_run:switch('default')
 box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 2}
-_ = box.schema.create_space('locallocal', {is_local = true})
+_ = box.schema.create_space('locallocal', {is_local = true, engine = engine})
 _ = _:create_index('pk')
 -- Propagate local vclock to some insane value to ensure it won't
 -- affect anything.
@@ -217,6 +217,7 @@ box.space.sync:select{11}
 
 -- Test it is possible to early ACK a transaction with a new quorum.
 test_run:switch('default')
+box.cfg{replication_synchro_timeout = 1000}
 ok, err = nil
 f = fiber.create(function()                                                     \
     ok, err = pcall(box.space.sync.insert, box.space.sync, {12})                \
diff --git a/test/replication/qsync_snapshots.result b/test/replication/qsync_snapshots.result
index 61cb7164b..2a126087a 100644
--- a/test/replication/qsync_snapshots.result
+++ b/test/replication/qsync_snapshots.result
@@ -48,7 +48,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
================================================================================

The timeout was too small. The test assumed it would not fire, but 0.1 is quite
easy to exceed, especially when tests run in parallel. The same applies to some
other fixes below.

================================================================================
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -86,7 +86,7 @@ test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -112,58 +112,9 @@ box.space.sync:select{} -- 1
  | ---
  | - - [1]
  | ...
-box.snapshot()
- | ---
- | - ok
- | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
--- Testcase cleanup.
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.space.sync:drop()
+box.cfg{replication_synchro_timeout=1000}
  | ---
  | ...
-
--- [RFC, Snapshot generation] rolled back operations are not snapshotted.
--- Testcase setup.
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
- | ---
- | ...
-_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
- | ---
- | ...
-_ = box.space.sync:create_index('pk')
- | ---
- | ...
--- Testcase body.
-box.space.sync:insert{1}
- | ---
- | - [1]
- | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.1}
- | ---
- | ...
-box.space.sync:insert{2}
- | ---
- | - error: Quorum collection for a synchronous transaction is timed out
- | ...
 box.snapshot()
  | ---
  | - ok
@@ -172,14 +123,6 @@ box.space.sync:select{} -- 1
  | ---
  | - - [1]
  | ...
-test_run:switch('replica')
- | ---
- | - true
- | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
 -- Testcase cleanup.
 test_run:switch('default')
  | ---
@@ -191,11 +134,40 @@ box.space.sync:drop()
 
 -- [RFC, Snapshot generation] snapshot started on master, then rollback
 -- arrived, expected snapshot abort.
+-- The test is temporary blocked on 5146 due to a crash when local
+-- WAL write fails inside the WAL thread. Since this is the only
+-- way to cause rollback of the transaction used in a snapshot
+-- without triggering snapshot timeout.
+
+-- test_run:switch('default')
+-- box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+-- _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+-- _ = box.space.sync:create_index('pk')
+-- -- Testcase body.
+-- box.space.sync:insert{1}
+-- box.space.sync:select{} -- 1
+-- test_run:switch('default')
+-- test_run:cmd("setopt delimiter ';'")
+-- _ = fiber.create(function()
+--     box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+--     box.space.sync:insert{2}
+-- end);
+-- test_run:cmd("setopt delimiter ''");
+-- box.snapshot() -- abort
+-- box.space.sync:select{} -- 1
+-- test_run:switch('replica')
+-- box.space.sync:select{} -- 1
+-- -- Testcase cleanup.
+-- test_run:switch('default')
+-- box.space.sync:drop()
+
+-- [RFC, Snapshot generation] snapshot started on replica, then rollback
+-- arrived, expected snapshot abort.
 test_run:switch('default')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
  | ---
  | ...
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
@@ -204,128 +176,85 @@ _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
  | ---
  | ...
+
 -- Testcase body.
-box.space.sync:insert{1}
- | ---
- | - [1]
- | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
 test_run:switch('default')
  | ---
  | - true
  | ...
-test_run:cmd("setopt delimiter ';'")
- | ---
- | - true
- | ...
-_ = fiber.create(function()
-    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
-    box.space.sync:insert{2}
-end);
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
================================================================================

2 seconds was too long, and the test was flaky. I made it faster and more
stable by waiting for events instead of relying on timing.
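
The event-oriented style can be sketched as a generic "wait for a condition" loop in C. This is a hypothetical illustration of the idea behind calls such as test_run:wait_cond() in the diff, not their actual implementation. All `sketch_*` names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

typedef bool (*sketch_cond_f)(void *arg);

static double
sketch_now(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Poll 'cond' every 'delay' seconds until it returns true or
 * 'timeout' seconds pass. Returns whether the condition was met.
 * The timeout is only a safety net and can be generous: a passing
 * test returns as soon as the event happens.
 */
static bool
sketch_wait_cond(sketch_cond_f cond, void *arg, double timeout, double delay)
{
	double deadline = sketch_now() + timeout;
	for (;;) {
		if (cond(arg))
			return true;
		if (sketch_now() >= deadline)
			return false;
		struct timespec ts;
		ts.tv_sec = (time_t)delay;
		ts.tv_nsec = (long)((delay - (double)ts.tv_sec) * 1e9);
		nanosleep(&ts, NULL);
	}
}

/* Example predicate: becomes true on the third poll. */
static bool
sketch_third_poll(void *arg)
{
	int *polls = arg;
	return ++*polls >= 3;
}

/* Example predicate that never becomes true. */
static bool
sketch_never(void *arg)
{
	(void)arg;
	return false;
}
```

A call such as `sketch_wait_cond(sketch_third_poll, &polls, 5.0, 0.001)` returns after three polls rather than after five seconds. A fixed sleep would always pay the full delay and could still be too short on a loaded machine.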

================================================================================
  | ---
  | ...
-test_run:cmd("setopt delimiter ''");
+ok, err = nil
  | ---
- | - true
- | ...
-box.snapshot() -- abort
- | ---
- | - error: A rollback for a synchronous transaction is received
  | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
-test_run:switch('replica')
- | ---
- | - true
- | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
--- Testcase cleanup.
-test_run:switch('default')
- | ---
- | - true
- | ...
-box.space.sync:drop()
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
+end)
  | ---
  | ...
 
--- [RFC, Snapshot generation] snapshot started on replica, then rollback
--- arrived, expected snapshot abort.
-test_run:switch('default')
+test_run:switch('replica')
  | ---
  | - true
  | ...
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+fiber = require('fiber')
  | ---
  | ...
-_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+box.cfg{replication_synchro_timeout=1000}
  | ---
  | ...
-_ = box.space.sync:create_index('pk')
+ok, err = nil
  | ---
  | ...
--- Testcase body.
-box.space.sync:insert{1}
+f = fiber.create(function() ok, err = pcall(box.snapshot) end)
  | ---
- | - [1]
  | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
-test_run:switch('replica')
+
+test_run:switch('default')
  | ---
  | - true
  | ...
-box.space.sync:select{} -- 1
- | ---
- | - - [1]
- | ...
-test_run:switch('default')
+box.cfg{replication_synchro_timeout=0.0001}
  | ---
- | - true
  | ...
-test_run:cmd("setopt delimiter ';'")
+test_run:wait_cond(function() return f:status() == 'dead' end)
  | ---
  | - true
  | ...
-_ = fiber.create(function()
-    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
-    box.space.sync:insert{2}
-end);
+ok, err
  | ---
+ | - false
+ | - Quorum collection for a synchronous transaction is timed out
  | ...
-test_run:cmd("setopt delimiter ''");
+
+test_run:switch('replica')
  | ---
  | - true
  | ...
-test_run:switch('replica')
+test_run:wait_cond(function() return f:status() == 'dead' end)
  | ---
  | - true
  | ...
-box.snapshot() -- abort
+ok, err
  | ---
- | - error: A rollback for a synchronous transaction is received
+ | - false
+ | - A rollback for a synchronous transaction is received
  | ...
-box.space.sync:select{} -- 1
+box.space.sync:select{}
  | ---
- | - - [1]
+ | - []
  | ...
+
 test_run:switch('default')
  | ---
  | - true
  | ...
-box.space.sync:select{} -- 1
+box.space.sync:select{}
  | ---
- | - - [1]
+ | - []
  | ...
+
 -- Testcase cleanup.
 test_run:switch('default')
  | ---
diff --git a/test/replication/qsync_snapshots.test.lua b/test/replication/qsync_snapshots.test.lua
index b5990bce7..0db61da95 100644
--- a/test/replication/qsync_snapshots.test.lua
+++ b/test/replication/qsync_snapshots.test.lua
@@ -20,7 +20,7 @@ test_run:cmd('start server replica with wait=True, wait_load=True')
 -- expected success.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -35,7 +35,7 @@ box.space.sync:drop()
 -- expected success.
 -- Testcase setup.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
 -- Testcase body.
@@ -43,79 +43,76 @@ box.space.sync:insert{1}
 box.space.sync:select{} -- 1
 test_run:switch('replica')
 box.space.sync:select{} -- 1
+box.cfg{replication_synchro_timeout=1000}
 box.snapshot()
 box.space.sync:select{} -- 1
 -- Testcase cleanup.
 test_run:switch('default')
 box.space.sync:drop()
 
--- [RFC, Snapshot generation] rolled back operations are not snapshotted.
--- Testcase setup.
-test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
-_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
-_ = box.space.sync:create_index('pk')
--- Testcase body.
-box.space.sync:insert{1}
-box.space.sync:select{} -- 1
-test_run:switch('default')
-box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.1}
-box.space.sync:insert{2}
-box.snapshot()
-box.space.sync:select{} -- 1
-test_run:switch('replica')
-box.space.sync:select{} -- 1
--- Testcase cleanup.
-test_run:switch('default')
-box.space.sync:drop()
-
 -- [RFC, Snapshot generation] snapshot started on master, then rollback
 -- arrived, expected snapshot abort.
-test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
-_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
-_ = box.space.sync:create_index('pk')
--- Testcase body.
-box.space.sync:insert{1}
-box.space.sync:select{} -- 1
-test_run:switch('default')
-test_run:cmd("setopt delimiter ';'")
-_ = fiber.create(function()
-    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
-    box.space.sync:insert{2}
-end);
-test_run:cmd("setopt delimiter ''");
-box.snapshot() -- abort
-box.space.sync:select{} -- 1
-test_run:switch('replica')
-box.space.sync:select{} -- 1
--- Testcase cleanup.
-test_run:switch('default')
-box.space.sync:drop()
+-- The test is temporarily blocked on 5146 due to a crash when a local
+-- WAL write fails inside the WAL thread, which is the only
+-- way to cause rollback of the transaction used in a snapshot
+-- without triggering the snapshot timeout.
+
+-- test_run:switch('default')
+-- box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+-- _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+-- _ = box.space.sync:create_index('pk')
+-- -- Testcase body.
+-- box.space.sync:insert{1}
+-- box.space.sync:select{} -- 1
+-- test_run:switch('default')
+-- test_run:cmd("setopt delimiter ';'")
+-- _ = fiber.create(function()
+--     box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
+--     box.space.sync:insert{2}
+-- end);
+-- test_run:cmd("setopt delimiter ''");
+-- box.snapshot() -- abort
+-- box.space.sync:select{} -- 1
+-- test_run:switch('replica')
+-- box.space.sync:select{} -- 1
+-- -- Testcase cleanup.
+-- test_run:switch('default')
+-- box.space.sync:drop()
 
 -- [RFC, Snapshot generation] snapshot started on replica, then rollback
 -- arrived, expected snapshot abort.
 test_run:switch('default')
-box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=0.1}
+box.cfg{replication_synchro_quorum=NUM_INSTANCES, replication_synchro_timeout=1000}
 _ = box.schema.space.create('sync', {is_sync=true, engine=engine})
 _ = box.space.sync:create_index('pk')
+
 -- Testcase body.
-box.space.sync:insert{1}
-box.space.sync:select{} -- 1
+test_run:switch('default')
+box.cfg{replication_synchro_quorum=BROKEN_QUORUM}
+ok, err = nil
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {1})                 \
+end)
+
 test_run:switch('replica')
-box.space.sync:select{} -- 1
+fiber = require('fiber')
+box.cfg{replication_synchro_timeout=1000}
+ok, err = nil
+f = fiber.create(function() ok, err = pcall(box.snapshot) end)
+
 test_run:switch('default')
-test_run:cmd("setopt delimiter ';'")
-_ = fiber.create(function()
-    box.cfg{replication_synchro_quorum=BROKEN_QUORUM, replication_synchro_timeout=2}
-    box.space.sync:insert{2}
-end);
-test_run:cmd("setopt delimiter ''");
+box.cfg{replication_synchro_timeout=0.0001}
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ok, err
+
 test_run:switch('replica')
-box.snapshot() -- abort
-box.space.sync:select{} -- 1
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ok, err
+box.space.sync:select{}
+
 test_run:switch('default')
-box.space.sync:select{} -- 1
+box.space.sync:select{}
+
 -- Testcase cleanup.
 test_run:switch('default')
 box.space.sync:drop()
diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
index 8d50cfb27..ad0563345 100644
--- a/test/unit/snap_quorum_delay.cc
+++ b/test/unit/snap_quorum_delay.cc
@@ -78,7 +78,7 @@ enum process_type {
  * (to push a transaction to the limbo and simulate confirm).
  */
 const int fake_lsn = 1;
-const int instace_id = 1;
+extern "C" int instance_id;
 const int relay_id = 2;
 
 int
@@ -109,7 +109,7 @@ txn_process_func(va_list ap)
 	 * and call txn_commit (or another) later.
 	 */
 	struct txn_limbo_entry *entry = txn_limbo_append(&txn_limbo,
-							 instace_id, txn);
+							 instance_id, txn);
 	/*
 	 * The trigger is used to verify that the transaction has been
 	 * completed.
@@ -130,7 +130,7 @@ txn_process_func(va_list ap)
 		unreachable();
 	}
 
-	txn_limbo_assign_lsn(&txn_limbo, entry, fake_lsn);
+	txn_limbo_assign_local_lsn(&txn_limbo, entry, fake_lsn);
 	txn_limbo_ack(&txn_limbo, txn_limbo.instance_id, fake_lsn);
 	txn_limbo_wait_complete(&txn_limbo, entry);
 
@@ -239,6 +239,7 @@ main(void)
 	fiber_init(fiber_c_invoke);
 	gc_init();
 	txn_limbo_init();
+	instance_id = 1;
 
 	struct fiber *main_fiber = fiber_new("main", test_snap_delay);
 	assert(main_fiber != NULL);


* Re: [Tarantool-patches] [PATCH v2 00/19] Sync replication
  2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
                     ` (26 preceding siblings ...)
  2020-07-10  0:50   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
@ 2020-07-10  7:40   ` Kirill Yukhin
  27 siblings, 0 replies; 68+ messages in thread
From: Kirill Yukhin @ 2020-07-10  7:40 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hello,

On 30 Jun 01:15, Vladislav Shpilevoy wrote:
> Synchronous replication draft patchset. From the previous version
> changed almost everything, not much sense to describe it all here.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gh-4842-sync-replication
> Issue: https://github.com/tarantool/tarantool/issues/4842
> 
> Leonid Vasiliev (1):
>   replication: add support of qsync to the snapshot machinery
> 
> Serge Petrenko (11):
>   xrow: introduce CONFIRM and ROLLBACK entries
>   txn: introduce various reasons for txn rollback
>   replication: write and read CONFIRM entries
>   txn_limbo: add timeout when waiting for acks.
>   txn_limbo: add ROLLBACK processing
>   box: rework local_recovery to use async txn_commit
>   replication: support ROLLBACK and CONFIRM during recovery
>   replication: add test for synchro CONFIRM/ROLLBACK
>   txn_limbo: add diag_set in txn_limbo_wait_confirm
>   replication: delay initial join until confirmation
>   replication: only send confirmed data during final join
> 
> Vladislav Shpilevoy (7):
>   replication: introduce space.is_sync option
>   replication: introduce replication_synchro_* cfg options
>   txn: add TXN_WAIT_ACK flag
>   replication: make sync transactions wait quorum
>   applier: remove writer_cond
>   applier: send heartbeat not only on commit, but on any write
>   replication: block async transactions when not empty limbo

I've checked the patchset into master.

--
Regards, Kirill Yukhin


Thread overview: 68+ messages
     [not found] <cover.1593723973.git.sergeyb@tarantool.org>
2020-06-29 23:15 ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 01/19] replication: introduce space.is_sync option Vladislav Shpilevoy
2020-06-30 23:00     ` Vladislav Shpilevoy
2020-07-01 15:55       ` Sergey Ostanevich
2020-07-01 23:46         ` Vladislav Shpilevoy
2020-07-02  8:25       ` Serge Petrenko
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 10/19] txn_limbo: add ROLLBACK processing Vladislav Shpilevoy
2020-07-05 15:29     ` Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 11/19] box: rework local_recovery to use async txn_commit Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 12/19] replication: support ROLLBACK and CONFIRM during recovery Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 13/19] replication: add test for synchro CONFIRM/ROLLBACK Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 14/19] applier: remove writer_cond Vladislav Shpilevoy
2020-07-02  9:13     ` Serge Petrenko
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 15/19] applier: send heartbeat not only on commit, but on any write Vladislav Shpilevoy
2020-07-01 23:55     ` Vladislav Shpilevoy
2020-07-03 12:23     ` Serge Petrenko
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 16/19] txn_limbo: add diag_set in txn_limbo_wait_confirm Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 17/19] replication: delay initial join until confirmation Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 18/19] replication: only send confirmed data during final join Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 19/19] replication: block async transactions when not empty limbo Vladislav Shpilevoy
2020-07-01 17:12     ` Sergey Ostanevich
2020-07-01 23:47       ` Vladislav Shpilevoy
2020-07-03 12:28     ` Serge Petrenko
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 02/19] replication: introduce replication_synchro_* cfg options Vladislav Shpilevoy
2020-07-01 16:05     ` Sergey Ostanevich
2020-07-01 23:46       ` Vladislav Shpilevoy
2020-07-02  8:29     ` Serge Petrenko
2020-07-02 23:36       ` Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 03/19] txn: add TXN_WAIT_ACK flag Vladislav Shpilevoy
2020-07-01 17:14     ` Sergey Ostanevich
2020-07-01 23:46     ` Vladislav Shpilevoy
2020-07-02  8:30     ` Serge Petrenko
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 04/19] replication: make sync transactions wait quorum Vladislav Shpilevoy
2020-06-30 23:00     ` Vladislav Shpilevoy
2020-07-02  8:48     ` Serge Petrenko
2020-07-03 21:16       ` Vladislav Shpilevoy
2020-07-05 16:05     ` Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 05/19] xrow: introduce CONFIRM and ROLLBACK entries Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 06/19] txn: introduce various reasons for txn rollback Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 07/19] replication: write and read CONFIRM entries Vladislav Shpilevoy
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 08/19] replication: add support of qsync to the snapshot machinery Vladislav Shpilevoy
2020-07-02  8:52     ` Serge Petrenko
2020-07-08 11:43     ` Leonid Vasiliev
2020-06-29 23:15   ` [Tarantool-patches] [PATCH v2 09/19] txn_limbo: add timeout when waiting for acks Vladislav Shpilevoy
2020-06-29 23:22   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
2020-06-30 23:00   ` [Tarantool-patches] [PATCH v2 20/19] replication: add test for quorum 1 Vladislav Shpilevoy
2020-07-03 12:32     ` Serge Petrenko
2020-07-02 21:13   ` [Tarantool-patches] [PATCH 1/4] replication: regression test on gh-5119 [not fixed] sergeyb
2020-07-02 21:13   ` [Tarantool-patches] [PATCH 2/4] replication: add advanced tests for sync replication sergeyb
2020-07-02 22:46     ` Sergey Bronnikov
2020-07-02 23:20     ` Vladislav Shpilevoy
2020-07-06 12:30       ` Sergey Bronnikov
2020-07-06 23:31     ` Vladislav Shpilevoy
2020-07-07 12:12       ` Sergey Bronnikov
2020-07-07 20:57         ` Vladislav Shpilevoy
2020-07-08 12:07           ` Sergey Bronnikov
2020-07-08 22:13             ` Vladislav Shpilevoy
2020-07-09  9:39               ` Sergey Bronnikov
2020-07-02 21:13   ` [Tarantool-patches] [PATCH 3/4] replication: add tests for sync replication with anon replica sergeyb
2020-07-06 23:31     ` Vladislav Shpilevoy
2020-07-02 21:13   ` [Tarantool-patches] [PATCH 4/4] replication: add tests for sync replication with snapshots sergeyb
2020-07-02 22:46     ` Sergey Bronnikov
2020-07-02 23:20     ` Vladislav Shpilevoy
2020-07-06 23:31     ` Vladislav Shpilevoy
2020-07-07 16:00       ` Sergey Bronnikov
2020-07-06 23:31   ` [Tarantool-patches] [PATCH] Add new error injection constant ERRINJ_SYNC_TIMEOUT Vladislav Shpilevoy
2020-07-10  0:50   ` [Tarantool-patches] [PATCH v2 00/19] Sync replication Vladislav Shpilevoy
2020-07-10  7:40   ` Kirill Yukhin
