Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH 0/4] Abort vinyl transactions before switching to ro
@ 2019-03-04 15:39 Vladimir Davydov
  2019-03-04 15:39 ` [PATCH 1/4] vinyl: rename tx statement begin/rollback routines Vladimir Davydov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-04 15:39 UTC (permalink / raw)
  To: kostja; +Cc: georgy, tarantool-patches

In contrast to memtx, vinyl transactions may yield, which opens a time
window for the instance to switch to read-only mode. Since we check ro
flag only before executing a dml request, nothing prevents a transaction
from committing in such a case. This breaks master-replica switching
logic implemented by one of our customers, which justifiably assumes
that no local writes are allowed on an instance once box.cfg{read_only}
returns.

Fix this by forcefully aborting all local rw transactions in vinyl
before switching to read-only mode.

https://github.com/tarantool/tarantool/issues/4016
https://github.com/tarantool/tarantool/commits/dv/gh-4016-vy-abort-writers-for-ro

This patch set is an alternative approach to the fix proposed by Georgy,
for more details see discussion started at:

https://www.freelists.org/post/tarantool-patches/PATCH-Do-not-enable-commit-if-read-only-true

Vladimir Davydov (4):
  vinyl: rename tx statement begin/rollback routines
  vinyl: add tx to writers list in begin_statement engine callback
  engine: add switch_to_ro callback
  vinyl: abort rw transactions when instance switches to ro

 src/box/blackhole.c        |  1 +
 src/box/box.cc             |  7 ++++
 src/box/engine.c           | 14 ++++++++
 src/box/engine.h           | 13 ++++++++
 src/box/memtx_engine.c     |  1 +
 src/box/sysview.c          |  1 +
 src/box/vinyl.c            | 18 +++++++----
 src/box/vy_tx.c            | 34 +++++++++++++++++---
 src/box/vy_tx.h            | 35 +++++++++++++-------
 test/vinyl/errinj.result   | 76 +++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/errinj.test.lua | 30 +++++++++++++++++
 test/vinyl/misc.result     | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/misc.test.lua   | 39 ++++++++++++++++++++++
 13 files changed, 327 insertions(+), 22 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] vinyl: rename tx statement begin/rollback routines
  2019-03-04 15:39 [PATCH 0/4] Abort vinyl transactions before switching to ro Vladimir Davydov
@ 2019-03-04 15:39 ` Vladimir Davydov
  2019-03-05  7:29   ` Konstantin Osipov
  2019-03-04 15:39 ` [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback Vladimir Davydov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-04 15:39 UTC (permalink / raw)
  To: kostja; +Cc: georgy, tarantool-patches

Rename vy_tx_rollback_to_savepoint to vy_tx_rollback_statement and
vy_tx_savepoint to vy_tx_begin_statement, because soon we will do some
extra work there.

Needed for #4016
---
 src/box/vinyl.c |  4 ++--
 src/box/vy_tx.c |  9 ++++++++-
 src/box/vy_tx.h | 21 ++++++++++++---------
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 478dd815..87b9eefa 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -2409,7 +2409,7 @@ vinyl_engine_begin_statement(struct engine *engine, struct txn *txn)
 	struct vy_tx *tx = txn->engine_tx;
 	struct txn_stmt *stmt = txn_current_stmt(txn);
 	assert(tx != NULL);
-	stmt->engine_savepoint = vy_tx_savepoint(tx);
+	stmt->engine_savepoint = vy_tx_begin_statement(tx);
 	return 0;
 }
 
@@ -2420,7 +2420,7 @@ vinyl_engine_rollback_statement(struct engine *engine, struct txn *txn,
 	(void)engine;
 	struct vy_tx *tx = txn->engine_tx;
 	assert(tx != NULL);
-	vy_tx_rollback_to_savepoint(tx, stmt->engine_savepoint);
+	vy_tx_rollback_statement(tx, stmt->engine_savepoint);
 }
 
 /* }}} Public API of transaction control */
diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index ac02ee4d..53c495d4 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -833,8 +833,15 @@ vy_tx_rollback(struct vy_tx *tx)
 	mempool_free(&xm->tx_mempool, tx);
 }
 
+void *
+vy_tx_begin_statement(struct vy_tx *tx)
+{
+	assert(tx->state == VINYL_TX_READY);
+	return stailq_last(&tx->log);
+}
+
 void
-vy_tx_rollback_to_savepoint(struct vy_tx *tx, void *svp)
+vy_tx_rollback_statement(struct vy_tx *tx, void *svp)
 {
 	assert(tx->state == VINYL_TX_READY);
 	struct stailq_entry *last = svp;
diff --git a/src/box/vy_tx.h b/src/box/vy_tx.h
index 9524936f..590538d8 100644
--- a/src/box/vy_tx.h
+++ b/src/box/vy_tx.h
@@ -311,20 +311,23 @@ void
 vy_tx_rollback(struct vy_tx *tx);
 
 /**
+ * Begin a statement in the vinyl transaction manager.
  * Return the save point corresponding to the current
  * transaction state. The transaction can be rolled back
- * to a save point with vy_tx_rollback_to_savepoint().
+ * to a save point with vy_tx_rollback_statement().
  */
-static inline void *
-vy_tx_savepoint(struct vy_tx *tx)
-{
-	assert(tx->state == VINYL_TX_READY);
-	return stailq_last(&tx->log);
-}
+void *
+vy_tx_begin_statement(struct vy_tx *tx);
 
-/** Rollback a transaction to a given save point. */
+/**
+ * Rollback a transaction statement.
+ *
+ * @param tx   Transaction in question.
+ * @param svp  Save point to rollback to, as returned by
+ *             vy_tx_begin_statement().
+ */
 void
-vy_tx_rollback_to_savepoint(struct vy_tx *tx, void *svp);
+vy_tx_rollback_statement(struct vy_tx *tx, void *svp);
 
 /**
  * Remember a read interval in the conflict manager index.
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback
  2019-03-04 15:39 [PATCH 0/4] Abort vinyl transactions before switching to ro Vladimir Davydov
  2019-03-04 15:39 ` [PATCH 1/4] vinyl: rename tx statement begin/rollback routines Vladimir Davydov
@ 2019-03-04 15:39 ` Vladimir Davydov
  2019-03-05  7:30   ` Konstantin Osipov
  2019-03-04 15:39 ` [PATCH 3/4] engine: add switch_to_ro callback Vladimir Davydov
  2019-03-04 15:39 ` [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro Vladimir Davydov
  3 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-04 15:39 UTC (permalink / raw)
  To: kostja; +Cc: georgy, tarantool-patches

Currently, we add a transaction to the list of writers when executing a
DML request, i.e. in vy_tx_set. The problem is a transaction can yield
on read before calling vy_tx_set, e.g. to check a uniqueness constraint,
which opens a time window when a transaction is not yet on the list, but
it will surely proceed to DML after it continues execution. If we need
to abort writers in this time window, we'll miss it. To prevent this,
let's add a transaction to the list of writers in vy_tx_begin_statement.

Note, after this patch, when a transaction is aborted for DDL, it may
have an empty write set - it happens if tx_manager_abort_writers is
called between vy_tx_begin_statement and vy_tx_set. Hence we have to
remove the corresponding assertion from tx_manager_abort_writers.

Needed for #4016
---
 src/box/vy_tx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index 53c495d4..d216c73d 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -837,6 +837,8 @@ void *
 vy_tx_begin_statement(struct vy_tx *tx)
 {
 	assert(tx->state == VINYL_TX_READY);
+	if (stailq_empty(&tx->log))
+		rlist_add_entry(&tx->xm->writers, tx, in_writers);
 	return stailq_last(&tx->log);
 }
 
@@ -1057,8 +1059,6 @@ vy_tx_set_with_colmask(struct vy_tx *tx, struct vy_lsm *lsm,
 	tx->write_set_version++;
 	tx->write_size += tuple_size(stmt);
 	vy_stmt_counter_acct_tuple(&lsm->stat.txw.count, stmt);
-	if (stailq_empty(&tx->log))
-		rlist_add_entry(&tx->xm->writers, tx, in_writers);
 	stailq_add_tail_entry(&tx->log, v, next_in_log);
 	return 0;
 }
@@ -1068,7 +1068,6 @@ tx_manager_abort_writers(struct tx_manager *xm, struct vy_lsm *lsm)
 {
 	struct vy_tx *tx;
 	rlist_foreach_entry(tx, &xm->writers, in_writers) {
-		assert(!vy_tx_is_ro(tx));
 		if (tx->state == VINYL_TX_READY &&
 		    write_set_search_key(&tx->write_set, lsm,
 					 lsm->env->empty_key) != NULL)
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/4] engine: add switch_to_ro callback
  2019-03-04 15:39 [PATCH 0/4] Abort vinyl transactions before switching to ro Vladimir Davydov
  2019-03-04 15:39 ` [PATCH 1/4] vinyl: rename tx statement begin/rollback routines Vladimir Davydov
  2019-03-04 15:39 ` [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback Vladimir Davydov
@ 2019-03-04 15:39 ` Vladimir Davydov
  2019-03-05  7:31   ` Konstantin Osipov
  2019-03-04 15:39 ` [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro Vladimir Davydov
  3 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-04 15:39 UTC (permalink / raw)
  To: kostja; +Cc: georgy, tarantool-patches

We will use this callback to abort rw transactions in Vinyl when an
instance is switch to read-only mode.

Needed for #4016
---
 src/box/blackhole.c    |  1 +
 src/box/box.cc         |  7 +++++++
 src/box/engine.c       | 14 ++++++++++++++
 src/box/engine.h       | 13 +++++++++++++
 src/box/memtx_engine.c |  1 +
 src/box/sysview.c      |  1 +
 src/box/vinyl.c        |  1 +
 7 files changed, 38 insertions(+)

diff --git a/src/box/blackhole.c b/src/box/blackhole.c
index 249eb678..95064e1f 100644
--- a/src/box/blackhole.c
+++ b/src/box/blackhole.c
@@ -184,6 +184,7 @@ static const struct engine_vtab blackhole_engine_vtab = {
 	/* .commit = */ generic_engine_commit,
 	/* .rollback_statement = */ generic_engine_rollback_statement,
 	/* .rollback = */ generic_engine_rollback,
+	/* .switch_to_ro = */ generic_engine_switch_to_ro,
 	/* .bootstrap = */ generic_engine_bootstrap,
 	/* .begin_initial_recovery = */ generic_engine_begin_initial_recovery,
 	/* .begin_final_recovery = */ generic_engine_begin_final_recovery,
diff --git a/src/box/box.cc b/src/box/box.cc
index a6259542..fb896852 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -222,6 +222,11 @@ process_nop(struct request *request)
 void
 box_set_ro(bool ro)
 {
+	if (ro == is_ro)
+		return; /* nothing to do */
+	if (ro)
+		engine_switch_to_ro();
+
 	is_ro = ro;
 	fiber_cond_broadcast(&ro_cond);
 }
@@ -252,6 +257,8 @@ box_set_orphan(bool orphan)
 {
 	if (is_orphan == orphan)
 		return; /* nothing to do */
+	if (orphan)
+		engine_switch_to_ro();
 
 	is_orphan = orphan;
 	fiber_cond_broadcast(&ro_cond);
diff --git a/src/box/engine.c b/src/box/engine.c
index d0a08f5c..a52d0ed1 100644
--- a/src/box/engine.c
+++ b/src/box/engine.c
@@ -65,6 +65,14 @@ engine_shutdown(void)
 	}
 }
 
+void
+engine_switch_to_ro(void)
+{
+	struct engine *engine;
+	engine_foreach(engine)
+		engine->vtab->switch_to_ro(engine);
+}
+
 int
 engine_bootstrap(void)
 {
@@ -253,6 +261,12 @@ generic_engine_rollback(struct engine *engine, struct txn *txn)
 	(void)txn;
 }
 
+void
+generic_engine_switch_to_ro(struct engine *engine)
+{
+	(void)engine;
+}
+
 int
 generic_engine_bootstrap(struct engine *engine)
 {
diff --git a/src/box/engine.h b/src/box/engine.h
index a6949d28..a302b3bc 100644
--- a/src/box/engine.h
+++ b/src/box/engine.h
@@ -113,6 +113,12 @@ struct engine_vtab {
 	 */
 	void (*rollback)(struct engine *, struct txn *);
 	/**
+	 * Notify the engine that the instance is about to switch
+	 * to read-only mode. The engine is supposed to abort all
+	 * active rw transactions when this method is called.
+	 */
+	void (*switch_to_ro)(struct engine *);
+	/**
 	 * Bootstrap an empty data directory
 	 */
 	int (*bootstrap)(struct engine *);
@@ -293,6 +299,12 @@ void
 engine_shutdown(void);
 
 /**
+ * Called before switching the instance to read-only mode.
+ */
+void
+engine_switch_to_ro(void);
+
+/**
  * Initialize an empty data directory
  */
 int
@@ -361,6 +373,7 @@ void generic_engine_commit(struct engine *, struct txn *);
 void generic_engine_rollback_statement(struct engine *, struct txn *,
 				       struct txn_stmt *);
 void generic_engine_rollback(struct engine *, struct txn *);
+void generic_engine_switch_to_ro(struct engine *);
 int generic_engine_bootstrap(struct engine *);
 int generic_engine_begin_initial_recovery(struct engine *,
 					  const struct vclock *);
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 049a7f7d..d468d1cd 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -936,6 +936,7 @@ static const struct engine_vtab memtx_engine_vtab = {
 	/* .commit = */ generic_engine_commit,
 	/* .rollback_statement = */ memtx_engine_rollback_statement,
 	/* .rollback = */ memtx_engine_rollback,
+	/* .switch_to_ro = */ generic_engine_switch_to_ro,
 	/* .bootstrap = */ memtx_engine_bootstrap,
 	/* .begin_initial_recovery = */ memtx_engine_begin_initial_recovery,
 	/* .begin_final_recovery = */ memtx_engine_begin_final_recovery,
diff --git a/src/box/sysview.c b/src/box/sysview.c
index 29de4309..63b669d0 100644
--- a/src/box/sysview.c
+++ b/src/box/sysview.c
@@ -534,6 +534,7 @@ static const struct engine_vtab sysview_engine_vtab = {
 	/* .commit = */ generic_engine_commit,
 	/* .rollback_statement = */ generic_engine_rollback_statement,
 	/* .rollback = */ generic_engine_rollback,
+	/* .switch_to_ro = */ generic_engine_switch_to_ro,
 	/* .bootstrap = */ generic_engine_bootstrap,
 	/* .begin_initial_recovery = */ generic_engine_begin_initial_recovery,
 	/* .begin_final_recovery = */ generic_engine_begin_final_recovery,
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 87b9eefa..c8b02eb8 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -4463,6 +4463,7 @@ static const struct engine_vtab vinyl_engine_vtab = {
 	/* .commit = */ vinyl_engine_commit,
 	/* .rollback_statement = */ vinyl_engine_rollback_statement,
 	/* .rollback = */ vinyl_engine_rollback,
+	/* .switch_to_ro = */ generic_engine_switch_to_ro,
 	/* .bootstrap = */ vinyl_engine_bootstrap,
 	/* .begin_initial_recovery = */ vinyl_engine_begin_initial_recovery,
 	/* .begin_final_recovery = */ vinyl_engine_begin_final_recovery,
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro
  2019-03-04 15:39 [PATCH 0/4] Abort vinyl transactions before switching to ro Vladimir Davydov
                   ` (2 preceding siblings ...)
  2019-03-04 15:39 ` [PATCH 3/4] engine: add switch_to_ro callback Vladimir Davydov
@ 2019-03-04 15:39 ` Vladimir Davydov
  2019-03-05  7:43   ` Konstantin Osipov
  3 siblings, 1 reply; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-04 15:39 UTC (permalink / raw)
  To: kostja; +Cc: georgy, tarantool-patches

A Vinyl transaction may yield while having a non-empty write set. This
opens a time window for the instance to switch to read-only mode. Since
we check ro flag only before executing a DML request, the transaction
would successfully commit in such a case, breaking the assumption that
no writes are possible on an instance after box.cfg{read_only=true}
returns. In particular, this breaks master-replica switching logic.

Fix this by aborting all local rw transactions before switching to
read-only mode. Note, remote rw transactions must not be aborted,
because they ignore ro flag.

Closes #4016
---
 src/box/vinyl.c            | 15 ++++++---
 src/box/vy_tx.c            | 20 +++++++++++-
 src/box/vy_tx.h            | 14 ++++++--
 test/vinyl/errinj.result   | 76 +++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/errinj.test.lua | 30 +++++++++++++++++
 test/vinyl/misc.result     | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/misc.test.lua   | 39 ++++++++++++++++++++++
 7 files changed, 266 insertions(+), 8 deletions(-)

diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index c8b02eb8..aac890b6 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -70,7 +70,6 @@
 #include "info/info.h"
 #include "column_mask.h"
 #include "trigger.h"
-#include "session.h"
 #include "wal.h" /* wal_mode() */
 
 /**
@@ -1041,7 +1040,7 @@ vinyl_space_prepare_alter(struct space *old_space, struct space *new_space)
 static void
 vy_abort_writers_for_ddl(struct vy_env *env, struct vy_lsm *lsm)
 {
-	tx_manager_abort_writers(env->xm, lsm);
+	tx_manager_abort_writers_for_ddl(env->xm, lsm);
 	/*
 	 * Wait for prepared transactions to complete
 	 * (we can't abort them as they reached WAL).
@@ -2334,8 +2333,7 @@ vinyl_engine_prepare(struct engine *engine, struct txn *txn)
 	 * available for the admin to track the lag so let the applier
 	 * wait as long as necessary for memory dump to complete.
 	 */
-	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
-			  env->timeout : TIMEOUT_INFINITY);
+	double timeout = (tx->is_remote ? TIMEOUT_INFINITY : env->timeout);
 	/*
 	 * Reserve quota needed by the transaction before allocating
 	 * memory. Since this may yield, which opens a time window for
@@ -2423,6 +2421,13 @@ vinyl_engine_rollback_statement(struct engine *engine, struct txn *txn,
 	vy_tx_rollback_statement(tx, stmt->engine_savepoint);
 }
 
+static void
+vinyl_engine_switch_to_ro(struct engine *engine)
+{
+	struct vy_env *env = vy_env(engine);
+	tx_manager_abort_writers_for_ro(env->xm);
+}
+
 /* }}} Public API of transaction control */
 
 /** {{{ Environment */
@@ -4463,7 +4468,7 @@ static const struct engine_vtab vinyl_engine_vtab = {
 	/* .commit = */ vinyl_engine_commit,
 	/* .rollback_statement = */ vinyl_engine_rollback_statement,
 	/* .rollback = */ vinyl_engine_rollback,
-	/* .switch_to_ro = */ generic_engine_switch_to_ro,
+	/* .switch_to_ro = */ vinyl_engine_switch_to_ro,
 	/* .bootstrap = */ vinyl_engine_bootstrap,
 	/* .begin_initial_recovery = */ vinyl_engine_begin_initial_recovery,
 	/* .begin_final_recovery = */ vinyl_engine_begin_final_recovery,
diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index d216c73d..275fcb62 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -46,6 +46,7 @@
 #include "iterator_type.h"
 #include "salad/stailq.h"
 #include "schema.h" /* space_cache_version */
+#include "session.h"
 #include "space.h"
 #include "trigger.h"
 #include "trivia/util.h"
@@ -309,6 +310,7 @@ vy_tx_create(struct tx_manager *xm, struct vy_tx *tx)
 	tx->write_size = 0;
 	tx->xm = xm;
 	tx->state = VINYL_TX_READY;
+	tx->is_remote = false;
 	tx->read_view = (struct vy_read_view *)xm->p_global_read_view;
 	vy_tx_read_set_new(&tx->read_set);
 	tx->psn = 0;
@@ -407,6 +409,11 @@ vy_tx_begin(struct tx_manager *xm)
 		return NULL;
 	}
 	vy_tx_create(xm, tx);
+
+	struct session *session = fiber_get_session(fiber());
+	if (session != NULL && session->type == SESSION_TYPE_APPLIER)
+		tx->is_remote = true;
+
 	return tx;
 }
 
@@ -1064,7 +1071,7 @@ vy_tx_set_with_colmask(struct vy_tx *tx, struct vy_lsm *lsm,
 }
 
 void
-tx_manager_abort_writers(struct tx_manager *xm, struct vy_lsm *lsm)
+tx_manager_abort_writers_for_ddl(struct tx_manager *xm, struct vy_lsm *lsm)
 {
 	struct vy_tx *tx;
 	rlist_foreach_entry(tx, &xm->writers, in_writers) {
@@ -1076,6 +1083,17 @@ tx_manager_abort_writers(struct tx_manager *xm, struct vy_lsm *lsm)
 }
 
 void
+tx_manager_abort_writers_for_ro(struct tx_manager *xm)
+{
+	struct vy_tx *tx;
+	rlist_foreach_entry(tx, &xm->writers, in_writers) {
+		/* Remote transactions ignore ro flag. */
+		if (tx->state == VINYL_TX_READY && !tx->is_remote)
+			tx->state = VINYL_TX_ABORT;
+	}
+}
+
+void
 vy_txw_iterator_open(struct vy_txw_iterator *itr,
 		     struct vy_txw_iterator_stat *stat,
 		     struct vy_tx *tx, struct vy_lsm *lsm,
diff --git a/src/box/vy_tx.h b/src/box/vy_tx.h
index 590538d8..e3df600a 100644
--- a/src/box/vy_tx.h
+++ b/src/box/vy_tx.h
@@ -161,6 +161,8 @@ struct vy_tx {
 	size_t write_size;
 	/** Current state of the transaction.*/
 	enum tx_state state;
+	/** Set if the transaction was started by an applier. */
+	bool is_remote;
 	/**
 	 * The read view of this transaction. When a transaction
 	 * is started, it is set to the "read committed" state,
@@ -276,10 +278,18 @@ tx_manager_mem_used(struct tx_manager *xm);
 
 /**
  * Abort all rw transactions that affect the given LSM tree
- * and haven't reached WAL yet.
+ * and haven't reached WAL yet. Called before executing a DDL
+ * operation.
  */
 void
-tx_manager_abort_writers(struct tx_manager *xm, struct vy_lsm *lsm);
+tx_manager_abort_writers_for_ddl(struct tx_manager *xm, struct vy_lsm *lsm);
+
+/**
+ * Abort all local rw transactions that haven't reached WAL yet.
+ * Called before switching to read-only mode.
+ */
+void
+tx_manager_abort_writers_for_ro(struct tx_manager *xm);
 
 /** Initialize a tx object. */
 void
diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result
index 248b32c8..9b5f7314 100644
--- a/test/vinyl/errinj.result
+++ b/test/vinyl/errinj.result
@@ -1166,3 +1166,79 @@ test_run:cmd("restart server default")
 box.space.test:drop()
 ---
 ...
+--
+-- Check that remote transactions are not aborted when an instance
+-- switches to read-only mode (gh-4016).
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+s:replace{1, 1}
+---
+- [1, 1]
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.error.injection.set('ERRINJ_VY_READ_PAGE_TIMEOUT', 0.1)
+---
+- ok
+...
+test_run:cmd("switch default")
+---
+- true
+...
+s:update({1}, {{'+', 2, 1}})
+---
+- [1, 2]
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{read_only = true}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+vclock = test_run:get_vclock("default")
+---
+...
+_ = test_run:wait_vclock("replica", vclock)
+---
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+test_run:cmd("delete server replica")
+---
+- true
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/errinj.test.lua b/test/vinyl/errinj.test.lua
index eaec52a5..5c7a525f 100644
--- a/test/vinyl/errinj.test.lua
+++ b/test/vinyl/errinj.test.lua
@@ -428,3 +428,33 @@ _ = fiber.create(function() box.snapshot() end)
 
 test_run:cmd("restart server default")
 box.space.test:drop()
+
+--
+-- Check that remote transactions are not aborted when an instance
+-- switches to read-only mode (gh-4016).
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+s:replace{1, 1}
+box.schema.user.grant('guest', 'replication')
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+
+test_run:cmd("switch replica")
+box.error.injection.set('ERRINJ_VY_READ_PAGE_TIMEOUT', 0.1)
+
+test_run:cmd("switch default")
+s:update({1}, {{'+', 2, 1}})
+
+test_run:cmd("switch replica")
+box.cfg{read_only = true}
+
+test_run:cmd("switch default")
+vclock = test_run:get_vclock("default")
+_ = test_run:wait_vclock("replica", vclock)
+
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+box.schema.user.revoke('guest', 'replication')
+s:drop()
diff --git a/test/vinyl/misc.result b/test/vinyl/misc.result
index 5f67271e..685fd776 100644
--- a/test/vinyl/misc.result
+++ b/test/vinyl/misc.result
@@ -282,3 +282,83 @@ test_run:cmd('cleanup server test')
 ---
 - true
 ...
+--
+-- gh-4016: local rw transactions are aborted when the instance
+-- switches to read-only mode.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+s:replace({1, 1})
+---
+- [1, 1]
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+-- Start rw transaction.
+ch1 = fiber.channel(1);
+---
+...
+_ = fiber.create(function()
+    box.begin()
+    s:replace{1, 2}
+    ch1:get()
+    local status, err = pcall(box.commit)
+    ch1:put(status or err)
+end);
+---
+...
+-- Start ro transaction.
+ch2 = fiber.channel(1);
+---
+...
+_ = fiber.create(function()
+    box.begin()
+    s:select()
+    ch2:get()
+    local status, err = pcall(box.commit)
+    ch2:put(status or err)
+end);
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+-- Switch to ro mode.
+box.cfg{read_only = true}
+---
+...
+-- Resume the transactions.
+ch1:put(true)
+---
+- true
+...
+ch2:put(true)
+---
+- true
+...
+ch1:get()
+---
+- Transaction has been aborted by conflict
+...
+ch2:get()
+---
+- true
+...
+-- Cleanup.
+box.cfg{read_only = false}
+---
+...
+s:select()
+---
+- - [1, 1]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/misc.test.lua b/test/vinyl/misc.test.lua
index 1c3a9517..cdc22774 100644
--- a/test/vinyl/misc.test.lua
+++ b/test/vinyl/misc.test.lua
@@ -114,3 +114,42 @@ s:drop()
 test_run:cmd('switch default')
 test_run:cmd('stop server test')
 test_run:cmd('cleanup server test')
+
+--
+-- gh-4016: local rw transactions are aborted when the instance
+-- switches to read-only mode.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+s:replace({1, 1})
+test_run:cmd("setopt delimiter ';'")
+-- Start rw transaction.
+ch1 = fiber.channel(1);
+_ = fiber.create(function()
+    box.begin()
+    s:replace{1, 2}
+    ch1:get()
+    local status, err = pcall(box.commit)
+    ch1:put(status or err)
+end);
+-- Start ro transaction.
+ch2 = fiber.channel(1);
+_ = fiber.create(function()
+    box.begin()
+    s:select()
+    ch2:get()
+    local status, err = pcall(box.commit)
+    ch2:put(status or err)
+end);
+test_run:cmd("setopt delimiter ''");
+-- Switch to ro mode.
+box.cfg{read_only = true}
+-- Resume the transactions.
+ch1:put(true)
+ch2:put(true)
+ch1:get()
+ch2:get()
+-- Cleanup.
+box.cfg{read_only = false}
+s:select()
+s:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] vinyl: rename tx statement begin/rollback routines
  2019-03-04 15:39 ` [PATCH 1/4] vinyl: rename tx statement begin/rollback routines Vladimir Davydov
@ 2019-03-05  7:29   ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-03-05  7:29 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: georgy, tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/03/05 10:25]:
> Rename vy_tx_rollback_to_savepoint to vy_tx_rollback_statement and
> vy_tx_savepoint to vy_tx_begin_statement, because soon we will do some
> extra work there.
> 
> Needed for #4016

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback
  2019-03-04 15:39 ` [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback Vladimir Davydov
@ 2019-03-05  7:30   ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-03-05  7:30 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: georgy, tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/03/05 10:25]:
> Currently, we add a transaction to the list of writers when executing a
> DML request, i.e. in vy_tx_set. The problem is a transaction can yield
> on read before calling vy_tx_set, e.g. to check a uniqueness constraint,
> which opens a time window when a transaction is not yet on the list, but
> it will surely proceed to DML after it continues execution. If we need
> to abort writers in this time window, we'll miss it. To prevent this,
> let's add a transaction to the list of writers in vy_tx_begin_statement.
> 
> Note, after this patch, when a transaction is aborted for DDL, it may
> have an empty write set - it happens if tx_manager_abort_writers is
> called between vy_tx_begin_statement and vy_tx_set. Hence we have to
> remove the corresponding assertion from tx_manager_abort_writers.
> 
> Needed for #4016

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] engine: add switch_to_ro callback
  2019-03-04 15:39 ` [PATCH 3/4] engine: add switch_to_ro callback Vladimir Davydov
@ 2019-03-05  7:31   ` Konstantin Osipov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Osipov @ 2019-03-05  7:31 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: georgy, tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/03/05 10:25]:
> We will use this callback to abort rw transactions in Vinyl when an
> instance is switch to read-only mode.
> 
> Needed for #4016

OK to push.


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro
  2019-03-04 15:39 ` [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro Vladimir Davydov
@ 2019-03-05  7:43   ` Konstantin Osipov
  2019-03-05  8:35     ` Vladimir Davydov
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Osipov @ 2019-03-05  7:43 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: georgy, tarantool-patches

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/03/05 10:25]:
> A Vinyl transaction may yield while having a non-empty write set. This
> opens a time window for the instance to switch to read-only mode. Since
> we check ro flag only before executing a DML request, the transaction
> would successfully commit in such a case, breaking the assumption that
> no writes are possible on an instance after box.cfg{read_only=true}
> returns. In particular, this breaks master-replica switching logic.
> 
> Fix this by aborting all local rw transactions before switching to
> read-only mode. Note, remote rw transactions must not be aborted,
> because they ignore ro flag.
> 
> Closes #4016

OK to push, a few comments below.

> -	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
> -			  env->timeout : TIMEOUT_INFINITY);
> +	double timeout = (tx->is_remote ? TIMEOUT_INFINITY : env->timeout);

is_remote is a vague name. A net.box connection is remote as well.
Why not simply have tx->session_type or
tx->session_type_is_applier or tx->is_session_type_applier? 

>  void
> +tx_manager_abort_writers_for_ro(struct tx_manager *xm)
> +{
> +	struct vy_tx *tx;
> +	rlist_foreach_entry(tx, &xm->writers, in_writers) {
> +		/* Remote transactions ignore ro flag. */
> +		if (tx->state == VINYL_TX_READY && !tx->is_remote)
> +			tx->state = VINYL_TX_ABORT;
> +	}
> +}

I think we should retry applier transaction in case of
abort conlfict. The error code for transaction conflict is
distinguishable in the diagnostics area. This would help here and
when conflicting with user transactions. Please feel free to do in
a separate patch - or Georgy could do it in scope of his work on
transactional applier.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro
  2019-03-05  7:43   ` Konstantin Osipov
@ 2019-03-05  8:35     ` Vladimir Davydov
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Davydov @ 2019-03-05  8:35 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: georgy, tarantool-patches

On Tue, Mar 05, 2019 at 10:43:33AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [19/03/05 10:25]:
> > A Vinyl transaction may yield while having a non-empty write set. This
> > opens a time window for the instance to switch to read-only mode. Since
> > we check ro flag only before executing a DML request, the transaction
> > would successfully commit in such a case, breaking the assumption that
> > no writes are possible on an instance after box.cfg{read_only=true}
> > returns. In particular, this breaks master-replica switching logic.
> > 
> > Fix this by aborting all local rw transactions before switching to
> > read-only mode. Note, remote rw transactions must not be aborted,
> > because they ignore ro flag.
> > 
> > Closes #4016
> 
> OK to push, a few comments below.
> 
> > -	double timeout = (current_session()->type != SESSION_TYPE_APPLIER ?
> > -			  env->timeout : TIMEOUT_INFINITY);
> > +	double timeout = (tx->is_remote ? TIMEOUT_INFINITY : env->timeout);
> 
> is_remote is a vague name. A net.box connection is remote as well.
> Why not simply have tx->session_type or
> tx->session_type_is_applier or tx->is_session_type_applier? 

Right. Renamed to is_applier_session and pushed all the four patches
to 2.1 and 1.10.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-03-05  8:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-04 15:39 [PATCH 0/4] Abort vinyl transactions before switching to ro Vladimir Davydov
2019-03-04 15:39 ` [PATCH 1/4] vinyl: rename tx statement begin/rollback routines Vladimir Davydov
2019-03-05  7:29   ` Konstantin Osipov
2019-03-04 15:39 ` [PATCH 2/4] vinyl: add tx to writers list in begin_statement engine callback Vladimir Davydov
2019-03-05  7:30   ` Konstantin Osipov
2019-03-04 15:39 ` [PATCH 3/4] engine: add switch_to_ro callback Vladimir Davydov
2019-03-05  7:31   ` Konstantin Osipov
2019-03-04 15:39 ` [PATCH 4/4] vinyl: abort rw transactions when instance switches to ro Vladimir Davydov
2019-03-05  7:43   ` Konstantin Osipov
2019-03-05  8:35     ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox