Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler
@ 2020-07-21 22:42 Vladislav Shpilevoy
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions Vladislav Shpilevoy
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-21 22:42 UTC (permalink / raw)
  To: tarantool-patches, gorcunov

The patch makes so journal_write() always ends with the
transaction completed, with TXN_IS_DONE flag installed. That
allows to remove 1 'if' from hot txn_commit() function, and make
journal API more straightforward.

This change also motivated rework of how limbo commits
transactions.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/txn_commit-opt

Vladislav Shpilevoy (2):
  txn_limbo: single function to confirm transactions
  txn: remove TXN_IS_DONE check from txn_commit()

 src/box/box.cc      | 36 +++++++++++++-----
 src/box/journal.c   | 45 +----------------------
 src/box/journal.h   | 11 ------
 src/box/txn.c       |  6 +--
 src/box/txn_limbo.c | 89 ++++++++++++++++++---------------------------
 src/box/wal.c       |  2 +-
 test/unit/suite.ini |  1 +
 7 files changed, 68 insertions(+), 122 deletions(-)

-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions
  2020-07-21 22:42 [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Vladislav Shpilevoy
@ 2020-07-21 22:42 ` Vladislav Shpilevoy
  2020-07-22 15:19   ` Leonid Vasiliev
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit() Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-21 22:42 UTC (permalink / raw)
  To: tarantool-patches, gorcunov

Before this patch there were some structural problems with the
limbo code (still are now, but less).

It wasn't clear when lifetime of a transaction ends. When it it is
committed, when rolled back, when completion is called. With
rollback things are more or less bearable, but not with commit.

Transaction's completion could be done at the same time with
setting limbo_entry.is_commit flag. Or with setting the flag +
yield + completion. This led to having a weird crutch in
txn_limbo_wait_complete() to make async transactions explicitly
wait until the previous sync transactions are written to WAL.

The commit flag could be set in 3!!! different places - parameters
update handler, ACK handler, and confirmation handler. In these 3
places there were various assumptions making it really hard to
understand what is happening here and there, what is the
difference. Not counting how much code was duplicated.

This patch makes the commit path always come through confirmation
reader, as the most logical place for that, and already covering
almost all the tricky cases.

Now there is a guarantee that a transaction is completed if it is
removed from the limbo queue. No need to check TXN_IS_DONE, wait
for anything, whatsoever.

Also the patch is a preparation for removal of TXN_IS_DONE check
from the main path of txn_commit(). txn_limbo_wait_complete()
shouldn't ever return a not finished transaction for that.

Part of #5143
---
 src/box/txn_limbo.c | 89 ++++++++++++++++++---------------------------
 1 file changed, 36 insertions(+), 53 deletions(-)

diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index d5b887d36..a3936c569 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -221,6 +221,12 @@ do_rollback:
 
 complete:
 	assert(txn_limbo_entry_is_complete(entry));
+	/*
+	 * Entry is *always* removed from the limbo by the same fiber, which
+	 * installed the commit/rollback flag.
+	 */
+	assert(rlist_empty(&entry->in_queue));
+	assert(txn_has_flag(txn, TXN_IS_DONE));
 	/*
 	 * The first tx to be rolled back already performed all
 	 * the necessary cleanups for us.
@@ -229,28 +235,6 @@ complete:
 		diag_set(ClientError, ER_SYNC_ROLLBACK);
 		return -1;
 	}
-	/*
-	 * The entry might be not the first in the limbo. It
-	 * happens when there was a sync transaction and async
-	 * transaction. The sync and async went to WAL. After sync
-	 * WAL write is done, it may be already ACKed by the
-	 * needed replica count. Now it marks self as committed
-	 * and does the same for the next async txn. Then it
-	 * starts writing CONFIRM. During that the async
-	 * transaction finishes its WAL write, sees it is
-	 * committed and ends up here. Not being the first
-	 * transaction in the limbo.
-	 */
-	while (!rlist_empty(&entry->in_queue) &&
-	       txn_limbo_first_entry(limbo) != entry) {
-		bool cancellable = fiber_set_cancellable(false);
-		fiber_yield();
-		fiber_set_cancellable(cancellable);
-	}
-	if (!rlist_empty(&entry->in_queue))
-		txn_limbo_remove(limbo, entry);
-	txn_clear_flag(txn, TXN_WAIT_SYNC);
-	txn_clear_flag(txn, TXN_WAIT_ACK);
 	return 0;
 }
 
@@ -319,16 +303,22 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 	struct txn_limbo_entry *e, *tmp;
 	rlist_foreach_entry_safe(e, &limbo->queue, in_queue, tmp) {
 		/*
-		 * Confirm a transaction if
-		 * - it is a sync transaction covered by the
-		 *   confirmation LSN;
-		 * - it is an async transaction, and it is the
-		 *   last in the queue. So it does not depend on
-		 *   a not finished sync transaction anymore and
-		 *   can be confirmed too.
+		 * Check if it is an async transaction last in the queue. When
+		 * it is last, it does not depend on a not finished sync
+		 * transaction anymore and can be confirmed right away.
 		 */
-		if (e->lsn > lsn && txn_has_flag(e->txn, TXN_WAIT_ACK))
-			break;
+		if (txn_has_flag(e->txn, TXN_WAIT_ACK)) {
+			/* Sync transaction not covered by the confirmation. */
+			if (e->lsn > lsn)
+				break;
+			/*
+			 * Sync transaction not yet received an LSN. Happens
+			 * only to local master transactions whose WAL write is
+			 * in progress.
+			 */
+			if (e->lsn == -1)
+				break;
+		}
 		e->is_commit = true;
 		txn_limbo_remove(limbo, e);
 		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
@@ -403,7 +393,7 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 	assert(limbo->instance_id != REPLICA_ID_NIL);
 	int64_t prev_lsn = vclock_get(&limbo->vclock, replica_id);
 	vclock_follow(&limbo->vclock, replica_id, lsn);
-	struct txn_limbo_entry *e, *last_quorum = NULL;
+	struct txn_limbo_entry *e;
 	int64_t confirm_lsn = -1;
 	rlist_foreach_entry(e, &limbo->queue, in_queue) {
 		assert(e->ack_count <= VCLOCK_MAX);
@@ -416,7 +406,7 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 		 */
 		if (!txn_has_flag(e->txn, TXN_WAIT_ACK)) {
 			assert(e->lsn == -1);
-			if (last_quorum == NULL)
+			if (confirm_lsn == -1)
 				continue;
 		} else if (e->lsn <= prev_lsn) {
 			continue;
@@ -425,10 +415,8 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 		} else {
 			confirm_lsn = e->lsn;
 		}
-		e->is_commit = true;
-		last_quorum = e;
 	}
-	if (last_quorum == NULL)
+	if (confirm_lsn == -1)
 		return;
 	if (txn_limbo_write_confirm(limbo, confirm_lsn) != 0) {
 		// TODO: what to do here?.
@@ -437,16 +425,7 @@ txn_limbo_ack(struct txn_limbo *limbo, uint32_t replica_id, int64_t lsn)
 		// able to write ROLLBACK?
 		return;
 	}
-	/*
-	 * Wakeup all the entries in direct order as soon
-	 * as confirmation message is written to WAL.
-	 */
-	rlist_foreach_entry(e, &limbo->queue, in_queue) {
-		if (e->txn->fiber != fiber())
-			fiber_wakeup(e->txn->fiber);
-		if (e == last_quorum)
-			break;
-	}
+	txn_limbo_read_confirm(limbo, confirm_lsn);
 }
 
 /**
@@ -579,16 +558,20 @@ txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 			confirm_lsn = e->lsn;
 			assert(confirm_lsn > 0);
 		}
-		e->is_commit = true;
 	}
-	if (confirm_lsn > 0 &&
-	    txn_limbo_write_confirm(limbo, confirm_lsn) != 0) {
-		panic("Couldn't write CONFIRM to WAL");
-		return;
+	if (confirm_lsn > 0) {
+		if (txn_limbo_write_confirm(limbo, confirm_lsn) != 0) {
+			panic("Couldn't write CONFIRM to WAL");
+			return;
+		}
+		txn_limbo_read_confirm(limbo, confirm_lsn);
 	}
 	/*
-	 * Wakeup all. Confirmed will be committed. Timed out will
-	 * rollback.
+	 * Wakeup all the others - timed out will rollback. Also
+	 * there can be non-transactional waiters, such as CONFIRM
+	 * waiters. They are bound to a transaction, but if they
+	 * wait on replica, they won't see timeout update. Because
+	 * sync transactions can live on replica infinitely.
 	 */
 	fiber_cond_broadcast(&limbo->wait_cond);
 }
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit()
  2020-07-21 22:42 [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Vladislav Shpilevoy
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions Vladislav Shpilevoy
@ 2020-07-21 22:42 ` Vladislav Shpilevoy
  2020-07-22 15:28   ` Leonid Vasiliev
  2020-07-22  8:40 ` [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Cyrill Gorcunov
  2020-07-22 23:14 ` Vladislav Shpilevoy
  3 siblings, 1 reply; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-21 22:42 UTC (permalink / raw)
  To: tarantool-patches, gorcunov

TXN_IS_DONE was used in txn_commit() to finalize the transaction
in a case it is not finished yet. But this happens only not in
so common cases - during bootstrap and recovery. During normal
operation the transaction is always finished when WAL thread
returns it to TX thread after a disk write.

So this is a matter of journal, and should be solved here, not in
txn code with some crutch, especially in such a hot path place.

This commit makes so that after journal_write() the transaction is
always already done, i.e. txn_complete_async() was called. Nothing
changes for the normal operation mode except this is -1 'if'.
---
 src/box/box.cc      | 36 +++++++++++++++++++++++++++---------
 src/box/journal.c   | 45 +--------------------------------------------
 src/box/journal.h   | 11 -----------
 src/box/txn.c       |  6 ++----
 src/box/wal.c       |  2 +-
 test/unit/suite.ini |  1 +
 6 files changed, 32 insertions(+), 69 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 45340df2e..83eef5d98 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -344,14 +344,6 @@ recovery_journal_write(struct journal *base,
 {
 	struct recovery_journal *journal = (struct recovery_journal *) base;
 	entry->res = vclock_sum(journal->vclock);
-	return 0;
-}
-
-static int
-recovery_journal_write_async(struct journal *base,
-			     struct  journal_entry *entry)
-{
-	recovery_journal_write(base, entry);
 	/*
 	 * Since there're no actual writes, fire a
 	 * journal_async_complete callback right away.
@@ -364,7 +356,7 @@ static void
 recovery_journal_create(struct vclock *v)
 {
 	static struct recovery_journal journal;
-	journal_create(&journal.base, recovery_journal_write_async,
+	journal_create(&journal.base, recovery_journal_write,
 		       txn_complete_async, recovery_journal_write);
 	journal.vclock = v;
 	journal_set(&journal.base);
@@ -2192,6 +2184,20 @@ engine_init()
 	box_set_vinyl_timeout();
 }
 
+/**
+ * Blindly apply whatever comes from bootstrap. This is either a
+ * local snapshot, or received from a remote master. In both cases
+ * it is not WAL - records are not sorted by their commit order,
+ * and don't have headers.
+ */
+static int
+bootstrap_journal_write(struct journal *base, struct journal_entry *entry)
+{
+	entry->res = 0;
+	journal_async_complete(base, entry);
+	return 0;
+}
+
 /**
  * Initialize the first replica of a new replica set.
  */
@@ -2620,6 +2626,15 @@ box_cfg_xc(void)
 		if (!cfg_geti("hot_standby") || checkpoint == NULL)
 			tnt_raise(ClientError, ER_ALREADY_RUNNING, cfg_gets("wal_dir"));
 	}
+
+	struct journal bootstrap_journal;
+	journal_create(&bootstrap_journal, NULL, txn_complete_async,
+		       bootstrap_journal_write);
+	journal_set(&bootstrap_journal);
+	auto bootstrap_journal_guard = make_scoped_guard([] {
+		journal_set(NULL);
+	});
+
 	bool is_bootstrap_leader = false;
 	if (checkpoint != NULL) {
 		/* Recover the instance from the local directory */
@@ -2632,6 +2647,9 @@ box_cfg_xc(void)
 	}
 	fiber_gc();
 
+	bootstrap_journal_guard.is_active = false;
+	assert(current_journal != &bootstrap_journal);
+
 	/*
 	 * Check for correct registration of the instance in _cluster
 	 * The instance won't exist in _cluster space if it is an
diff --git a/src/box/journal.c b/src/box/journal.c
index d84f960d5..f1e89aaa2 100644
--- a/src/box/journal.c
+++ b/src/box/journal.c
@@ -32,50 +32,7 @@
 #include <small/region.h>
 #include <diag.h>
 
-int
-journal_no_write_async(struct journal *journal,
-		       struct journal_entry *entry)
-{
-	(void)journal;
-	assert(false);
-
-	errno = EINVAL;
-	diag_set(SystemError, "write_async wrong context");
-
-	entry->res = -1;
-	return -1;
-}
-
-void
-journal_no_write_async_cb(struct journal_entry *entry)
-{
-	assert(false);
-
-	errno = EINVAL;
-	diag_set(SystemError, "write_async_cb wrong context");
-
-	entry->res = -1;
-}
-
-/**
- * Used to load from a memtx snapshot. LSN is not used,
- * but txn_commit() must work.
- */
-static int
-dummy_journal_write(struct journal *journal, struct journal_entry *entry)
-{
-	(void) journal;
-	entry->res = 0;
-	return 0;
-}
-
-static struct journal dummy_journal = {
-	.write_async	= journal_no_write_async,
-	.write_async_cb	= journal_no_write_async_cb,
-	.write		= dummy_journal_write,
-};
-
-struct journal *current_journal = &dummy_journal;
+struct journal *current_journal = NULL;
 
 struct journal_entry *
 journal_entry_new(size_t n_rows, struct region *region,
diff --git a/src/box/journal.h b/src/box/journal.h
index ebc5cb708..1a10e66c3 100644
--- a/src/box/journal.h
+++ b/src/box/journal.h
@@ -182,17 +182,6 @@ journal_create(struct journal *journal,
 	journal->write		= write;
 }
 
-/**
- * A stub to issue an error in case if asynchronous
- * write is diabled in the backend.
- */
-extern int
-journal_no_write_async(struct journal *journal,
-		       struct journal_entry *entry);
-
-extern void
-journal_no_write_async_cb(struct journal_entry *entry);
-
 static inline bool
 journal_is_initialized(struct journal *journal)
 {
diff --git a/src/box/txn.c b/src/box/txn.c
index bd9fcdbe6..9c21258c5 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -450,6 +450,7 @@ txn_run_wal_write_triggers(struct txn *txn)
 void
 txn_complete(struct txn *txn)
 {
+	assert(!txn_has_flag(txn, TXN_IS_DONE));
 	/*
 	 * Note, engine can be NULL if transaction contains
 	 * IPROTO_NOP or IPROTO_CONFIRM statements.
@@ -846,10 +847,7 @@ txn_commit(struct txn *txn)
 		if (txn_limbo_wait_complete(&txn_limbo, limbo_entry) < 0)
 			goto rollback;
 	}
-	if (!txn_has_flag(txn, TXN_IS_DONE)) {
-		txn->signature = req->res;
-		txn_complete(txn);
-	}
+	assert(txn_has_flag(txn, TXN_IS_DONE));
 	assert(txn->signature >= 0);
 
 	/* Synchronous transactions are freed by the calling fiber. */
diff --git a/src/box/wal.c b/src/box/wal.c
index 8cedf65b7..37a8bd483 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -1304,7 +1304,7 @@ wal_write_none_async(struct journal *journal,
 	vclock_merge(&writer->vclock, &vclock_diff);
 	vclock_copy(&replicaset.vclock, &writer->vclock);
 	entry->res = vclock_sum(&writer->vclock);
-
+	journal_async_complete(journal, entry);
 	return 0;
 }
 
diff --git a/test/unit/suite.ini b/test/unit/suite.ini
index d429c95e9..d16ba413b 100644
--- a/test/unit/suite.ini
+++ b/test/unit/suite.ini
@@ -1,5 +1,6 @@
 [default]
 core = unittest
 description = unit tests
+disabled = snap_quorum_delay.test
 release_disabled = fiber_stack.test swim_errinj.test
 is_parallel = True
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler
  2020-07-21 22:42 [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Vladislav Shpilevoy
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions Vladislav Shpilevoy
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit() Vladislav Shpilevoy
@ 2020-07-22  8:40 ` Cyrill Gorcunov
  2020-07-22 23:14 ` Vladislav Shpilevoy
  3 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2020-07-22  8:40 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Wed, Jul 22, 2020 at 12:42:42AM +0200, Vladislav Shpilevoy wrote:
> The patch makes so journal_write() always ends with the
> transaction completed, with TXN_IS_DONE flag installed. That
> allows to remove 1 'if' from hot txn_commit() function, and make
> journal API more straightforward.
> 
> This change also motivated rework of how limbo commits
> transactions.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/txn_commit-opt
Ack to both

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions Vladislav Shpilevoy
@ 2020-07-22 15:19   ` Leonid Vasiliev
  0 siblings, 0 replies; 7+ messages in thread
From: Leonid Vasiliev @ 2020-07-22 15:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, gorcunov

Hi! Thank you for the patch.
LGTM.

On 22.07.2020 01:42, Vladislav Shpilevoy wrote:
> Before this patch there were some structural problems with the
> limbo code (still are now, but less).
> 
> It wasn't clear when lifetime of a transaction ends. When it it is

Double it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit()
  2020-07-21 22:42 ` [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit() Vladislav Shpilevoy
@ 2020-07-22 15:28   ` Leonid Vasiliev
  0 siblings, 0 replies; 7+ messages in thread
From: Leonid Vasiliev @ 2020-07-22 15:28 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, gorcunov

Hi! Thank you for the patch.
LGTM.


> diff --git a/test/unit/suite.ini b/test/unit/suite.ini
> index d429c95e9..d16ba413b 100644
> --- a/test/unit/suite.ini
> +++ b/test/unit/suite.ini
> @@ -1,5 +1,6 @@
>   [default]
>   core = unittest
>   description = unit tests
> +disabled = snap_quorum_delay.test

Discussed the test with Vlad. I will write my point of view in
https://github.com/tarantool/tarantool/issues/5193

>   release_disabled = fiber_stack.test swim_errinj.test
>   is_parallel = True
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler
  2020-07-21 22:42 [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2020-07-22  8:40 ` [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Cyrill Gorcunov
@ 2020-07-22 23:14 ` Vladislav Shpilevoy
  3 siblings, 0 replies; 7+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-22 23:14 UTC (permalink / raw)
  To: tarantool-patches, gorcunov

The patches are pushed to 2.5 and master.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-07-22 23:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21 22:42 [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Vladislav Shpilevoy
2020-07-21 22:42 ` [Tarantool-patches] [PATCH 1/2] txn_limbo: single function to confirm transactions Vladislav Shpilevoy
2020-07-22 15:19   ` Leonid Vasiliev
2020-07-21 22:42 ` [Tarantool-patches] [PATCH 2/2] txn: remove TXN_IS_DONE check from txn_commit() Vladislav Shpilevoy
2020-07-22 15:28   ` Leonid Vasiliev
2020-07-22  8:40 ` [Tarantool-patches] [PATCH 0/2] Make txn_commit() simpler Cyrill Gorcunov
2020-07-22 23:14 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox