Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write
@ 2020-10-31 18:01 Vladislav Shpilevoy
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write Vladislav Shpilevoy
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31 18:01 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

The patchset fixes the issue with synchronous transactions
printing "too long WAL write" even when WAL write speed was fine.

The reason was the the "WAL write" time is measured at transaction
commit, which is far from the actual disk write for synchronous
transactions.

@ChangeLog
* Fixed a false-positive "too long WAL write" message for synchronous transactions.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5139-too-long-wal
Issue: https://github.com/tarantool/tarantool/issues/5139

Vladislav Shpilevoy (3):
  txn: rename txn_complete_async to txn_on_journal_write
  txn: split complete into success and fail paths
  txn: warn "too long WAL" on write, not on commit

 src/box/txn.c       | 127 +++++++++++++++++++++-----------------------
 src/box/txn.h       |  19 ++++---
 src/box/txn_limbo.c |  37 +++++++++----
 3 files changed, 97 insertions(+), 86 deletions(-)

-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write
  2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
@ 2020-10-31 18:01 ` Vladislav Shpilevoy
  2020-11-02 11:48   ` Cyrill Gorcunov
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths Vladislav Shpilevoy
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31 18:01 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

The function is called only by the journal when write is finished.

Besides, it may not complete the transaction. In case of
synchronous replication it is not enough for completion. It means,
it can't have 'complete' in its name.

Also the function is never used out of txn.c, so it is removed
from txn.h and is now static.

The patch is a preparation for not spaming "too long WAL write" on
synchronous transactions, because it is simply misleading.

Part of #5139
---
 src/box/txn.c       |  7 ++++---
 src/box/txn.h       |  6 ------
 src/box/txn_limbo.c | 29 ++++++++++++++++++++++-------
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index eb725aaa9..8aedcc9e4 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -580,8 +580,9 @@ txn_complete(struct txn *txn)
 	}
 }
 
-void
-txn_complete_async(struct journal_entry *entry)
+/** Callback invoked when the transaction's journal write is finished. */
+static void
+txn_on_journal_write(struct journal_entry *entry)
 {
 	struct txn *txn = entry->complete_data;
 	/*
@@ -614,7 +615,7 @@ txn_journal_entry_new(struct txn *txn)
 
 	/* Save space for an additional NOP row just in case. */
 	req = journal_entry_new(txn->n_new_rows + txn->n_applier_rows + 1,
-				&txn->region, txn_complete_async, txn);
+				&txn->region, txn_on_journal_write, txn);
 	if (req == NULL)
 		return NULL;
 
diff --git a/src/box/txn.h b/src/box/txn.h
index a51bdf8e6..ffe240867 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -441,12 +441,6 @@ txn_commit(struct txn *txn);
 void
 txn_rollback(struct txn *txn);
 
-/**
- * Complete asynchronous transaction.
- */
-void
-txn_complete_async(struct journal_entry *entry);
-
 /**
  * Submit a transaction to the journal.
  * @pre txn == in_txn()
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 908a17fcc..06a3d30c3 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -373,10 +373,17 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
 		txn_clear_flag(e->txn, TXN_WAIT_ACK);
 		/*
-		 * If  txn_complete_async() was already called,
-		 * finish tx processing. Otherwise just clear the
-		 * "WAIT_ACK" flag. Tx procesing will finish once
-		 * the tx is written to WAL.
+		 * If already written to WAL by now, finish tx processing.
+		 * Otherwise just clear the sync flags. Tx procesing will finish
+		 * automatically once the tx is written to WAL.
+		 *
+		 * XXX: Normally at this point all transactions covered by this
+		 * CONFIRM should be in WAL already, but there is a bug, that
+		 * replica always processes received synchro requests *before*
+		 * writing them to WAL. So it can happen, that a CONFIRM is
+		 * 'read', but the transaction is not written yet. Should be
+		 * fixed when the replica will behave properly, and then this
+		 * branch won't exist.
 		 */
 		if (e->txn->signature >= 0)
 			txn_complete(e->txn);
@@ -424,9 +431,17 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 			txn_complete(e->txn);
 		} else {
 			/*
-			 * Rollback the transaction, but don't
-			 * free it yet. txn_complete_async() will
-			 * free it.
+			 * Rollback the transaction, but don't free it yet. It
+			 * will be freed after its WAL write is completed.
+			 *
+			 * XXX: Normally at this point all transactions covered
+			 * by this ROLLBACK should be in WAL already, but there
+			 * is a bug, that replica always processes received
+			 * synchro requests *before* writing them to WAL. So it
+			 * can happen, that a ROLLBACK is 'read', but the
+			 * transaction is not written yet. Should be fixed when
+			 * the replica will behave properly, and then this
+			 * branch won't exist.
 			 */
 			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
 			struct fiber *fiber = e->txn->fiber;
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths
  2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write Vladislav Shpilevoy
@ 2020-10-31 18:01 ` Vladislav Shpilevoy
  2020-11-02 12:15   ` Cyrill Gorcunov
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit Vladislav Shpilevoy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31 18:01 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

txn_complete used to handle all the transaction outcomes:
- manual rollback;
- error at WAL write;
- successful WAL write and commit;
- successful WAL write and wait for synchronization with replicas.

The code became a mess after synchronous replication was
introduced. This patch splits txn_complete's code into multiple
pieces.

Now WAL write success and fail are handled by
txn_on_journal_write() exclusively. It also runs the WAL write
triggers. It was very strange to call them from txn_complete().

txn_on_journal_write() also checks if the transaction is
synchronous, and if it is not, it completes it with
txn_complete_success() whose code is simple now, and only works
on committing the changes.

In case of fail the transaction always ends up in
txn_complete_fail().

These success and fail functions are now used by the limbo as
well. It appeared all the places finishing a transaction always
know if they want to fail it or complete successfully.

This should also remove a few ifs from the hot code of transaction
commit.

The patch simplifies the code in order to fix the false warning
about too long WAL write for synchronous transactions, which is
printed not at WAL write now, but at commit. These two events are
far from each other for synchro requests.

Part of #5139
---
 src/box/txn.c       | 122 +++++++++++++++++++++-----------------------
 src/box/txn.h       |  13 ++++-
 src/box/txn_limbo.c |   8 +--
 3 files changed, 72 insertions(+), 71 deletions(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index 8aedcc9e4..fd77a934a 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -506,70 +506,13 @@ txn_run_wal_write_triggers(struct txn *txn)
 }
 
 /**
- * Complete transaction processing.
+ * If there is no fiber waiting for the transaction then the
+ * transaction could be safely freed. In the opposite case the
+ * owner fiber is in duty to free this transaction.
  */
-void
-txn_complete(struct txn *txn)
+static void
+txn_free_or_wakeup(struct txn *txn)
 {
-	assert(!txn_has_flag(txn, TXN_IS_DONE));
-	/*
-	 * Note, engine can be NULL if transaction contains
-	 * IPROTO_NOP or IPROTO_CONFIRM statements.
-	 */
-	if (txn->signature < 0) {
-		txn->status = TXN_ABORTED;
-		/* Undo the transaction. */
-		struct txn_stmt *stmt;
-		stailq_reverse(&txn->stmts);
-		stailq_foreach_entry(stmt, &txn->stmts, next)
-			txn_rollback_one_stmt(txn, stmt);
-		if (txn->engine)
-			engine_rollback(txn->engine, txn);
-		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
-			txn_run_rollback_triggers(txn, &txn->on_rollback);
-	} else if (!txn_has_flag(txn, TXN_WAIT_SYNC)) {
-		txn->status = TXN_COMMITTED;
-		/* Commit the transaction. */
-		if (txn->engine != NULL)
-			engine_commit(txn->engine, txn);
-		if (txn_has_flag(txn, TXN_HAS_TRIGGERS)) {
-			txn_run_commit_triggers(txn, &txn->on_commit);
-			/*
-			 * For async transactions WAL write ==
-			 * commit.
-			 */
-			txn_run_wal_write_triggers(txn);
-		}
-
-		double stop_tm = ev_monotonic_now(loop());
-		if (stop_tm - txn->start_tm > too_long_threshold) {
-			int n_rows = txn->n_new_rows + txn->n_applier_rows;
-			say_warn_ratelimited("too long WAL write: %d rows at "
-					     "LSN %lld: %.3f sec", n_rows,
-					     txn->signature - n_rows + 1,
-					     stop_tm - txn->start_tm);
-		}
-	} else {
-		if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
-			txn_run_wal_write_triggers(txn);
-		/*
-		 * Complete is called on every WAL operation
-		 * authored by this transaction. And it not always
-		 * is one. And not always is enough for commit.
-		 * In case the transaction is waiting for acks, it
-		 * can't be committed right away. Give control
-		 * back to the fiber, owning the transaction so as
-		 * it could decide what to do next.
-		 */
-		if (txn->fiber != NULL && txn->fiber != fiber())
-			fiber_wakeup(txn->fiber);
-		return;
-	}
-	/*
-	 * If there is no fiber waiting for the transaction then
-	 * the transaction could be safely freed. In the opposite case
-	 * the fiber is in duty to free this transaction.
-	 */
 	if (txn->fiber == NULL)
 		txn_free(txn);
 	else {
@@ -580,6 +523,45 @@ txn_complete(struct txn *txn)
 	}
 }
 
+void
+txn_complete_fail(struct txn *txn)
+{
+	assert(!txn_has_flag(txn, TXN_IS_DONE));
+	assert(txn->signature < 0);
+	txn->status = TXN_ABORTED;
+	struct txn_stmt *stmt;
+	stailq_reverse(&txn->stmts);
+	stailq_foreach_entry(stmt, &txn->stmts, next)
+		txn_rollback_one_stmt(txn, stmt);
+	if (txn->engine)
+		engine_rollback(txn->engine, txn);
+	if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
+		txn_run_rollback_triggers(txn, &txn->on_rollback);
+	txn_free_or_wakeup(txn);
+}
+
+void
+txn_complete_success(struct txn *txn)
+{
+	double stop_tm = ev_monotonic_now(loop());
+	if (stop_tm - txn->start_tm > too_long_threshold) {
+		int n_rows = txn->n_new_rows + txn->n_applier_rows;
+		say_warn_ratelimited("too long WAL write: %d rows at LSN %lld: "
+				     "%.3f sec", n_rows,
+				     txn->signature - n_rows + 1,
+				     stop_tm - txn->start_tm);
+	}
+	assert(!txn_has_flag(txn, TXN_IS_DONE));
+	assert(!txn_has_flag(txn, TXN_WAIT_SYNC));
+	assert(txn->signature >= 0);
+	txn->status = TXN_COMMITTED;
+	if (txn->engine != NULL)
+		engine_commit(txn->engine, txn);
+	if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
+		txn_run_commit_triggers(txn, &txn->on_commit);
+	txn_free_or_wakeup(txn);
+}
+
 /** Callback invoked when the transaction's journal write is finished. */
 static void
 txn_on_journal_write(struct journal_entry *entry)
@@ -601,7 +583,17 @@ txn_on_journal_write(struct journal_entry *entry)
 	 */
 	assert(in_txn() == NULL);
 	fiber_set_txn(fiber(), txn);
-	txn_complete(txn);
+	if (txn->signature < 0) {
+		txn_complete_fail(txn);
+		goto finish;
+	}
+	if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
+		txn_run_wal_write_triggers(txn);
+	if (!txn_has_flag(txn, TXN_WAIT_SYNC))
+		txn_complete_success(txn);
+	else if (txn->fiber != NULL && txn->fiber != fiber())
+		fiber_wakeup(txn->fiber);
+finish:
 	fiber_set_txn(fiber(), NULL);
 }
 
@@ -772,7 +764,7 @@ txn_commit_nop(struct txn *txn)
 {
 	if (txn->n_new_rows + txn->n_applier_rows == 0) {
 		txn->signature = TXN_SIGNATURE_NOP;
-		txn_complete(txn);
+		txn_complete_success(txn);
 		fiber_set_txn(fiber(), NULL);
 		return true;
 	}
@@ -982,7 +974,7 @@ txn_rollback(struct txn *txn)
 	if (!txn_has_flag(txn, TXN_CAN_YIELD))
 		trigger_clear(&txn->fiber_on_yield);
 	txn->signature = TXN_SIGNATURE_ROLLBACK;
-	txn_complete(txn);
+	txn_complete_fail(txn);
 	fiber_set_txn(fiber(), NULL);
 }
 
diff --git a/src/box/txn.h b/src/box/txn.h
index ffe240867..dfa7c0ee4 100644
--- a/src/box/txn.h
+++ b/src/box/txn.h
@@ -419,10 +419,19 @@ struct txn *
 txn_begin(void);
 
 /**
- * Complete transaction processing.
+ * Complete transaction processing with an error. All the changes are going to
+ * be rolled back. The transaction's signature should be set to the rollback
+ * reason.
  */
 void
-txn_complete(struct txn *txn);
+txn_complete_fail(struct txn *txn);
+
+/**
+ * Complete transaction processing successfully. All the changes are going to
+ * become committed and visible.
+ */
+void
+txn_complete_success(struct txn *txn);
 
 /**
  * Commit a transaction.
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 06a3d30c3..c94678885 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -240,7 +240,7 @@ txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry)
 		txn_limbo_abort(limbo, e);
 		txn_clear_flag(e->txn, TXN_WAIT_SYNC);
 		txn_clear_flag(e->txn, TXN_WAIT_ACK);
-		txn_complete(e->txn);
+		txn_complete_fail(e->txn);
 		if (e == entry)
 			break;
 		fiber_wakeup(e->txn->fiber);
@@ -386,7 +386,7 @@ txn_limbo_read_confirm(struct txn_limbo *limbo, int64_t lsn)
 		 * branch won't exist.
 		 */
 		if (e->txn->signature >= 0)
-			txn_complete(e->txn);
+			txn_complete_success(e->txn);
 	}
 }
 
@@ -428,7 +428,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 		if (e->txn->signature >= 0) {
 			/* Rollback the transaction. */
 			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
-			txn_complete(e->txn);
+			txn_complete_fail(e->txn);
 		} else {
 			/*
 			 * Rollback the transaction, but don't free it yet. It
@@ -446,7 +446,7 @@ txn_limbo_read_rollback(struct txn_limbo *limbo, int64_t lsn)
 			e->txn->signature = TXN_SIGNATURE_SYNC_ROLLBACK;
 			struct fiber *fiber = e->txn->fiber;
 			e->txn->fiber = fiber();
-			txn_complete(e->txn);
+			txn_complete_fail(e->txn);
 			e->txn->fiber = fiber;
 		}
 		if (e == last_rollback)
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit
  2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write Vladislav Shpilevoy
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths Vladislav Shpilevoy
@ 2020-10-31 18:01 ` Vladislav Shpilevoy
  2020-11-02 12:31   ` Cyrill Gorcunov
  2020-11-03  7:36 ` [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Serge Petrenko
  2020-11-03 22:19 ` Vladislav Shpilevoy
  4 siblings, 1 reply; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31 18:01 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

"Too long WAL write" is supposed to warn a user that either the
disk write was too long, or the event loop is too slow, maybe due
to certain fibers not doing yields often enough.

It was printed by the code doing the transaction commit. As a
result, for synchronous transactions the check also included the
replication time, often overflowing a threshold and printing
"too long WAL write" even when it had nothing to do with a WAL
write or the event loop being too slow.

The patch makes so the warning is checked and printed after WAL
write right away, not after commit.

Closes #5139
---
 src/box/txn.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/box/txn.c b/src/box/txn.c
index fd77a934a..c26116f1d 100644
--- a/src/box/txn.c
+++ b/src/box/txn.c
@@ -543,14 +543,6 @@ txn_complete_fail(struct txn *txn)
 void
 txn_complete_success(struct txn *txn)
 {
-	double stop_tm = ev_monotonic_now(loop());
-	if (stop_tm - txn->start_tm > too_long_threshold) {
-		int n_rows = txn->n_new_rows + txn->n_applier_rows;
-		say_warn_ratelimited("too long WAL write: %d rows at LSN %lld: "
-				     "%.3f sec", n_rows,
-				     txn->signature - n_rows + 1,
-				     stop_tm - txn->start_tm);
-	}
 	assert(!txn_has_flag(txn, TXN_IS_DONE));
 	assert(!txn_has_flag(txn, TXN_WAIT_SYNC));
 	assert(txn->signature >= 0);
@@ -587,6 +579,14 @@ txn_on_journal_write(struct journal_entry *entry)
 		txn_complete_fail(txn);
 		goto finish;
 	}
+	double stop_tm = ev_monotonic_now(loop());
+	if (stop_tm - txn->start_tm > too_long_threshold) {
+		int n_rows = txn->n_new_rows + txn->n_applier_rows;
+		say_warn_ratelimited("too long WAL write: %d rows at "
+				     "LSN %lld: %.3f sec", n_rows,
+				     txn->signature - n_rows + 1,
+				     stop_tm - txn->start_tm);
+	}
 	if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
 		txn_run_wal_write_triggers(txn);
 	if (!txn_has_flag(txn, TXN_WAIT_SYNC))
-- 
2.21.1 (Apple Git-122.3)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write Vladislav Shpilevoy
@ 2020-11-02 11:48   ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-11-02 11:48 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Oct 31, 2020 at 07:01:40PM +0100, Vladislav Shpilevoy wrote:
> The function is called only by the journal when write is finished.
> 
> Besides, it may not complete the transaction. In case of
> synchronous replication it is not enough for completion. It means,
> it can't have 'complete' in its name.
> 
> Also the function is never used out of txn.c, so it is removed
> from txn.h and is now static.
> 
> The patch is a preparation for not spaming "too long WAL write" on
> synchronous transactions, because it is simply misleading.
> 
> Part of #5139
Ack

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths Vladislav Shpilevoy
@ 2020-11-02 12:15   ` Cyrill Gorcunov
  2020-11-02 22:39     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-11-02 12:15 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Oct 31, 2020 at 07:01:41PM +0100, Vladislav Shpilevoy wrote:
> txn_complete used to handle all the transaction outcomes:
> - manual rollback;
> - error at WAL write;
> - successful WAL write and commit;
> - successful WAL write and wait for synchronization with replicas.

Looks ok to me. Ack.

Here is a diff on top of the patch I would add, but up to you, feel free to ignore.
---
 src/box/txn.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: tarantool.git/src/box/txn.c
===================================================================
--- tarantool.git.orig/src/box/txn.c
+++ tarantool.git/src/box/txn.c
@@ -533,7 +533,7 @@ txn_complete_fail(struct txn *txn)
 	stailq_reverse(&txn->stmts);
 	stailq_foreach_entry(stmt, &txn->stmts, next)
 		txn_rollback_one_stmt(txn, stmt);
-	if (txn->engine)
+	if (txn->engine != NULL)
 		engine_rollback(txn->engine, txn);
 	if (txn_has_flag(txn, TXN_HAS_TRIGGERS))
 		txn_run_rollback_triggers(txn, &txn->on_rollback);
@@ -544,12 +544,12 @@ void
 txn_complete_success(struct txn *txn)
 {
 	double stop_tm = ev_monotonic_now(loop());
-	if (stop_tm - txn->start_tm > too_long_threshold) {
+	double delta = stop_tm - txn->start_tm;
+	if (delta > too_long_threshold) {
 		int n_rows = txn->n_new_rows + txn->n_applier_rows;
 		say_warn_ratelimited("too long WAL write: %d rows at LSN %lld: "
 				     "%.3f sec", n_rows,
-				     txn->signature - n_rows + 1,
-				     stop_tm - txn->start_tm);
+				     txn->signature - n_rows + 1, delta);
 	}
 	assert(!txn_has_flag(txn, TXN_IS_DONE));
 	assert(!txn_has_flag(txn, TXN_WAIT_SYNC));

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit Vladislav Shpilevoy
@ 2020-11-02 12:31   ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2020-11-02 12:31 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sat, Oct 31, 2020 at 07:01:42PM +0100, Vladislav Shpilevoy wrote:
> "Too long WAL write" is supposed to warn a user that either the
> disk write was too long, or the event loop is too slow, maybe due
> to certain fibers not doing yields often enough.
Ack

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths
  2020-11-02 12:15   ` Cyrill Gorcunov
@ 2020-11-02 22:39     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-02 22:39 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tarantool-patches

Hi! Thanks for the review!

On 02.11.2020 13:15, Cyrill Gorcunov wrote:
> On Sat, Oct 31, 2020 at 07:01:41PM +0100, Vladislav Shpilevoy wrote:
>> txn_complete used to handle all the transaction outcomes:
>> - manual rollback;
>> - error at WAL write;
>> - successful WAL write and commit;
>> - successful WAL write and wait for synchronization with replicas.
> 
> Looks ok to me. Ack.
> 
> Here is a diff on top of the patch I would add, but up to you, feel free to ignore.

Applied and force pushed on the branch.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write
  2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
                   ` (2 preceding siblings ...)
  2020-10-31 18:01 ` [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit Vladislav Shpilevoy
@ 2020-11-03  7:36 ` Serge Petrenko
  2020-11-03 22:19 ` Vladislav Shpilevoy
  4 siblings, 0 replies; 10+ messages in thread
From: Serge Petrenko @ 2020-11-03  7:36 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, gorcunov


31.10.2020 21:01, Vladislav Shpilevoy пишет:
> The patchset fixes the issue with synchronous transactions
> printing "too long WAL write" even when WAL write speed was fine.
>
> The reason was the the "WAL write" time is measured at transaction
> commit, which is far from the actual disk write for synchronous
> transactions.
>
> @ChangeLog
> * Fixed a false-positive "too long WAL write" message for synchronous transactions.
>
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5139-too-long-wal
> Issue: https://github.com/tarantool/tarantool/issues/5139
>
> Vladislav Shpilevoy (3):
>    txn: rename txn_complete_async to txn_on_journal_write
>    txn: split complete into success and fail paths
>    txn: warn "too long WAL" on write, not on commit
>
>   src/box/txn.c       | 127 +++++++++++++++++++++-----------------------
>   src/box/txn.h       |  19 ++++---
>   src/box/txn_limbo.c |  37 +++++++++----
>   3 files changed, 97 insertions(+), 86 deletions(-)
Hi! Thanks for the patchset! LGTM.
>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write
  2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
                   ` (3 preceding siblings ...)
  2020-11-03  7:36 ` [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Serge Petrenko
@ 2020-11-03 22:19 ` Vladislav Shpilevoy
  4 siblings, 0 replies; 10+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-03 22:19 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko, gorcunov

Pushed to 2.5, 2.6, and master.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-11-03 22:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-31 18:01 [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Vladislav Shpilevoy
2020-10-31 18:01 ` [Tarantool-patches] [PATCH 1/3] txn: rename txn_complete_async to txn_on_journal_write Vladislav Shpilevoy
2020-11-02 11:48   ` Cyrill Gorcunov
2020-10-31 18:01 ` [Tarantool-patches] [PATCH 2/3] txn: split complete into success and fail paths Vladislav Shpilevoy
2020-11-02 12:15   ` Cyrill Gorcunov
2020-11-02 22:39     ` Vladislav Shpilevoy
2020-10-31 18:01 ` [Tarantool-patches] [PATCH 3/3] txn: warn "too long WAL" on write, not on commit Vladislav Shpilevoy
2020-11-02 12:31   ` Cyrill Gorcunov
2020-11-03  7:36 ` [Tarantool-patches] [PATCH 0/3] Qsync too long WAL write Serge Petrenko
2020-11-03 22:19 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox