[Tarantool-patches] [PATCH 2/3] recovery: make it transactional

Serge Petrenko sergepetrenko at tarantool.org
Fri Apr 2 14:47:23 MSK 2021



02.04.2021 01:23, Vladislav Shpilevoy пишет:
> Recovery used to be performed row by row. It was fine because
> anyway all the persisted rows are supposed to be committed, and
> should not meet any problems during recovery so a transaction
> could be applied partially.
>
> But it became not true after the synchronous replication
> introduction. Synchronous transactions might be in the log, but
> can be followed by a ROLLBACK record which is supposed to delete
> them.
>
> During row-by-row recovery, firstly, the synchro rows each turned
> into a sync transaction. Which is probably fine. But the rows on
> non-sync spaces which were a part of a sync transaction, could be
> applied right away bypassing the limbo leading to all kind of the
> sweet errors like duplicate keys, or inconsistency of a partially
> applied transaction.
>
> The patch makes the recovery transactional. Either an entire
> transaction is recovered, or it is rolled back which normally
> happens only for synchro transactions followed by ROLLBACK.
>
> In force recovery of a broken log the consistency is not
> guaranteed though.
>
> Closes #5874

Thanks for the patch!

I ran the tests locally and replication suite fails occasionally with
```
[009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:539 E> 
error at request: {type: 'INSERT', replica_id: 0, lsn: 7, space_id: 517, 
index_id: 0, tuple: [21]}
[009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:481 E> 
XlogError: found a first row in a transaction with LSN/TSN mismatch
[009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't 
initialize storage: found a first row in a transaction with LSN/TSN mismatch
[009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't 
initialize storage: found a first row in a transaction with LSN/TSN mismatch
```

> ---
>   .../unreleased/qsync-multi-statement-recovery |   5 +
>   src/box/box.cc                                | 261 ++++++++++++++++--
>   .../gh-5874-qsync-txn-recovery.result         | 124 +++++++++
>   .../gh-5874-qsync-txn-recovery.test.lua       |  64 +++++
>   4 files changed, 427 insertions(+), 27 deletions(-)
>   create mode 100644 changelogs/unreleased/qsync-multi-statement-recovery
>   create mode 100644 test/replication/gh-5874-qsync-txn-recovery.result
>   create mode 100644 test/replication/gh-5874-qsync-txn-recovery.test.lua
>
...
> +/**
> + * Rows of the same transaction are wrapped into begin/commit. Mostly for the
> + * sake of synchronous replication, when the log can contain rolled back
> + * transactions, which must be entirely reverted during recovery when ROLLBACK
> + * records are met. Row-by-row recovery wouldn't work for multi-statement
> + * synchronous transactions.
> + */
> +static int
> +wal_stream_apply_dml_row(struct wal_stream *stream, struct xrow_header *row)
>   {
>   	struct request request;
> -	if (iproto_type_is_synchro_request(row->type)) {
> -		struct synchro_request syn_req;
> -		if (xrow_decode_synchro(row, &syn_req) != 0)
> -			diag_raise();
> -		txn_limbo_process(&txn_limbo, &syn_req);
> -		return;
> +	uint64_t req_type = dml_request_key_map(row->type);
> +	if (xrow_decode_dml(row, &request, req_type) != 0) {
> +		say_error("couldn't decode a DML request");
> +		return -1;
>   	}
> -	if (iproto_type_is_raft_request(row->type)) {
> -		struct raft_request raft_req;
> -		/* Vclock is never persisted in WAL by Raft. */
> -		if (xrow_decode_raft(row, &raft_req, NULL) != 0)
> -			diag_raise();
> -		box_raft_recover(&raft_req);
> -		return;
> +	/*
> +	 * Note that all the information which came from the log is validated
> +	 * and the errors are handled. Not asserted or paniced. That is for the
> +	 * sake of force recovery, which must be able to recover just everything
> +	 * what possible instead of terminating the instance.
> +	 */
> +	struct txn *txn;
> +	if (stream->tsn == 0) {
> +		if (row->tsn == 0) {
> +			diag_set(XlogError, "found a row without TSN");
> +			goto end_diag_request;
> +		}
> +		if (row->tsn != row->lsn) {
> +			diag_set(XlogError, "found a first row in a "
> +				 "transaction with LSN/TSN mismatch");
> +			goto end_diag_request;
> +		}
> +		stream->tsn = row->tsn;
> +		/*
> +		 * Rows are not stacked into a list like during replication,
> +		 * because recovery does not yield while reading the rows. All
> +		 * the yields are controlled by the stream, and therefore no
> +		 * need to wait for all the rows to start a transaction. Can
> +		 * start now, apply the rows, and make a yield after commit if
> +		 * necessary. Helps to avoid a lot of copying.
> +		 */

Nice solution!

> +		txn = txn_begin();
> +		if (txn == NULL) {
> +			say_error("couldn't begin a recovery transaction");
> +			return -1;
> +		}
> +	} else if (row->tsn != stream->tsn) {
> +		diag_set(XlogError, "found a next transaction with the "
> +			 "previous one not yet committed");
> +		goto end_diag_request;
> +	} else {
> +		txn = in_txn();
>   	}
> -	xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
> +	assert(wal_stream_has_tx(stream));
> +	/* Nops might appear at least after before_replace skipping rows. */
>   	if (request.type != IPROTO_NOP) {
> -		struct space *space = space_cache_find_xc(request.space_id);
> +		struct space *space = space_cache_find(request.space_id);
> +		if (space == NULL) {
> +			say_error("couldn't find space by ID");
> +			goto end_diag_request;
> +		}
>   		if (box_process_rw(&request, space, NULL) != 0) {
> -			say_error("error applying row: %s", request_str(&request));
> -			diag_raise();
> +			say_error("couldn't apply the request");
> +			goto end_diag_request;
>   		}
>   	}
> -	struct wal_stream *xstream =
> -		container_of(stream, struct wal_stream, base);
> -	/**
> -	 * Yield once in a while, but not too often,
> -	 * mostly to allow signal handling to take place.
> +	assert(txn != NULL);
> +	if (!row->is_commit)
> +		return 0;
> +
> +	stream->tsn = 0;
> +	if (txn_commit_try_async(txn) != 0) {
> +		/* Commit fail automatically leads to rollback. */
> +		assert(in_txn() == NULL);
> +		say_error("couldn't commit a recovery transaction");
> +		return -1;
> +	}
> +	assert(in_txn() == NULL);
> +	fiber_gc();
> +	return 0;
> +
> +end_diag_request:
> +	/*
> +	 * The label must be used only for the errors related directly to the
> +	 * request. Errors like txn_begin() fail has nothing to do with it, and
> +	 * therefore don't log the request as the fault reason.
> +	 */
> +	say_error("error at request: %s", request_str(&request));
> +	return -1;
> +}
> +
> +/**
> + * Yield once in a while, but not too often, mostly to allow signal handling to
> + * take place.
> + */
> +static void
> +wal_stream_try_yield(struct wal_stream *stream)
> +{
> +	bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0);
> +	if (wal_stream_has_tx(stream)) {
                                       ^ && needs_yield ?

> +		/*
> +		 * Save the yield. Otherwise it would happen only on rows which
> +		 * are a multiple of WAL_ROWS_PER_YIELD and are last in their
> +		 * transaction, which is probably a very rare coincidence.
> +		 */
> +		stream->has_yield = true;
> +		return;
> +	}
> +	if (stream->has_yield)
> +		stream->has_yield = false;
> +	else if (!needs_yield)
> +		return;
> +	fiber_sleep(0);
> +}

Consider this diff. It looks simpler IMO, but feel free to ignore.

=============================
diff --git a/src/box/box.cc b/src/box/box.cc
index 8eacbfebb..f31bc8600 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -547,20 +547,15 @@ end_diag_request:
  static void
  wal_stream_try_yield(struct wal_stream *stream)
  {
-       bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0);
-       if (wal_stream_has_tx(stream)) {
-               /*
-                * Save the yield. Otherwise it would happen only on 
rows which
-                * are a multiple of WAL_ROWS_PER_YIELD and are last in 
their
-                * transaction, which is probably a very rare coincidence.
-                */
-               stream->has_yield = true;
-               return;
-       }
-       if (stream->has_yield)
-               stream->has_yield = false;
-       else if (!needs_yield)
+       /*
+        * Save the yield. Otherwise it would happen only on rows which
+        * are a multiple of WAL_ROWS_PER_YIELD and are last in their
+        * transaction, which is probably a very rare coincidence.
+        */
+       stream->has_yield |= (stream->rows % WAL_ROWS_PER_YIELD == 0);
+       if (wal_stream_has_tx(stream) || !stream->has_yield)
                 return;
+       stream->has_yield = false;
         fiber_sleep(0);
  }

=============================

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list