[Tarantool-patches] [PATCH 2/3] recovery: make it transactional
Serge Petrenko
sergepetrenko at tarantool.org
Fri Apr 2 14:47:23 MSK 2021
02.04.2021 01:23, Vladislav Shpilevoy пишет:
> Recovery used to be performed row by row. It was fine because
> anyway all the persisted rows are supposed to be committed, and
> should not meet any problems during recovery so a transaction
> could be applied partially.
>
> But it became not true after the synchronous replication
> introduction. Synchronous transactions might be in the log, but
> can be followed by a ROLLBACK record which is supposed to delete
> them.
>
> During row-by-row recovery, firstly, the synchro rows each turned
> into a sync transaction. Which is probably fine. But the rows on
> non-sync spaces which were a part of a sync transaction, could be
> applied right away bypassing the limbo leading to all kind of the
> sweet errors like duplicate keys, or inconsistency of a partially
> applied transaction.
>
> The patch makes the recovery transactional. Either an entire
> transaction is recovered, or it is rolled back which normally
> happens only for synchro transactions followed by ROLLBACK.
>
> In force recovery of a broken log the consistency is not
> guaranteed though.
>
> Closes #5874
Thanks for the patch!
I ran the tests locally and replication suite fails occasionally with
```
[009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:539 E>
error at request: {type: 'INSERT', replica_id: 0, lsn: 7, space_id: 517,
index_id: 0, tuple: [21]}
[009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:481 E>
XlogError: found a first row in a transaction with LSN/TSN mismatch
[009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't
initialize storage: found a first row in a transaction with LSN/TSN mismatch
[009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't
initialize storage: found a first row in a transaction with LSN/TSN mismatch
```
> ---
> .../unreleased/qsync-multi-statement-recovery | 5 +
> src/box/box.cc | 261 ++++++++++++++++--
> .../gh-5874-qsync-txn-recovery.result | 124 +++++++++
> .../gh-5874-qsync-txn-recovery.test.lua | 64 +++++
> 4 files changed, 427 insertions(+), 27 deletions(-)
> create mode 100644 changelogs/unreleased/qsync-multi-statement-recovery
> create mode 100644 test/replication/gh-5874-qsync-txn-recovery.result
> create mode 100644 test/replication/gh-5874-qsync-txn-recovery.test.lua
>
...
> +/**
> + * Rows of the same transaction are wrapped into begin/commit. Mostly for the
> + * sake of synchronous replication, when the log can contain rolled back
> + * transactions, which must be entirely reverted during recovery when ROLLBACK
> + * records are met. Row-by-row recovery wouldn't work for multi-statement
> + * synchronous transactions.
> + */
> +static int
> +wal_stream_apply_dml_row(struct wal_stream *stream, struct xrow_header *row)
> {
> struct request request;
> - if (iproto_type_is_synchro_request(row->type)) {
> - struct synchro_request syn_req;
> - if (xrow_decode_synchro(row, &syn_req) != 0)
> - diag_raise();
> - txn_limbo_process(&txn_limbo, &syn_req);
> - return;
> + uint64_t req_type = dml_request_key_map(row->type);
> + if (xrow_decode_dml(row, &request, req_type) != 0) {
> + say_error("couldn't decode a DML request");
> + return -1;
> }
> - if (iproto_type_is_raft_request(row->type)) {
> - struct raft_request raft_req;
> - /* Vclock is never persisted in WAL by Raft. */
> - if (xrow_decode_raft(row, &raft_req, NULL) != 0)
> - diag_raise();
> - box_raft_recover(&raft_req);
> - return;
> + /*
> + * Note that all the information which came from the log is validated
> + * and the errors are handled. Not asserted or paniced. That is for the
> + * sake of force recovery, which must be able to recover just everything
> + * what possible instead of terminating the instance.
> + */
> + struct txn *txn;
> + if (stream->tsn == 0) {
> + if (row->tsn == 0) {
> + diag_set(XlogError, "found a row without TSN");
> + goto end_diag_request;
> + }
> + if (row->tsn != row->lsn) {
> + diag_set(XlogError, "found a first row in a "
> + "transaction with LSN/TSN mismatch");
> + goto end_diag_request;
> + }
> + stream->tsn = row->tsn;
> + /*
> + * Rows are not stacked into a list like during replication,
> + * because recovery does not yield while reading the rows. All
> + * the yields are controlled by the stream, and therefore no
> + * need to wait for all the rows to start a transaction. Can
> + * start now, apply the rows, and make a yield after commit if
> + * necessary. Helps to avoid a lot of copying.
> + */
Nice solution!
> + txn = txn_begin();
> + if (txn == NULL) {
> + say_error("couldn't begin a recovery transaction");
> + return -1;
> + }
> + } else if (row->tsn != stream->tsn) {
> + diag_set(XlogError, "found a next transaction with the "
> + "previous one not yet committed");
> + goto end_diag_request;
> + } else {
> + txn = in_txn();
> }
> - xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type));
> + assert(wal_stream_has_tx(stream));
> + /* Nops might appear at least after before_replace skipping rows. */
> if (request.type != IPROTO_NOP) {
> - struct space *space = space_cache_find_xc(request.space_id);
> + struct space *space = space_cache_find(request.space_id);
> + if (space == NULL) {
> + say_error("couldn't find space by ID");
> + goto end_diag_request;
> + }
> if (box_process_rw(&request, space, NULL) != 0) {
> - say_error("error applying row: %s", request_str(&request));
> - diag_raise();
> + say_error("couldn't apply the request");
> + goto end_diag_request;
> }
> }
> - struct wal_stream *xstream =
> - container_of(stream, struct wal_stream, base);
> - /**
> - * Yield once in a while, but not too often,
> - * mostly to allow signal handling to take place.
> + assert(txn != NULL);
> + if (!row->is_commit)
> + return 0;
> +
> + stream->tsn = 0;
> + if (txn_commit_try_async(txn) != 0) {
> + /* Commit fail automatically leads to rollback. */
> + assert(in_txn() == NULL);
> + say_error("couldn't commit a recovery transaction");
> + return -1;
> + }
> + assert(in_txn() == NULL);
> + fiber_gc();
> + return 0;
> +
> +end_diag_request:
> + /*
> + * The label must be used only for the errors related directly to the
> + * request. Errors like txn_begin() fail has nothing to do with it, and
> + * therefore don't log the request as the fault reason.
> + */
> + say_error("error at request: %s", request_str(&request));
> + return -1;
> +}
> +
> +/**
> + * Yield once in a while, but not too often, mostly to allow signal handling to
> + * take place.
> + */
> +static void
> +wal_stream_try_yield(struct wal_stream *stream)
> +{
> + bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0);
> + if (wal_stream_has_tx(stream)) {
^ && needs_yield ?
> + /*
> + * Save the yield. Otherwise it would happen only on rows which
> + * are a multiple of WAL_ROWS_PER_YIELD and are last in their
> + * transaction, which is probably a very rare coincidence.
> + */
> + stream->has_yield = true;
> + return;
> + }
> + if (stream->has_yield)
> + stream->has_yield = false;
> + else if (!needs_yield)
> + return;
> + fiber_sleep(0);
> +}
Consider this diff. It looks simpler IMO, but feel free to ignore.
=============================
diff --git a/src/box/box.cc b/src/box/box.cc
index 8eacbfebb..f31bc8600 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -547,20 +547,15 @@ end_diag_request:
static void
wal_stream_try_yield(struct wal_stream *stream)
{
- bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0);
- if (wal_stream_has_tx(stream)) {
- /*
- * Save the yield. Otherwise it would happen only on
rows which
- * are a multiple of WAL_ROWS_PER_YIELD and are last in
their
- * transaction, which is probably a very rare coincidence.
- */
- stream->has_yield = true;
- return;
- }
- if (stream->has_yield)
- stream->has_yield = false;
- else if (!needs_yield)
+ /*
+ * Save the yield. Otherwise it would happen only on rows which
+ * are a multiple of WAL_ROWS_PER_YIELD and are last in their
+ * transaction, which is probably a very rare coincidence.
+ */
+ stream->has_yield |= (stream->rows % WAL_ROWS_PER_YIELD == 0);
+ if (wal_stream_has_tx(stream) || !stream->has_yield)
return;
+ stream->has_yield = false;
fiber_sleep(0);
}
=============================
--
Serge Petrenko
More information about the Tarantool-patches
mailing list