From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, korablev@tarantool.org Subject: Re: [Tarantool-patches] [PATCH 2/3] recovery: make it transactional Date: Fri, 2 Apr 2021 14:47:23 +0300 [thread overview] Message-ID: <9c9b3077-0579-d57b-c829-242d6fdd2a3c@tarantool.org> (raw) In-Reply-To: <e8c083836362fec0166d61737a6df316234a3f8a.1617315744.git.v.shpilevoy@tarantool.org> 02.04.2021 01:23, Vladislav Shpilevoy пишет: > Recovery used to be performed row by row. It was fine because > anyway all the persisted rows are supposed to be committed, and > should not meet any problems during recovery so a transaction > could be applied partially. > > But it became not true after the synchronous replication > introduction. Synchronous transactions might be in the log, but > can be followed by a ROLLBACK record which is supposed to delete > them. > > During row-by-row recovery, firstly, the synchro rows each turned > into a sync transaction. Which is probably fine. But the rows on > non-sync spaces which were a part of a sync transaction, could be > applied right away bypassing the limbo leading to all kind of the > sweet errors like duplicate keys, or inconsistency of a partially > applied transaction. > > The patch makes the recovery transactional. Either an entire > transaction is recovered, or it is rolled back which normally > happens only for synchro transactions followed by ROLLBACK. > > In force recovery of a broken log the consistency is not > guaranteed though. > > Closes #5874 Thanks for the patch! I ran the tests locally and replication suite fails occasionally with ``` [009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:539 E> error at request: {type: 'INSERT', replica_id: 0, lsn: 7, space_id: 517, index_id: 0, tuple: [21]} [009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:481 E> XlogError: found a first row in a transaction with LSN/TSN mismatch [009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't initialize storage: found a first row in a transaction with LSN/TSN mismatch [009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't initialize storage: found a first row in a transaction with LSN/TSN mismatch ``` > --- > .../unreleased/qsync-multi-statement-recovery | 5 + > src/box/box.cc | 261 ++++++++++++++++-- > .../gh-5874-qsync-txn-recovery.result | 124 +++++++++ > .../gh-5874-qsync-txn-recovery.test.lua | 64 +++++ > 4 files changed, 427 insertions(+), 27 deletions(-) > create mode 100644 changelogs/unreleased/qsync-multi-statement-recovery > create mode 100644 test/replication/gh-5874-qsync-txn-recovery.result > create mode 100644 test/replication/gh-5874-qsync-txn-recovery.test.lua > ... > +/** > + * Rows of the same transaction are wrapped into begin/commit. Mostly for the > + * sake of synchronous replication, when the log can contain rolled back > + * transactions, which must be entirely reverted during recovery when ROLLBACK > + * records are met. Row-by-row recovery wouldn't work for multi-statement > + * synchronous transactions. > + */ > +static int > +wal_stream_apply_dml_row(struct wal_stream *stream, struct xrow_header *row) > { > struct request request; > - if (iproto_type_is_synchro_request(row->type)) { > - struct synchro_request syn_req; > - if (xrow_decode_synchro(row, &syn_req) != 0) > - diag_raise(); > - txn_limbo_process(&txn_limbo, &syn_req); > - return; > + uint64_t req_type = dml_request_key_map(row->type); > + if (xrow_decode_dml(row, &request, req_type) != 0) { > + say_error("couldn't decode a DML request"); > + return -1; > } > - if (iproto_type_is_raft_request(row->type)) { > - struct raft_request raft_req; > - /* Vclock is never persisted in WAL by Raft. */ > - if (xrow_decode_raft(row, &raft_req, NULL) != 0) > - diag_raise(); > - box_raft_recover(&raft_req); > - return; > + /* > + * Note that all the information which came from the log is validated > + * and the errors are handled. Not asserted or paniced. That is for the > + * sake of force recovery, which must be able to recover just everything > + * what possible instead of terminating the instance. > + */ > + struct txn *txn; > + if (stream->tsn == 0) { > + if (row->tsn == 0) { > + diag_set(XlogError, "found a row without TSN"); > + goto end_diag_request; > + } > + if (row->tsn != row->lsn) { > + diag_set(XlogError, "found a first row in a " > + "transaction with LSN/TSN mismatch"); > + goto end_diag_request; > + } > + stream->tsn = row->tsn; > + /* > + * Rows are not stacked into a list like during replication, > + * because recovery does not yield while reading the rows. All > + * the yields are controlled by the stream, and therefore no > + * need to wait for all the rows to start a transaction. Can > + * start now, apply the rows, and make a yield after commit if > + * necessary. Helps to avoid a lot of copying. > + */ Nice solution! > + txn = txn_begin(); > + if (txn == NULL) { > + say_error("couldn't begin a recovery transaction"); > + return -1; > + } > + } else if (row->tsn != stream->tsn) { > + diag_set(XlogError, "found a next transaction with the " > + "previous one not yet committed"); > + goto end_diag_request; > + } else { > + txn = in_txn(); > } > - xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type)); > + assert(wal_stream_has_tx(stream)); > + /* Nops might appear at least after before_replace skipping rows. */ > if (request.type != IPROTO_NOP) { > - struct space *space = space_cache_find_xc(request.space_id); > + struct space *space = space_cache_find(request.space_id); > + if (space == NULL) { > + say_error("couldn't find space by ID"); > + goto end_diag_request; > + } > if (box_process_rw(&request, space, NULL) != 0) { > - say_error("error applying row: %s", request_str(&request)); > - diag_raise(); > + say_error("couldn't apply the request"); > + goto end_diag_request; > } > } > - struct wal_stream *xstream = > - container_of(stream, struct wal_stream, base); > - /** > - * Yield once in a while, but not too often, > - * mostly to allow signal handling to take place. > + assert(txn != NULL); > + if (!row->is_commit) > + return 0; > + > + stream->tsn = 0; > + if (txn_commit_try_async(txn) != 0) { > + /* Commit fail automatically leads to rollback. */ > + assert(in_txn() == NULL); > + say_error("couldn't commit a recovery transaction"); > + return -1; > + } > + assert(in_txn() == NULL); > + fiber_gc(); > + return 0; > + > +end_diag_request: > + /* > + * The label must be used only for the errors related directly to the > + * request. Errors like txn_begin() fail has nothing to do with it, and > + * therefore don't log the request as the fault reason. > + */ > + say_error("error at request: %s", request_str(&request)); > + return -1; > +} > + > +/** > + * Yield once in a while, but not too often, mostly to allow signal handling to > + * take place. > + */ > +static void > +wal_stream_try_yield(struct wal_stream *stream) > +{ > + bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0); > + if (wal_stream_has_tx(stream)) { ^ && needs_yield ? > + /* > + * Save the yield. Otherwise it would happen only on rows which > + * are a multiple of WAL_ROWS_PER_YIELD and are last in their > + * transaction, which is probably a very rare coincidence. > + */ > + stream->has_yield = true; > + return; > + } > + if (stream->has_yield) > + stream->has_yield = false; > + else if (!needs_yield) > + return; > + fiber_sleep(0); > +} Consider this diff. It looks simpler IMO, but feel free to ignore. ============================= diff --git a/src/box/box.cc b/src/box/box.cc index 8eacbfebb..f31bc8600 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -547,20 +547,15 @@ end_diag_request: static void wal_stream_try_yield(struct wal_stream *stream) { - bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0); - if (wal_stream_has_tx(stream)) { - /* - * Save the yield. Otherwise it would happen only on rows which - * are a multiple of WAL_ROWS_PER_YIELD and are last in their - * transaction, which is probably a very rare coincidence. - */ - stream->has_yield = true; - return; - } - if (stream->has_yield) - stream->has_yield = false; - else if (!needs_yield) + /* + * Save the yield. Otherwise it would happen only on rows which + * are a multiple of WAL_ROWS_PER_YIELD and are last in their + * transaction, which is probably a very rare coincidence. + */ + stream->has_yield |= (stream->rows % WAL_ROWS_PER_YIELD == 0); + if (wal_stream_has_tx(stream) || !stream->has_yield) return; + stream->has_yield = false; fiber_sleep(0); } ============================= -- Serge Petrenko
next prev parent reply other threads:[~2021-04-02 11:47 UTC|newest] Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-04-01 22:23 [Tarantool-patches] [PATCH 0/3] Transactional recovery Vladislav Shpilevoy via Tarantool-patches 2021-04-01 22:23 ` [Tarantool-patches] [PATCH 1/3] vinyl: handle multi-statement recovery txns Vladislav Shpilevoy via Tarantool-patches 2021-04-02 9:24 ` Serge Petrenko via Tarantool-patches 2021-04-01 22:23 ` [Tarantool-patches] [PATCH 2/3] recovery: make it transactional Vladislav Shpilevoy via Tarantool-patches 2021-04-02 11:47 ` Serge Petrenko via Tarantool-patches [this message] 2021-04-03 13:18 ` Vladislav Shpilevoy via Tarantool-patches 2021-04-05 8:36 ` Serge Petrenko via Tarantool-patches 2021-04-02 15:11 ` Cyrill Gorcunov via Tarantool-patches 2021-04-01 22:23 ` [Tarantool-patches] [PATCH 3/3] box: remove is_local_recovery variable Vladislav Shpilevoy via Tarantool-patches 2021-04-02 11:47 ` Serge Petrenko via Tarantool-patches 2021-04-03 13:18 ` Vladislav Shpilevoy via Tarantool-patches 2021-04-05 8:17 ` Serge Petrenko via Tarantool-patches 2021-04-02 9:42 ` [Tarantool-patches] [PATCH 0/3] Transactional recovery Konstantin Osipov via Tarantool-patches 2021-04-05 16:14 ` Kirill Yukhin via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=9c9b3077-0579-d57b-c829-242d6fdd2a3c@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=korablev@tarantool.org \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/3] recovery: make it transactional' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox