From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id B293D6EC5D; Fri, 2 Apr 2021 14:47:26 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org B293D6EC5D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1617364046; bh=KoOn51iAnI5goQOj6RLjJrsDpteINrEZIOyCC8MfsRo=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=e3HJFABOQBZiMkhKL4JEqUp9ZQVCJIGgJEuNqakS4Io1p1xl6+BLKu9jBUSES0vgZ spvrjJdIrpAw3zebWtLTGxeVs+lnv3Jq5JodQS0olMFue+LRxOCJCZbRYOTYLoho4W KXqYvOICGJO5yQQTIYSp6fzzVusYEODas7wvGlgc= Received: from smtp49.i.mail.ru (smtp49.i.mail.ru [94.100.177.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 48A796EC5D for ; Fri, 2 Apr 2021 14:47:25 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 48A796EC5D Received: by smtp49.i.mail.ru with esmtpa (envelope-from ) id 1lSIGp-0001EM-JS; Fri, 02 Apr 2021 14:47:24 +0300 To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, gorcunov@gmail.com, korablev@tarantool.org References: Message-ID: <9c9b3077-0579-d57b-c829-242d6fdd2a3c@tarantool.org> Date: Fri, 2 Apr 2021 14:47:23 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-7564579A: EEAE043A70213CC8 X-77F55803: 4F1203BC0FB41BD9ED7173E37F4E32947A0146560F8BA709C5527BBE40F25C64182A05F5380850404A5EE1152CD7F82BE0C443D6BCE37048D7E980F943457866E89F5915DB5E249D X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7818460412E3A2163EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006378D8F1FE4E764DFCDEA1F7E6F0F101C67CDEEF6D7F21E0D1D174C73DBBBFC7664C0C45185B2B26EA0B64B58F8C3FAE8683CA2A1E3393D5847389733CBF5DBD5E913377AFFFEAFD269176DF2183F8FC7C0B27420F9988F54058941B15DA834481FCF19DD082D7633A0EF3E4896CB9E6436389733CBF5DBD5E9D5E8D9A59859A8B611E5B64A728E1589CC7F00164DA146DA6F5DAA56C3B73B237318B6A418E8EAB8D32BA5DBAC0009BE9E8FC8737B5C2249B5C3F9F4236781E376E601842F6C81A12EF20D2F80756B5F7E9C4E3C761E06A776E601842F6C81A127C277FBC8AE2E8BD92A3E1CB07A48283AA81AA40904B5D9DBF02ECDB25306B2201CA6A4E26CD07C3BBE47FD9DD3FB595F5C1EE8F4F765FC72CEEB2601E22B093A03B725D353964B0B7D0EA88DDEDAC722CA9DD8327EE4931B544F03EFBC4D57157381330E1A1978C4224003CC83647689D4C264860C145E X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A24A6D60772A99906F8E1CD14B953EB46DF0BE4B801D0B57C1355D89D7DBCDD132 X-C1DE0DAB: 0D63561A33F958A5D42CCB7370ED4F723C5D45BB7330CACCF37B9DAE311BC553D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA7502E6951B79FF9A3F410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D3419891600CCEF4607B65671EED848CC0026B03197885DA16F16A0631D1E0F267BA42944FA7E53FDC21D7E09C32AA3244C0ED2E7BB357D56727DF1AF2355ECE9BBFE8DA44ABE2443F7927AC6DF5659F194 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojiysexgWf335bYsKW22u56Q== X-Mailru-Sender: 583F1D7ACE8F49BDD2846D59FC20E9F8985E5858D2E2A998856AF0EDFA9B01D03120E6DF4C999AC9424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH 2/3] recovery: make it transactional X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 02.04.2021 01:23, Vladislav Shpilevoy пишет: > Recovery used to be performed row by row. It was fine because > anyway all the persisted rows are supposed to be committed, and > should not meet any problems during recovery so a transaction > could be applied partially. > > But it became not true after the synchronous replication > introduction. Synchronous transactions might be in the log, but > can be followed by a ROLLBACK record which is supposed to delete > them. > > During row-by-row recovery, firstly, the synchro rows each turned > into a sync transaction. Which is probably fine. But the rows on > non-sync spaces which were a part of a sync transaction, could be > applied right away bypassing the limbo leading to all kind of the > sweet errors like duplicate keys, or inconsistency of a partially > applied transaction. > > The patch makes the recovery transactional. Either an entire > transaction is recovered, or it is rolled back which normally > happens only for synchro transactions followed by ROLLBACK. > > In force recovery of a broken log the consistency is not > guaranteed though. > > Closes #5874 Thanks for the patch! I ran the tests locally and replication suite fails occasionally with ``` [009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:539 E> error at request: {type: 'INSERT', replica_id: 0, lsn: 7, space_id: 517, index_id: 0, tuple: [21]} [009] 2021-04-02 14:28:32.925 [30257] main/103/master box.cc:481 E> XlogError: found a first row in a transaction with LSN/TSN mismatch [009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't initialize storage: found a first row in a transaction with LSN/TSN mismatch [009] 2021-04-02 14:28:32.925 [30257] main/103/master F> can't initialize storage: found a first row in a transaction with LSN/TSN mismatch ``` > --- > .../unreleased/qsync-multi-statement-recovery | 5 + > src/box/box.cc | 261 ++++++++++++++++-- > .../gh-5874-qsync-txn-recovery.result | 124 +++++++++ > .../gh-5874-qsync-txn-recovery.test.lua | 64 +++++ > 4 files changed, 427 insertions(+), 27 deletions(-) > create mode 100644 changelogs/unreleased/qsync-multi-statement-recovery > create mode 100644 test/replication/gh-5874-qsync-txn-recovery.result > create mode 100644 test/replication/gh-5874-qsync-txn-recovery.test.lua > ... > +/** > + * Rows of the same transaction are wrapped into begin/commit. Mostly for the > + * sake of synchronous replication, when the log can contain rolled back > + * transactions, which must be entirely reverted during recovery when ROLLBACK > + * records are met. Row-by-row recovery wouldn't work for multi-statement > + * synchronous transactions. > + */ > +static int > +wal_stream_apply_dml_row(struct wal_stream *stream, struct xrow_header *row) > { > struct request request; > - if (iproto_type_is_synchro_request(row->type)) { > - struct synchro_request syn_req; > - if (xrow_decode_synchro(row, &syn_req) != 0) > - diag_raise(); > - txn_limbo_process(&txn_limbo, &syn_req); > - return; > + uint64_t req_type = dml_request_key_map(row->type); > + if (xrow_decode_dml(row, &request, req_type) != 0) { > + say_error("couldn't decode a DML request"); > + return -1; > } > - if (iproto_type_is_raft_request(row->type)) { > - struct raft_request raft_req; > - /* Vclock is never persisted in WAL by Raft. */ > - if (xrow_decode_raft(row, &raft_req, NULL) != 0) > - diag_raise(); > - box_raft_recover(&raft_req); > - return; > + /* > + * Note that all the information which came from the log is validated > + * and the errors are handled. Not asserted or paniced. That is for the > + * sake of force recovery, which must be able to recover just everything > + * what possible instead of terminating the instance. > + */ > + struct txn *txn; > + if (stream->tsn == 0) { > + if (row->tsn == 0) { > + diag_set(XlogError, "found a row without TSN"); > + goto end_diag_request; > + } > + if (row->tsn != row->lsn) { > + diag_set(XlogError, "found a first row in a " > + "transaction with LSN/TSN mismatch"); > + goto end_diag_request; > + } > + stream->tsn = row->tsn; > + /* > + * Rows are not stacked into a list like during replication, > + * because recovery does not yield while reading the rows. All > + * the yields are controlled by the stream, and therefore no > + * need to wait for all the rows to start a transaction. Can > + * start now, apply the rows, and make a yield after commit if > + * necessary. Helps to avoid a lot of copying. > + */ Nice solution! > + txn = txn_begin(); > + if (txn == NULL) { > + say_error("couldn't begin a recovery transaction"); > + return -1; > + } > + } else if (row->tsn != stream->tsn) { > + diag_set(XlogError, "found a next transaction with the " > + "previous one not yet committed"); > + goto end_diag_request; > + } else { > + txn = in_txn(); > } > - xrow_decode_dml_xc(row, &request, dml_request_key_map(row->type)); > + assert(wal_stream_has_tx(stream)); > + /* Nops might appear at least after before_replace skipping rows. */ > if (request.type != IPROTO_NOP) { > - struct space *space = space_cache_find_xc(request.space_id); > + struct space *space = space_cache_find(request.space_id); > + if (space == NULL) { > + say_error("couldn't find space by ID"); > + goto end_diag_request; > + } > if (box_process_rw(&request, space, NULL) != 0) { > - say_error("error applying row: %s", request_str(&request)); > - diag_raise(); > + say_error("couldn't apply the request"); > + goto end_diag_request; > } > } > - struct wal_stream *xstream = > - container_of(stream, struct wal_stream, base); > - /** > - * Yield once in a while, but not too often, > - * mostly to allow signal handling to take place. > + assert(txn != NULL); > + if (!row->is_commit) > + return 0; > + > + stream->tsn = 0; > + if (txn_commit_try_async(txn) != 0) { > + /* Commit fail automatically leads to rollback. */ > + assert(in_txn() == NULL); > + say_error("couldn't commit a recovery transaction"); > + return -1; > + } > + assert(in_txn() == NULL); > + fiber_gc(); > + return 0; > + > +end_diag_request: > + /* > + * The label must be used only for the errors related directly to the > + * request. Errors like txn_begin() fail has nothing to do with it, and > + * therefore don't log the request as the fault reason. > + */ > + say_error("error at request: %s", request_str(&request)); > + return -1; > +} > + > +/** > + * Yield once in a while, but not too often, mostly to allow signal handling to > + * take place. > + */ > +static void > +wal_stream_try_yield(struct wal_stream *stream) > +{ > + bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0); > + if (wal_stream_has_tx(stream)) {                                       ^ && needs_yield ? > + /* > + * Save the yield. Otherwise it would happen only on rows which > + * are a multiple of WAL_ROWS_PER_YIELD and are last in their > + * transaction, which is probably a very rare coincidence. > + */ > + stream->has_yield = true; > + return; > + } > + if (stream->has_yield) > + stream->has_yield = false; > + else if (!needs_yield) > + return; > + fiber_sleep(0); > +} Consider this diff. It looks simpler IMO, but feel free to ignore. ============================= diff --git a/src/box/box.cc b/src/box/box.cc index 8eacbfebb..f31bc8600 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -547,20 +547,15 @@ end_diag_request:  static void  wal_stream_try_yield(struct wal_stream *stream)  { -       bool needs_yield = (stream->rows % WAL_ROWS_PER_YIELD == 0); -       if (wal_stream_has_tx(stream)) { -               /* -                * Save the yield. Otherwise it would happen only on rows which -                * are a multiple of WAL_ROWS_PER_YIELD and are last in their -                * transaction, which is probably a very rare coincidence. -                */ -               stream->has_yield = true; -               return; -       } -       if (stream->has_yield) -               stream->has_yield = false; -       else if (!needs_yield) +       /* +        * Save the yield. Otherwise it would happen only on rows which +        * are a multiple of WAL_ROWS_PER_YIELD and are last in their +        * transaction, which is probably a very rare coincidence. +        */ +       stream->has_yield |= (stream->rows % WAL_ROWS_PER_YIELD == 0); +       if (wal_stream_has_tx(stream) || !stream->has_yield)                 return; +       stream->has_yield = false;         fiber_sleep(0);  } ============================= -- Serge Petrenko