From mboxrd@z Thu Jan 1 00:00:00 1970
To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
References: <20210426165954.46474-1-sergepetrenko@tarantool.org>
Date: Wed, 28 Apr 2021 18:34:05 +0300
Subject: Re: [Tarantool-patches] [PATCH] recovery: make it yield when positioning in a WAL
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

On 27.04.2021 00:20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
>
> See 2 questions, 1 comment.
>
> On 26.04.2021 18:59, Serge Petrenko wrote:

Thanks for the review!

>> We had various places in box.cc and relay.cc which counted processed
>> rows and yielded every now and then. These yields didn't cover cases,
>> when recovery has to position inside a long WAL file:
>>
>> For example, when tarantool exits without leaving an empty WAL file
>> which'll be used to recover instance vclock on restart. In this case
>> the instance freezes while processing the last available WAL in order
>> to recover the vclock.
>>
>> Another issue is with replication. If a replica connects and needs data
>> from the end of a really long WAL, recovery will read up to the needed
>> position without yields, making relay disconnect by timeout.
>>
>> In order to fix the issue, make recovery decide when a yield should
>> happen. Introduce a new callback: schedule_yield, which is called by
>> recovery once it processes (no matter how, either simply skips or calls
>> xstream_write) enough rows (WAL_ROWS_PER_YIELD).
>>
>> schedule_yield either yields right away, in case of relay, or saves the
>> yield for later, in case of local recovery, because it might be in the
>> middle of a transaction.
> 1. Did you consider an option to yield explicitly in recovery code when
> it skips rows? If they are being skipped, it does not matter what are
> their transaction borders.

I did consider that. It is possible to do so, but then we'll have yet another
place (in addition to relay and wal_stream) which counts rows and yields
every now and then.

I thought it would be better to unify all these places. Actually, this could be
done this way from the very beginning.
I think it's not recovery's business whether to yield or not once
some rows are processed.

Anyway, I can make it this way, if you insist.

>
> Then the whole patch would be to add the yield once per WAL_ROWS_PER_YIELD
> to recovery_scan(), correct?

True. One place in recovery_scan() and one place in recover_xlog(), when
the rows are skipped.

>
>> The only place with explicit row counting and manual yielding is now in
>> relay_initial_join, since its row sources are engines rather than recovery
>> with its WAL files.
>>
>> Closes #5979
>> ---
>> https://github.com/tarantool/tarantool/tree/sp/gh-5979-recovery-yield
>> https://github.com/tarantool/tarantool/issues/5979
>>
>> diff --git a/src/box/box.cc b/src/box/box.cc
>> index 59925962d..69a8f87eb 100644
>> --- a/src/box/box.cc
>> +++ b/src/box/box.cc
>> @@ -3101,6 +3087,19 @@ bootstrap(const struct tt_uuid *instance_uuid,
>>  	}
>>  }
>>
>> +struct wal_stream wal_stream;
> 2. This must be static.

Sure.

===================================================

diff --git a/src/box/box.cc b/src/box/box.cc
index 69a8f87eb..62b55352e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -3087,7 +3087,7 @@ bootstrap(const struct tt_uuid *instance_uuid,
         }
 }

-struct wal_stream wal_stream;
+static struct wal_stream wal_stream;

 /**
  * Plan a yield in recovery stream. Wal stream will execute it as soon as it's

===================================================

>
>> +
>> +/**
>> + * Plan a yield in recovery stream. Wal stream will execute it as soon as it's
>> + * ready.
>> + */
>> +static void
>> +wal_stream_schedule_yield(void)
>> +{
>> +	wal_stream.has_yield = true;
>> +	wal_stream_try_yield(&wal_stream);
>> +}
>> diff --git a/src/box/recovery.cc b/src/box/recovery.cc
>> index cd33e7635..5351d8524 100644
>> --- a/src/box/recovery.cc
>> +++ b/src/box/recovery.cc
>> @@ -241,10 +248,16 @@ static void
>>  recover_xlog(struct recovery *r, struct xstream *stream,
>>  	     const struct vclock *stop_vclock)
>>  {
>> +	/* Imitate old behaviour. Rows are counted separately for each xlog. */
>> +	r->row_count = 0;
> 3. But why do you need to imitate it? Does it mean if the files are
> too small to yield even once in each, but in total their number is
> huge, there won't be yields?

Yes, that's true.

>
> Also does it mean "1M rows processed" was not ever printed in that
> case?

Yes, when WALs are not big enough.
Recovery starts over with '0.1M rows processed' on every new WAL file.

>
>>  	struct xrow_header row;
>> -	uint64_t row_count = 0;
>>  	while (xlog_cursor_next_xc(&r->cursor, &row,
>>  				   r->wal_dir.force_recovery) == 0) {
>> +		if (++r->row_count % WAL_ROWS_PER_YIELD == 0) {
>> +			r->schedule_yield();
>> +		}
>> +		if (r->row_count % 100000 == 0)
>> +			say_info("%.1fM rows processed", r->row_count / 1000000.);
>>  		/*
>>  		 * Read the next row from xlog file.
>>  		 *
>> @@ -273,12 +286,7 @@ recover_xlog(struct recovery *r, struct xstream *stream,
>>  		 * failed row anyway.
>>  		 */
>>  		vclock_follow_xrow(&r->vclock, &row);
>> -		if (xstream_write(stream, &row) == 0) {
>> -			++row_count;
>> -			if (row_count % 100000 == 0)
>> -				say_info("%.1fM rows processed",
>> -					 row_count / 1000000.);
>> -		} else {
>> +		if (xstream_write(stream, &row) != 0) {
>>  			if (!r->wal_dir.force_recovery)
>>  				diag_raise();
>>

-- 
Serge Petrenko
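
For readers following the thread, the callback scheme discussed above can be sketched roughly as follows. This is a minimal, self-contained C illustration, not the code from the patch. All names ending in _sketch or starting with sketch_ are hypothetical, the threshold of 4 rows is an arbitrary stand-in for WAL_ROWS_PER_YIELD, and the in_txn flag is an assumed simplification of how wal_stream might know that it is in the middle of a transaction. The actual fiber yield is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value; the patch only names the constant WAL_ROWS_PER_YIELD. */
enum { SKETCH_ROWS_PER_YIELD = 4 };

/* Stand-in for the recovery object: a row counter plus the callback. */
struct recovery_sketch {
	uint64_t row_count;
	void (*schedule_yield)(void);
};

/* Stand-in for wal_stream: a yield may stay pending during a transaction. */
struct wal_stream_sketch {
	bool has_yield;
	bool in_txn;          /* assumption: set while a transaction is open */
	unsigned yields_done; /* counts yields instead of really yielding */
};

static struct wal_stream_sketch stream_sketch;

/* Execute a pending yield only when no transaction is in progress. */
static void
sketch_try_yield(struct wal_stream_sketch *s)
{
	if (!s->has_yield || s->in_txn)
		return;
	s->has_yield = false;
	s->yields_done++; /* real code would yield the fiber here */
}

/* Callback for local recovery: plan the yield, run it as soon as allowed. */
static void
sketch_schedule_yield(void)
{
	stream_sketch.has_yield = true;
	sketch_try_yield(&stream_sketch);
}

/* Count every row, applied or skipped, and request a yield periodically. */
static void
sketch_process_rows(struct recovery_sketch *r, uint64_t n_rows)
{
	assert(r->schedule_yield != NULL);
	for (uint64_t i = 0; i < n_rows; i++) {
		if (++r->row_count % SKETCH_ROWS_PER_YIELD == 0)
			r->schedule_yield();
	}
}
```

In this sketch the row loop never decides how to yield, it only reports that enough rows have passed. A relay-style callback could yield immediately, while the wal_stream-style callback shown here defers the yield until the transaction boundary.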