From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id CABFD6FC8F; Tue, 23 Mar 2021 00:40:38 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org CABFD6FC8F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1616449238; bh=+eAmweOkj4bO40sDDoulc9D/xfObbmPjdrUHoV5wGhs=; h=To:Cc:References:Date:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=VmElKQ0MTiWHB9j6HcDF5ak4pjdNe4vtxKw4K61fmPUmZ9xeekb3zBs21ASRXcPUg RqNXyHbfgAgl6DlFDaGvWW6VEyy/1AH5W3SDqN/sgz6PZyCbnQQhc29cBfof9sUjAo w5Ys/WCerpxXqWm7Eqzn/SPnKlNu4MjGN/cPGjfE= Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 3344F6FC8F for ; Tue, 23 Mar 2021 00:40:37 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 3344F6FC8F Received: by smtpng3.m.smailru.net with esmtpa (envelope-from ) id 1lOSHs-0004ak-C4; Tue, 23 Mar 2021 00:40:36 +0300 To: Cyrill Gorcunov , tml Cc: Mons Anderson References: <20210320131521.1249747-1-gorcunov@gmail.com> <20210320131521.1249747-2-gorcunov@gmail.com> Message-ID: Date: Mon, 22 Mar 2021 22:40:34 +0100 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <20210320131521.1249747-2-gorcunov@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-7564579A: 646B95376F6C166E X-77F55803: 4F1203BC0FB41BD95D6E7CC48CB1F5F179C48A9DDACBFB6F5347129BC2C9341C182A05F53808504044B9598D3325A75189080012E1D3E08877AB64F49850463FC2C11DF34C2B58BB X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE7CE54A8686262D0D1EA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F79006371777B963B59186248638F802B75D45FF914D58D5BE9E6BC131B5C99E7648C95C686AF563A045C75E16FBB108ECFDD7175F2F99F0219E3E14A471835C12D1D9774AD6D5ED66289B5278DA827A17800CE7328B01A8D746D8839FA2833FD35BB23D2EF20D2F80756B5F868A13BD56FB6657A471835C12D1D977725E5C173C3A84C37EF884183F8E4D67117882F4460429728AD0CFFFB425014E868A13BD56FB6657E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE14079BD4B6F7A4D31EC0BEC64975D915A344093EC92FD9297F6718AA50765F79006371F24DFF1B296142567F23339F89546C5A8DF7F3B2552694A6FED454B719173D6725E5C173C3A84C3CDD845A7E6742F4735872C767BF85DA2F004C906525384306FED454B719173D6462275124DF8B9C934F12F0C005D1A85E5BFE6E7EFDEDCD789D4C264860C145E X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A2BBE337FB72E923155C0AF1600DCBC20B386D740E071D760D6EAE02C79FB901D9 X-C1DE0DAB: 0D63561A33F958A5C23C3EA0408ECB02E3F6862214F24BC42BEC17DE42276CE8D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75F04B387B5D7535DE410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D346F44602F3B494634CCB4031A7D7426CB06275CA73AA2163F91E95DE7C268A5DEA8CBF776B25899F41D7E09C32AA3244CC5640B7927F7CFDB86320B14BA567FA37101BF96129E4011FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojWJZv20R+6UiJO4eDeFeVTg== X-Mailru-Sender: 689FA8AB762F73936BC43F508A063822E2F3D385246F1F670EDA784E190663413841015FED1DE5223CC9A89AB576DD93FB559BB5D741EB963CF37A108A312F5C27E8A8C3839CE0E267EA787935ED9F1B X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Vladislav Shpilevoy via Tarantool-patches Reply-To: Vladislav Shpilevoy Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" Hi! Thanks for working on this! See 8 comments below. > Current state of `*.xlog` garbage collector can be found in > `box.info.gc()` output. For example > > ``` Lua > tarantool> box.info.gc() > --- > ... > cleanup_is_paused: false > ``` 1. Isn't it 'is_paused' instead of 'cleanup_is_paused'? > The `cleanup_is_paused` shows if cleanup fiber is paused or not. > --- > diff --git a/src/box/box.cc b/src/box/box.cc > index cc59564e1..b9fd7af00 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -771,6 +771,29 @@ box_check_wal_queue_max_size(void) > return size; > } > > +static double > +box_check_wal_cleanup_delay(void) > +{ > + double value = cfg_geti("wal_cleanup_delay"); 2. cfg_geti() returns an integer. Please, use cfg_getd(). > + if (value < 0) { > + diag_set(ClientError, ER_CFG, "wal_cleanup_delay", > + "the value must be >= 0"); > + return -1; > + } > + > + /* > + * Anonymous replicas do not require > + * delay the cleanup procedure since they > + * are read only. > + */ > + if (cfg_geti("replication_anon") != 0) { > + if (value != 0) > + value = 0; 3. Still it makes sense to check that if the option is set, it is valid. Regardless of what is the anon value. I also think the function should return the delay exactly as it is set, not corrected by the anon value. Because these box_check functions are not for configuring. They are for checking a single option. Also you should use 'replication_anon' global variable instead of the config, which might be not installed at this moment yet. > + } > + > + return value; > +} > @@ -1465,6 +1490,16 @@ box_set_wal_queue_max_size(void) > return 0; > } > > +int > +box_set_wal_cleanup_delay(void) > +{ > + double delay = box_check_wal_cleanup_delay(); > + if (delay < 0) > + return -1; 4. Here you could do the correction. Look at the option value and at 'replication_anon' global variable. Not at the anon config. > + gc_set_wal_cleanup_delay(delay); > + return 0; > +} > + > void > box_set_vinyl_memory(void) > { > @@ -3000,7 +3035,7 @@ box_cfg_xc(void) > rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX); > rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST); > > - gc_init(); > + gc_init(box_check_wal_cleanup_delay()); 5. Here is the problem: you rely on replication_anon setting, but it is installed below, after GC is initialized. Perhaps you could set the delay to infinite and after the non-dynamic options are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and would lower the delay to 0 or whatever is configured. > engine_init(); > schema_init(); > replication_init(); > diff --git a/src/box/gc.c b/src/box/gc.c > index 9af4ef958..5418fd31d 100644 > --- a/src/box/gc.c > +++ b/src/box/gc.c > @@ -46,6 +46,7 @@ > #include > #include > > +#include "cfg.h" 6. Not necessary anymore. > #include "diag.h" > #include "errcode.h" > #include "fiber.h" > @@ -238,6 +246,47 @@ static int > gc_cleanup_fiber_f(va_list ap) > { > (void)ap; > + > + /* > + * Stage 1 (optional): in case if we're booting > + * up with cleanup disabled lets do wait in a > + * separate cycle to minimize branching on stage 2. > + */ > + if (gc.is_paused) { > + ev_tstamp timeout = gc.wal_cleanup_delay; 7. Please, use double for timestamps and timeouts. Because it is used in almost all the other code working with time. > + while (!fiber_is_cancelled()) { > + ev_tstamp clock_start = fiber_clock(); > + if (fiber_yield_timeout(timeout)) { > + say_info("wal/engine cleanup is resumed " > + "due to timeout expiration"); > + gc.is_paused = false; > + gc.delay_ref = 0; > + break; > + } > + > + /* > + * If a last reference is dropped > + * we can exit out early. > + */ > + if (!gc.is_paused) > + break; > + > + ev_tstamp elapsed = fiber_clock() - clock_start; > + timeout = gc.wal_cleanup_delay - elapsed; 8. In the previous review I said about remembering a timestamp not accidentally. But because otherwise I can make this loop roll forever if I update wal_cleanup_delay with some period < timeout. For instance, it was 5. You remember clock_start. Then 2.5 secs pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is 4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to 4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the next timeout is about 2.5 secs. And so on. In your patch it will happen even if I increase the timeout, not only decrease. The fluctuations might be accidental in case I calculate it from something in Lua, get precision loss, and call box.cfg{} with a certain period until the instance accepts requests. This might be fixed if clock_start is moved out of the loop. > + if (timeout <= 0) { > + say_info("wal/engine cleanup is resumed " > + "due to reconfigured timeout " > + "expiration"); > + gc.is_paused = false; > + gc.delay_ref = 0; > + break; > + } > + } > + }