From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: tml <tarantool-patches@dev.tarantool.org>, Mons Anderson <v.perepelitsa@corp.mail.ru> Subject: Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed Date: Tue, 23 Mar 2021 10:28:46 +0300 [thread overview] Message-ID: <YFmYrq3XTTISNhua@grain> (raw) In-Reply-To: <fb5ba0b9-d89a-b14f-697f-955eafe7d436@tarantool.org> On Mon, Mar 22, 2021 at 10:40:34PM +0100, Vladislav Shpilevoy wrote: > Hi! Thanks for working on this! > > See 8 comments below. > > > Current state of `*.xlog` garbage collector can be found in > > `box.info.gc()` output. For example > > > > ``` Lua > > tarantool> box.info.gc() > > --- > > ... > > cleanup_is_paused: false > > ``` > > 1. Isn't it 'is_paused' instead of 'cleanup_is_paused'? Yeah, sorry. Thanks! > > > > +static double > > +box_check_wal_cleanup_delay(void) > > +{ > > + double value = cfg_geti("wal_cleanup_delay"); > > 2. cfg_geti() returns an integer. Please, use cfg_getd(). I did it on a purpose: we accepts time in whole seconds, not milliseconds and etc. But sure, no problem will change it to cfg_getd. > > + > > + /* > > + * Anonymous replicas do not require > > + * delay the cleanup procedure since they > > + * are read only. > > + */ > > + if (cfg_geti("replication_anon") != 0) { > > + if (value != 0) > > + value = 0; > > 3. Still it makes sense to check that if the option is set, > it is valid. Regardless of what is the anon value. OK > > I also think the function should return the delay exactly as > it is set, not corrected by the anon value. Because these > box_check functions are not for configuring. They are for > checking a single option. OK > > Also you should use 'replication_anon' global variable instead > of the config, which might be not installed at this moment yet. What would happen if one setup both 'wal_cleanup_delay' and 'replication_anon' in config at once. Which C's replication_anon value I will be reading? The C's replication_anon is set after the cfg procedure complete, so since I operate on values obrained from Lua level I need to use cfg_geti("replication_anon") because at this moment only Lua level is consistent and replication_anon may have a value from previous box.cfg call. > > > > +int > > +box_set_wal_cleanup_delay(void) > > +{ > > + double delay = box_check_wal_cleanup_delay(); > > + if (delay < 0) > > + return -1; > > 4. Here you could do the correction. Look at the option > value and at 'replication_anon' global variable. Not at > the anon config. Need to think about this moment, thanks! > > > + gc_set_wal_cleanup_delay(delay); > > + return 0; > > +} > > + > > void > > box_set_vinyl_memory(void) > > { > > @@ -3000,7 +3035,7 @@ box_cfg_xc(void) > > rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX); > > rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST); > > > > - gc_init(); > > + gc_init(box_check_wal_cleanup_delay()); > > 5. Here is the problem: you rely on replication_anon setting, but it is installed > below, after GC is initialized. Exactly, that's why I use raw cfg_geti("replication_anon") instead of global replication_anon variable. > Perhaps you could set the delay to infinite and after the non-dynamic options > are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and > would lower the delay to 0 or whatever is configured. > > > engine_init(); > > schema_init(); > > replication_init(); > > diff --git a/src/box/gc.c b/src/box/gc.c > > index 9af4ef958..5418fd31d 100644 > > --- a/src/box/gc.c > > +++ b/src/box/gc.c > > @@ -46,6 +46,7 @@ > > #include <small/rlist.h> > > #include <tarantool_ev.h> > > > > +#include "cfg.h" > > 6. Not necessary anymore. +1 > > gc_cleanup_fiber_f(va_list ap) > > { > > (void)ap; > > + > > + /* > > + * Stage 1 (optional): in case if we're booting > > + * up with cleanup disabled lets do wait in a > > + * separate cycle to minimize branching on stage 2. > > + */ > > + if (gc.is_paused) { > > + ev_tstamp timeout = gc.wal_cleanup_delay; > > 7. Please, use double for timestamps and timeouts. Because it > is used in almost all the other code working with time. Sure > > + while (!fiber_is_cancelled()) { > > + ev_tstamp clock_start = fiber_clock(); > > + if (fiber_yield_timeout(timeout)) { > > + say_info("wal/engine cleanup is resumed " > > + "due to timeout expiration"); > > + gc.is_paused = false; > > + gc.delay_ref = 0; > > + break; > > + } > > + > > + /* > > + * If a last reference is dropped > > + * we can exit out early. > > + */ > > + if (!gc.is_paused) > > + break; > > + > > + ev_tstamp elapsed = fiber_clock() - clock_start; > > + timeout = gc.wal_cleanup_delay - elapsed; > > 8. In the previous review I said about remembering a timestamp not > accidentally. But because otherwise I can make this loop roll > forever if I update wal_cleanup_delay with some period < timeout. > > For instance, it was 5. You remember clock_start. Then 2.5 secs > pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is > 4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to > 4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the > next timeout is about 2.5 secs. And so on. > > In your patch it will happen even if I increase the timeout, not > only decrease. The fluctuations might be accidental in case I > calculate it from something in Lua, get precision loss, and call > box.cfg{} with a certain period until the instance accepts > requests. > > This might be fixed if clock_start is moved out of the loop. What you're talking about is fp rouding errors and you're to work really hard to enter into a such loop. But I see your point and will update. Btw the use of FP value for seconds is a big mistake in architecure, not only here but all over the code and I think we should get rid of this very strange approach (and I must confess I don't understand *why* on earth the FP is used for delays in ev library). Cyrill
next prev parent reply other threads:[~2021-03-23 7:28 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] " Cyrill Gorcunov via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches 2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches 2021-03-22 8:30 ` Cyrill Gorcunov via Tarantool-patches 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches [this message] 2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches 2021-03-23 12:43 ` Cyrill Gorcunov via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=YFmYrq3XTTISNhua@grain \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=v.perepelitsa@corp.mail.ru \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox