From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Cyrill Gorcunov <gorcunov@gmail.com>, tml <tarantool-patches@dev.tarantool.org> Cc: Mons Anderson <v.perepelitsa@corp.mail.ru> Subject: Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed Date: Mon, 22 Mar 2021 22:40:34 +0100 [thread overview] Message-ID: <fb5ba0b9-d89a-b14f-697f-955eafe7d436@tarantool.org> (raw) In-Reply-To: <20210320131521.1249747-2-gorcunov@gmail.com> Hi! Thanks for working on this! See 8 comments below. > Current state of `*.xlog` garbage collector can be found in > `box.info.gc()` output. For example > > ``` Lua > tarantool> box.info.gc() > --- > ... > cleanup_is_paused: false > ``` 1. Isn't it 'is_paused' instead of 'cleanup_is_paused'? > The `cleanup_is_paused` shows if cleanup fiber is paused or not. > --- > diff --git a/src/box/box.cc b/src/box/box.cc > index cc59564e1..b9fd7af00 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -771,6 +771,29 @@ box_check_wal_queue_max_size(void) > return size; > } > > +static double > +box_check_wal_cleanup_delay(void) > +{ > + double value = cfg_geti("wal_cleanup_delay"); 2. cfg_geti() returns an integer. Please, use cfg_getd(). > + if (value < 0) { > + diag_set(ClientError, ER_CFG, "wal_cleanup_delay", > + "the value must be >= 0"); > + return -1; > + } > + > + /* > + * Anonymous replicas do not require > + * delay the cleanup procedure since they > + * are read only. > + */ > + if (cfg_geti("replication_anon") != 0) { > + if (value != 0) > + value = 0; 3. Still it makes sense to check that if the option is set, it is valid. Regardless of what is the anon value. I also think the function should return the delay exactly as it is set, not corrected by the anon value. Because these box_check functions are not for configuring. They are for checking a single option. Also you should use 'replication_anon' global variable instead of the config, which might be not installed at this moment yet. > + } > + > + return value; > +} > @@ -1465,6 +1490,16 @@ box_set_wal_queue_max_size(void) > return 0; > } > > +int > +box_set_wal_cleanup_delay(void) > +{ > + double delay = box_check_wal_cleanup_delay(); > + if (delay < 0) > + return -1; 4. Here you could do the correction. Look at the option value and at 'replication_anon' global variable. Not at the anon config. > + gc_set_wal_cleanup_delay(delay); > + return 0; > +} > + > void > box_set_vinyl_memory(void) > { > @@ -3000,7 +3035,7 @@ box_cfg_xc(void) > rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX); > rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST); > > - gc_init(); > + gc_init(box_check_wal_cleanup_delay()); 5. Here is the problem: you rely on replication_anon setting, but it is installed below, after GC is initialized. Perhaps you could set the delay to infinite and after the non-dynamic options are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and would lower the delay to 0 or whatever is configured. > engine_init(); > schema_init(); > replication_init(); > diff --git a/src/box/gc.c b/src/box/gc.c > index 9af4ef958..5418fd31d 100644 > --- a/src/box/gc.c > +++ b/src/box/gc.c > @@ -46,6 +46,7 @@ > #include <small/rlist.h> > #include <tarantool_ev.h> > > +#include "cfg.h" 6. Not necessary anymore. > #include "diag.h" > #include "errcode.h" > #include "fiber.h" > @@ -238,6 +246,47 @@ static int > gc_cleanup_fiber_f(va_list ap) > { > (void)ap; > + > + /* > + * Stage 1 (optional): in case if we're booting > + * up with cleanup disabled lets do wait in a > + * separate cycle to minimize branching on stage 2. > + */ > + if (gc.is_paused) { > + ev_tstamp timeout = gc.wal_cleanup_delay; 7. Please, use double for timestamps and timeouts. Because it is used in almost all the other code working with time. > + while (!fiber_is_cancelled()) { > + ev_tstamp clock_start = fiber_clock(); > + if (fiber_yield_timeout(timeout)) { > + say_info("wal/engine cleanup is resumed " > + "due to timeout expiration"); > + gc.is_paused = false; > + gc.delay_ref = 0; > + break; > + } > + > + /* > + * If a last reference is dropped > + * we can exit out early. > + */ > + if (!gc.is_paused) > + break; > + > + ev_tstamp elapsed = fiber_clock() - clock_start; > + timeout = gc.wal_cleanup_delay - elapsed; 8. In the previous review I said about remembering a timestamp not accidentally. But because otherwise I can make this loop roll forever if I update wal_cleanup_delay with some period < timeout. For instance, it was 5. You remember clock_start. Then 2.5 secs pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is 4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to 4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the next timeout is about 2.5 secs. And so on. In your patch it will happen even if I increase the timeout, not only decrease. The fluctuations might be accidental in case I calculate it from something in Lua, get precision loss, and call box.cfg{} with a certain period until the instance accepts requests. This might be fixed if clock_start is moved out of the loop. > + if (timeout <= 0) { > + say_info("wal/engine cleanup is resumed " > + "due to reconfigured timeout " > + "expiration"); > + gc.is_paused = false; > + gc.delay_ref = 0; > + break; > + } > + } > + }
next prev parent reply other threads:[~2021-03-22 21:40 UTC|newest] Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] " Cyrill Gorcunov via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches 2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches 2021-03-22 8:30 ` Cyrill Gorcunov via Tarantool-patches 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches [this message] 2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches 2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches 2021-03-23 12:43 ` Cyrill Gorcunov via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=fb5ba0b9-d89a-b14f-697f-955eafe7d436@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=v.perepelitsa@corp.mail.ru \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox