Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Mons Anderson <v.perepelitsa@corp.mail.ru>
Subject: Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
Date: Mon, 22 Mar 2021 22:40:34 +0100	[thread overview]
Message-ID: <fb5ba0b9-d89a-b14f-697f-955eafe7d436@tarantool.org> (raw)
In-Reply-To: <20210320131521.1249747-2-gorcunov@gmail.com>

Hi! Thanks for working on this!

See 8 comments below.

> Current state of `*.xlog` garbage collector can be found in
> `box.info.gc()` output. For example
> 
> ``` Lua
>  tarantool> box.info.gc()
>  ---
>    ...
>    cleanup_is_paused: false
> ```

1. Isn't it 'is_paused' instead of 'cleanup_is_paused'?

> The `cleanup_is_paused` shows if cleanup fiber is paused or not.
> ---
> diff --git a/src/box/box.cc b/src/box/box.cc
> index cc59564e1..b9fd7af00 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -771,6 +771,29 @@ box_check_wal_queue_max_size(void)
>  	return size;
>  }
>  
> +static double
> +box_check_wal_cleanup_delay(void)
> +{
> +	double value = cfg_geti("wal_cleanup_delay");

2. cfg_geti() returns an integer. Please, use cfg_getd().

> +	if (value < 0) {
> +		diag_set(ClientError, ER_CFG, "wal_cleanup_delay",
> +			 "the value must be >= 0");
> +		return -1;
> +	}
> +
> +	/*
> +	 * Anonymous replicas do not require
> +	 * delay the cleanup procedure since they
> +	 * are read only.
> +	 */
> +	if (cfg_geti("replication_anon") != 0) {
> +		if (value != 0)
> +			value = 0;

3. Still it makes sense to check that if the option is set,
it is valid. Regardless of what is the anon value.

I also think the function should return the delay exactly as
it is set, not corrected by the anon value. Because these
box_check functions are not for configuring. They are for
checking a single option.

Also you should use 'replication_anon' global variable instead
of the config, which might be not installed at this moment yet.

> +	}
> +
> +	return value;
> +}
> @@ -1465,6 +1490,16 @@ box_set_wal_queue_max_size(void)
>  	return 0;
>  }
>  
> +int
> +box_set_wal_cleanup_delay(void)
> +{
> +	double delay = box_check_wal_cleanup_delay();
> +	if (delay < 0)
> +		return -1;

4. Here you could do the correction. Look at the option
value and at 'replication_anon' global variable. Not at
the anon config.

> +	gc_set_wal_cleanup_delay(delay);
> +	return 0;
> +}
> +
>  void
>  box_set_vinyl_memory(void)
>  {
> @@ -3000,7 +3035,7 @@ box_cfg_xc(void)
>  	rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX);
>  	rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST);
>  
> -	gc_init();
> +	gc_init(box_check_wal_cleanup_delay());

5. Here is the problem: you rely on replication_anon setting, but it is installed
below, after GC is initialized.

Perhaps you could set the delay to infinite and after the non-dynamic options
are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and
would lower the delay to 0 or whatever is configured.

>  	engine_init();
>  	schema_init();
>  	replication_init();
> diff --git a/src/box/gc.c b/src/box/gc.c
> index 9af4ef958..5418fd31d 100644
> --- a/src/box/gc.c
> +++ b/src/box/gc.c
> @@ -46,6 +46,7 @@
>  #include <small/rlist.h>
>  #include <tarantool_ev.h>
>  
> +#include "cfg.h"

6. Not necessary anymore.

>  #include "diag.h"
>  #include "errcode.h"
>  #include "fiber.h"
> @@ -238,6 +246,47 @@ static int
>  gc_cleanup_fiber_f(va_list ap)
>  {
>  	(void)ap;
> +
> +	/*
> +	 * Stage 1 (optional): in case if we're booting
> +	 * up with cleanup disabled lets do wait in a
> +	 * separate cycle to minimize branching on stage 2.
> +	 */
> +	if (gc.is_paused) {
> +		ev_tstamp timeout = gc.wal_cleanup_delay;

7. Please, use double for timestamps and timeouts. Because it
is used in almost all the other code working with time.

> +		while (!fiber_is_cancelled()) {
> +			ev_tstamp clock_start = fiber_clock();
> +			if (fiber_yield_timeout(timeout)) {
> +				say_info("wal/engine cleanup is resumed "
> +					 "due to timeout expiration");
> +				gc.is_paused = false;
> +				gc.delay_ref = 0;
> +				break;
> +			}
> +
> +			/*
> +			 * If a last reference is dropped
> +			 * we can exit out early.
> +			 */
> +			if (!gc.is_paused)
> +				break;
> +
> +			ev_tstamp elapsed = fiber_clock() - clock_start;
> +			timeout = gc.wal_cleanup_delay - elapsed;

8. In the previous review I said about remembering a timestamp not
accidentally. But because otherwise I can make this loop roll
forever if I update wal_cleanup_delay with some period < timeout.

For instance, it was 5. You remember clock_start. Then 2.5 secs
pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is
4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to
4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the
next timeout is about 2.5 secs. And so on.

In your patch it will happen even if I increase the timeout, not
only decrease. The fluctuations might be accidental in case I
calculate it from something in Lua, get precision loss, and call
box.cfg{} with a certain period until the instance accepts
requests.

This might be fixed if clock_start is moved out of the loop.

> +			if (timeout <= 0) {
> +				say_info("wal/engine cleanup is resumed "
> +					 "due to reconfigured timeout "
> +					 "expiration");
> +				gc.is_paused = false;
> +				gc.delay_ref = 0;
> +				break;
> +			}
> +		}
> +	}

  parent reply	other threads:[~2021-03-22 21:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] " Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
2021-03-22  8:07   ` Konstantin Osipov via Tarantool-patches
2021-03-22  8:30     ` Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40   ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-03-23  7:28     ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 11:25       ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 12:43         ` Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40   ` Vladislav Shpilevoy via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fb5ba0b9-d89a-b14f-697f-955eafe7d436@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=v.perepelitsa@corp.mail.ru \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox