[Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed

Serge Petrenko sergepetrenko at tarantool.org
Fri Mar 19 16:50:38 MSK 2021


18.03.2021 21:41, Cyrill Gorcunov пишет:
> In case if replica managed to be far behind the master node
> (so there are a number of xlog files present after the last

...

>   src/box/replication.cc          |  2 +
>   test/app-tap/init_script.result |  1 +
>   test/box/admin.result           |  2 +
>   test/box/cfg.result             |  4 ++
>   10 files changed, 153 insertions(+), 3 deletions(-)

Hi!  Thanks for the patch!

Please find 2 minor comments below.

>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index b4a1a5e07..ee017c2dc 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -754,6 +754,15 @@ box_check_wal_mode(const char *mode_name)
>   	return (enum wal_mode) mode;
>   }
>   
> +static void
> +box_check_wal_cleanup_delay(int tmo)
> +{
> +	if (tmo < 0 && tmo != -1) {
> +		tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay",
> +			  "the value must be either >= 0 or -1");
> +	}
> +}
> +


1. AFAICS all other delay and timeout parameters are 'double'.
    Let's make this one also a double.

>   static void
>   box_check_readahead(int readahead)
>   {
> @@ -899,6 +908,7 @@ box_check_config(void)
>   	box_check_checkpoint_count(cfg_geti("checkpoint_count"));
>   	box_check_wal_max_size(cfg_geti64("wal_max_size"));
>   	box_check_wal_mode(cfg_gets("wal_mode"));
> +	box_check_wal_cleanup_delay(cfg_geti("wal_cleanup_delay"));
>   	if (box_check_memory_quota("memtx_memory") < 0)
>   		diag_raise();
>   	box_check_memtx_min_tuple_size(cfg_geti64("memtx_min_tuple_size"));
> @@ -3045,6 +3055,7 @@ box_cfg_xc(void)
>   		bootstrap(&instance_uuid, &replicaset_uuid,
>   			  &is_bootstrap_leader);
>   	}
> +	gc_delay_unref();
>   	fiber_gc();
>   
>   	bootstrap_journal_guard.is_active = false;

...

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 41f949e8e..5d09690ad 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -668,6 +668,13 @@ relay_send_is_raft_enabled(struct relay *relay,
>   	}
>   }
>   
> +static void
> +relay_gc_delay_unref(struct cmsg *msg)
> +{
> +	(void)msg;
> +	gc_delay_unref();
> +}
> +
>   /**
>    * A libev callback invoked when a relay client socket is ready
>    * for read. This currently only happens when the client closes
> @@ -721,6 +728,19 @@ relay_subscribe_f(va_list ap)
>   	 */
>   	relay_send_heartbeat(relay);
>   
> +	/*
> +	 * Now we can resume wal/engine gc as relay
> +	 * is up and running.
> +	 */
> +	if (!relay->replica->anon) {
> +		static const struct cmsg_hop gc_delay_route[] = {
> +			{relay_gc_delay_unref, NULL}
> +		};
> +		struct cmsg gc_delay_msg;
> +		cmsg_init(&gc_delay_msg, gc_delay_route);
> +		cpipe_push(&relay->tx_pipe, &gc_delay_msg);
> +	}
> +

2. I agree with Vlad here. No need to call gc_delay_unref() from the 
relay thread.
    We need to wait until a corresponding gc consumer is registered. 
This is done
    in relay_subscribe(), which is in tx thread.

    In fact, you may as well move the gc_delay_unref() call directly to 
gc_consumer_register().
    This would look even better, IMO.

>   	/*
>   	 * Run the event loop until the connection is broken
>   	 * or an error occurs.
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 1fa8843e7..aefb812b3 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -250,6 +250,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
>   						   tt_uuid_str(&replica->uuid));
>   	}
>   	replicaset.replica_by_id[replica_id] = replica;
> +	gc_delay_ref();
>   	++replicaset.registered_count;
>   	say_info("assigned id %d to replica %s",
>   		 replica->id, tt_uuid_str(&replica->uuid));
> @@ -273,6 +274,7 @@ replica_clear_id(struct replica *replica)
>   	replicaset.replica_by_id[replica->id] = NULL;
>   	assert(replicaset.registered_count > 0);
>   	--replicaset.registered_count;
> +	gc_delay_unref();
>   	if (replica->id == instance_id) {
>   		/* See replica_check_id(). */
>   		assert(replicaset.is_joining);
>
...

-- 
Serge Petrenko



More information about the Tarantool-patches mailing list