[tarantool-patches] Re: [tarantool-patches] Re: [PATCH v3 2/2] replication: force gc to clean xdir on ENOSPC err

Konstantin Belyavskiy k.belyavskiy at tarantool.org
Fri Jun 29 00:55:42 MSK 2018


Thank you for the review.

>Четверг, 28 июня 2018, 18:25 +03:00 от Konstantin Osipov <kostja at tarantool.org>:
>
>* Konstantin Belyavskiy < k.belyavskiy at tarantool.org > [18/06/28 18:13]:
>> 
>> +void
>> +gc_xdir_clean_notify()
>> +{
>> +	/*
>> +	 * Compare the current time with the time of the last run.
>> +	 * This is needed in case of multiple failures to prevent
>> +	 * from deleting all replicas.
>> +	 */
>> +	static int prev_time = 0;
>> +	int cur_time = time(NULL);
>> +	if (cur_time - prev_time < 1)
>
>Please use fiber time. 
Fixed.
>
>
>> +	/**
>> +	 * Mark consumer with least recent vclock as "dead" and
>> +	 * invoke garbage collection. If nothing to delete find
>> +	 * next alive consumer etc. Originally created for
>> +	 * cases with running out of disk space because of
>> +	 * disconnected replica.
>> +	 */
>> +	struct gc_consumer *leftmost =
>> +	    gc_tree_first(&gc.consumers);
>> +	/*
>> +	 * Exit if no consumers left or if this consumer is
>> +	 * not associated with replica (backup for example).
>> +	 */
>
>You should delete the consumer *only* if it's going to help with
>garbage. Running out of space should not lead to deletion of all
>replicas as consumers if they are not lagging behind. 
Here I delete only the consumer with the least recent vclock
(or consumers if they share the same value) since gc_tree use
vclock signature for comparison (see gc_consumer_cmp).
>
>> +	if (leftmost == NULL || leftmost->replica == NULL)
>> +		return;
>> +	int64_t signature = leftmost->signature;
>> +	while (true) {
>> +		gc_consumer_unregister(leftmost);
>> +		leftmost = gc_tree_first(&gc.consumers);
>> +		if (leftmost == NULL || leftmost->replica == NULL ||
>> +		    leftmost->signature > signature) {
>> +			gc_run();
>> +			return;
>> +		}
>> +	}
>> +}
>> +
>
><cut>
>
>>  const char *wal_mode_STRS[] = { "none", "write", "fsync", NULL };
>> @@ -64,6 +65,8 @@ struct wal_thread {
>>  	 * priority pipe and DOES NOT support yield.
>>  	 */
>>  	struct cpipe tx_prio_pipe;
>> +	/** Return pipe from 'wal' to tx' */
>> +	struct cpipe tx_pipe;
>
>Why do you need a new pipe? 
Since operations with file (like deleting of old xlogs) requires yield and
tx_prio does not support it.
'tx' is a fiber pool, but 'tx_prio' is an endpoint and supports only callbacks
without 'yield'.
>
>> 
>> +static void
>> +gc_status_update(struct cmsg *msg)
>> +{
>> +	gc_xdir_clean_notify();
>> +	delete(msg);
>
>delete()?! 
Sorry, my fault.
>
>
>>  {
>> @@ -655,6 +665,15 @@ done:
>>  		/* Until we can pass the error to tx, log it and clear. */
>>  		error_log(error);
>>  		diag_clear(diag_get());
>> +		if (errno == ENOSPC) {
>> +			struct cmsg *msg =
>> +			    (struct cmsg*)calloc(1, sizeof(struct cmsg));
>
>Please check calloc() return value and do nothing 
>in case it's NULL. 
Done.
>
>> +			static const struct cmsg_hop route[] = {
>> +				{gc_status_update, NULL}
>> +			};
>> +			cmsg_init(msg, route);
>> +			cpipe_push(&wal_thread.tx_pipe, msg);
>> +		}
>> 
>> index 895d938d5..11f1b7fdc 100644
>> +...
>> +-- add a little timeout so gc could finish job
>> +require('fiber').sleep(0.01)
>
>Please never sleep in a test case unconditionally. It doesn't work
>reliably, especially in parallel runs.
>
>The test case should test that replicas which are severely behind 
>are abandoned, and replicas which are running well are not. 
Ok, will replace it with new test case.
>
>
>-- 
>Konstantin Osipov, Moscow, Russia,  +7 903 626 22 32
>http://tarantool.io -  www.twitter.com/kostja_osipov
>


Best regards,
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20180629/6c970eae/attachment.html>


More information about the Tarantool-patches mailing list