[Tarantool-patches] [PATCH v4 3/4] gc: rely on minimal vclock components instead of signatures

Serge Petrenko sergepetrenko at tarantool.org
Mon Mar 30 14:02:50 MSK 2020



> On 28 March 2020, at 09:03, Konstantin Osipov <kostja.osipov at gmail.com> wrote:
> 
> * Serge Petrenko <sergepetrenko at tarantool.org> [20/03/27 18:08]:
>> +	struct vclock min_vclock;
>> +	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
> 
> The code would be easier to follow if every vclock API call used
> here were an ignore0 variant:
> 
>> +	/*
>> +	 * Vclock of the oldest WAL row to keep is a by-component
>> +	 * minimum of all consumer vclocks and the oldest
>> +	 * checkpoint vclock. This ensures that all rows needed by
>> +	 * at least one consumer are kept.
>> +	 */
>> +	vclock_copy(&min_vclock, &checkpoint->vclock);
> 
> E.g. use vclock_copy_ignore0 here
> 
>> +	while (consumer != NULL) {
>> +		/*
>> +		 * Consumers will never need rows signed
>> +		 * with a zero instance id (local rows).
>> +		 */
>> +		vclock_min_ignore0(&min_vclock, &consumer->vclock);
>> +		consumer = gc_tree_next(&gc.consumers, consumer);
>> +	}
>> +
>> +	if (vclock_sum(&min_vclock) > vclock_sum(&gc.vclock)) {
> 
> Please use vclock_sum_ignore0
>> +		vclock_copy(&gc.vclock, &min_vclock);
> 
> Please use vclock_copy_ignore0

I can’t. wal_collect_garbage() searches for the WAL file whose vclock is
less than or equal to the one provided, so the 0th vclock component cannot
be zeroed out; otherwise no logs would ever be deleted.
All the other components are taken as a minimum across all consumers and
the oldest snapshot, while the 0th component is copied directly from the
oldest snapshot, since we have to keep all WALs starting from the oldest
snapshot. This is stated in the comment a few lines above, which I have now
moved to a more relevant place.

So, gc.vclock must contain a valid 0th component. That’s why I use ‘local’
vclock_sum() and vclock_copy() here.
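
To illustrate the point, here is a minimal standalone sketch. It does not use
the real struct vclock: toy_vclock, toy_vclock_sum(), toy_vclock_min_ignore0()
and toy_gc_min_vclock() are hypothetical, simplified stand-ins written only
for this example. It shows the 0th component coming from the checkpoint while
the others are a component-wise minimum over the consumers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct vclock. */
enum { TOY_VCLOCK_MAX = 4 };

struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_MAX];
};

/* Sum of all components, including component 0 (local rows). */
static int64_t
toy_vclock_sum(const struct toy_vclock *v)
{
	int64_t sum = 0;
	for (int i = 0; i < TOY_VCLOCK_MAX; i++)
		sum += v->lsn[i];
	return sum;
}

/* Component-wise minimum that leaves component 0 of dst untouched. */
static void
toy_vclock_min_ignore0(struct toy_vclock *dst, const struct toy_vclock *src)
{
	for (int i = 1; i < TOY_VCLOCK_MAX; i++) {
		if (src->lsn[i] < dst->lsn[i])
			dst->lsn[i] = src->lsn[i];
	}
}

/*
 * Vclock of the oldest row to keep: start from a full copy of the
 * oldest checkpoint vclock (this keeps a valid 0th component), then
 * lower components 1..N to the minimum over all consumers.
 */
static struct toy_vclock
toy_gc_min_vclock(const struct toy_vclock *checkpoint,
		  const struct toy_vclock *consumers, int consumer_count)
{
	assert(checkpoint != NULL);
	struct toy_vclock min_vclock = *checkpoint;
	for (int i = 0; i < consumer_count; i++)
		toy_vclock_min_ignore0(&min_vclock, &consumers[i]);
	return min_vclock;
}
```

With a checkpoint at {7, 10, 20, 30} and consumers at {0, 5, 25, 30} and
{0, 12, 15, 40}, the result is {7, 5, 15, 30}: component 0 stays at the
checkpoint value even though both consumers have 0 there.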

> 
> The goal is to switch as many places as possible to the ignore0 API,
> then see the few cases left where we don't, and flip it around:
> rename the ignore0 API to the default naming scheme, and the non-ignore0
> one to vclock_copy_local(), vclock_compare_local().
> 
> This doesn't have to be part of your patch set, but 
> would be nice to get to.

Sounds good, but I don’t want to do it now. Let it be part of a
follow-up patch.

> 
> Ideally ignore0 vclock should be a distinct data type
> with an explicit conversion to and from non-ignore0 vclock
> and no implicit assignment (using vclock_copy already more
> or less ensures that).
> 
> Other than that you seem to be on track with the patch.
> 
> -- 
> Konstantin Osipov, Moscow, Russia
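
For reference, the distinct-type idea could look roughly like the sketch
below. This is only an illustration under my own assumptions: demo_vclock,
demo_vclock_ignore0 and both conversion helpers are hypothetical names and are
not part of the patch or of the existing vclock API. Wrapping the vclock in a
separate struct makes plain assignment between the two kinds a compile error,
so every conversion has to go through an explicit call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_VCLOCK_MAX = 4 };

/* Full vclock: component 0 counts local rows. */
struct demo_vclock {
	int64_t lsn[DEMO_VCLOCK_MAX];
};

/*
 * Distinct wrapper type for a vclock whose 0th component is
 * ignored. It is not assignment-compatible with demo_vclock.
 */
struct demo_vclock_ignore0 {
	struct demo_vclock v; /* v.lsn[0] is always kept at 0 */
};

/* Explicit conversion: drop the local component. */
static struct demo_vclock_ignore0
demo_vclock_to_ignore0(const struct demo_vclock *full)
{
	assert(full != NULL);
	struct demo_vclock_ignore0 res;
	res.v = *full;
	res.v.lsn[0] = 0;
	return res;
}

/*
 * Explicit conversion back: the caller has to say where the
 * local component comes from (e.g. the oldest checkpoint).
 */
static struct demo_vclock
demo_vclock_from_ignore0(const struct demo_vclock_ignore0 *ign,
			 int64_t local_lsn)
{
	assert(ign != NULL);
	struct demo_vclock res = ign->v;
	res.lsn[0] = local_lsn;
	return res;
}
```

The second helper takes the local LSN as a parameter on purpose: a case like
gc.vclock, which needs a valid 0th component, would then be visible at the
call site instead of hiding behind a generic copy.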

=============================================
diff --git a/src/box/gc.c b/src/box/gc.c
index 4eae6ef3b..a2d0a515c 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -186,11 +186,7 @@ gc_run_cleanup(void)
 	/* At least one checkpoint must always be available. */
 	assert(checkpoint != NULL);
 
-	/*
-	 * Find the vclock of the oldest WAL row to keep.
-	 * Note, we must keep all WALs created after the
-	 * oldest checkpoint, even if no consumer needs them.
-	 */
+	/* Find the vclock of the oldest WAL row to keep. */
 	struct vclock min_vclock;
 	struct gc_consumer *consumer = gc_tree_first(&gc.consumers);
 	/*
@@ -198,6 +194,8 @@ gc_run_cleanup(void)
 	 * minimum of all consumer vclocks and the oldest
 	 * checkpoint vclock. This ensures that all rows needed by
 	 * at least one consumer are kept.
+	 * Note, we must keep all WALs created after the
+	 * oldest checkpoint, even if no consumer needs them.
 	 */
 	vclock_copy(&min_vclock, &checkpoint->vclock);
 	while (consumer != NULL) {


--
Serge Petrenko
sergepetrenko at tarantool.org



