[tarantool-patches] Re: [tarantool-patches] Re: [PATCH v3 2/2] replication: force gc to clean xdir on ENOSPC err
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
Fri Jun 29 00:55:42 MSK 2018
Thank you for the review.
>Четверг, 28 июня 2018, 18:25 +03:00 от Konstantin Osipov <kostja at tarantool.org>:
>
>* Konstantin Belyavskiy < k.belyavskiy at tarantool.org > [18/06/28 18:13]:
>>
>> +void
>> +gc_xdir_clean_notify()
>> +{
>> + /*
>> + * Compare the current time with the time of the last run.
>> + * This is needed in case of multiple failures to prevent
>> + * from deleting all replicas.
>> + */
>> + static int prev_time = 0;
>> + int cur_time = time(NULL);
>> + if (cur_time - prev_time < 1)
>
>Please use fiber time.
Fixed.
>
>
>> + /**
>> + * Mark consumer with least recent vclock as "dead" and
>> + * invoke garbage collection. If nothing to delete find
>> + * next alive consumer etc. Originally created for
>> + * cases with running out of disk space because of
>> + * disconnected replica.
>> + */
>> + struct gc_consumer *leftmost =
>> + gc_tree_first(&gc.consumers);
>> + /*
>> + * Exit if no consumers left or if this consumer is
>> + * not associated with replica (backup for example).
>> + */
>
>You should delete the consumer *only* if it's going to help with
>garbage. Running out of space should not lead to deletion of all
>replicas as consumers if they are not lagging behind.
Here I delete only the consumer with the least recent vclock
(or consumers if they share the same value) since gc_tree use
vclock signature for comparison (see gc_consumer_cmp).
>
>> + if (leftmost == NULL || leftmost->replica == NULL)
>> + return;
>> + int64_t signature = leftmost->signature;
>> + while (true) {
>> + gc_consumer_unregister(leftmost);
>> + leftmost = gc_tree_first(&gc.consumers);
>> + if (leftmost == NULL || leftmost->replica == NULL ||
>> + leftmost->signature > signature) {
>> + gc_run();
>> + return;
>> + }
>> + }
>> +}
>> +
>
><cut>
>
>> const char *wal_mode_STRS[] = { "none", "write", "fsync", NULL };
>> @@ -64,6 +65,8 @@ struct wal_thread {
>> * priority pipe and DOES NOT support yield.
>> */
>> struct cpipe tx_prio_pipe;
>> + /** Return pipe from 'wal' to tx' */
>> + struct cpipe tx_pipe;
>
>Why do you need a new pipe?
Since operations with file (like deleting of old xlogs) requires yield and
tx_prio does not support it.
'tx' is a fiber pool, but 'tx_prio' is an endpoint and supports only callbacks
without 'yield'.
>
>>
>> +static void
>> +gc_status_update(struct cmsg *msg)
>> +{
>> + gc_xdir_clean_notify();
>> + delete(msg);
>
>delete()?!
Sorry, my fault.
>
>
>> {
>> @@ -655,6 +665,15 @@ done:
>> /* Until we can pass the error to tx, log it and clear. */
>> error_log(error);
>> diag_clear(diag_get());
>> + if (errno == ENOSPC) {
>> + struct cmsg *msg =
>> + (struct cmsg*)calloc(1, sizeof(struct cmsg));
>
>Please check calloc() return value and do nothing
>in case it's NULL.
Done.
>
>> + static const struct cmsg_hop route[] = {
>> + {gc_status_update, NULL}
>> + };
>> + cmsg_init(msg, route);
>> + cpipe_push(&wal_thread.tx_pipe, msg);
>> + }
>>
>> index 895d938d5..11f1b7fdc 100644
>> +...
>> +-- add a little timeout so gc could finish job
>> +require('fiber').sleep(0.01)
>
>Please never sleep in a test case unconditionally. It doesn't work
>reliably, especially in parallel runs.
>
>The test case should test that replicas which are severely behind
>are abandoned, and replicas which are running well are not.
Ok, will replace it with new test case.
>
>
>--
>Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
>http://tarantool.io - www.twitter.com/kostja_osipov
>
Best regards,
Konstantin Belyavskiy
k.belyavskiy at tarantool.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.tarantool.org/pipermail/tarantool-patches/attachments/20180629/6c970eae/attachment.html>
More information about the Tarantool-patches
mailing list