Thank you for the review. >Четверг, 28 июня 2018, 18:25 +03:00 от Konstantin Osipov : > >* Konstantin Belyavskiy < k.belyavskiy@tarantool.org > [18/06/28 18:13]: >> >> +void >> +gc_xdir_clean_notify() >> +{ >> + /* >> + * Compare the current time with the time of the last run. >> + * This is needed in case of multiple failures to prevent >> + * from deleting all replicas. >> + */ >> + static int prev_time = 0; >> + int cur_time = time(NULL); >> + if (cur_time - prev_time < 1) > >Please use fiber time. Fixed. > > >> + /** >> + * Mark consumer with least recent vclock as "dead" and >> + * invoke garbage collection. If nothing to delete find >> + * next alive consumer etc. Originally created for >> + * cases with running out of disk space because of >> + * disconnected replica. >> + */ >> + struct gc_consumer *leftmost = >> + gc_tree_first(&gc.consumers); >> + /* >> + * Exit if no consumers left or if this consumer is >> + * not associated with replica (backup for example). >> + */ > >You should delete the consumer *only* if it's going to help with >garbage. Running out of space should not lead to deletion of all >replicas as consumers if they are not lagging behind. Here I delete only the consumer with the least recent vclock (or consumers if they share the same value) since gc_tree use vclock signature for comparison (see gc_consumer_cmp). > >> + if (leftmost == NULL || leftmost->replica == NULL) >> + return; >> + int64_t signature = leftmost->signature; >> + while (true) { >> + gc_consumer_unregister(leftmost); >> + leftmost = gc_tree_first(&gc.consumers); >> + if (leftmost == NULL || leftmost->replica == NULL || >> + leftmost->signature > signature) { >> + gc_run(); >> + return; >> + } >> + } >> +} >> + > > > >> const char *wal_mode_STRS[] = { "none", "write", "fsync", NULL }; >> @@ -64,6 +65,8 @@ struct wal_thread { >> * priority pipe and DOES NOT support yield. >> */ >> struct cpipe tx_prio_pipe; >> + /** Return pipe from 'wal' to tx' */ >> + struct cpipe tx_pipe; > >Why do you need a new pipe? Since operations with file (like deleting of old xlogs) requires yield and tx_prio does not support it. 'tx' is a fiber pool, but 'tx_prio' is an endpoint and supports only callbacks without 'yield'. > >> >> +static void >> +gc_status_update(struct cmsg *msg) >> +{ >> + gc_xdir_clean_notify(); >> + delete(msg); > >delete()?! Sorry, my fault. > > >> { >> @@ -655,6 +665,15 @@ done: >> /* Until we can pass the error to tx, log it and clear. */ >> error_log(error); >> diag_clear(diag_get()); >> + if (errno == ENOSPC) { >> + struct cmsg *msg = >> + (struct cmsg*)calloc(1, sizeof(struct cmsg)); > >Please check calloc() return value and do nothing >in case it's NULL. Done. > >> + static const struct cmsg_hop route[] = { >> + {gc_status_update, NULL} >> + }; >> + cmsg_init(msg, route); >> + cpipe_push(&wal_thread.tx_pipe, msg); >> + } >> >> index 895d938d5..11f1b7fdc 100644 >> +... >> +-- add a little timeout so gc could finish job >> +require('fiber').sleep(0.01) > >Please never sleep in a test case unconditionally. It doesn't work >reliably, especially in parallel runs. > >The test case should test that replicas which are severely behind >are abandoned, and replicas which are running well are not. Ok, will replace it with new test case. > > >-- >Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 >http://tarantool.io - www.twitter.com/kostja_osipov > Best regards, Konstantin Belyavskiy k.belyavskiy@tarantool.org