From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 6 Jul 2018 20:00:23 +0300
From: Konstantin Osipov
Subject: [tarantool-patches] Re: [PATCH v5 2/2] replication: force gc to clean xdir on ENOSPC err
Message-ID: <20180706170023.GA29935@chai>
References: <31d0178970ffb16aa02d245585d6fb9e4b4ffea2.1530815780.git.k.belyavskiy@tarantool.org>
In-Reply-To: <31d0178970ffb16aa02d245585d6fb9e4b4ffea2.1530815780.git.k.belyavskiy@tarantool.org>
Reply-To: tarantool-patches@freelists.org
List-Id: tarantool-patches
To: tarantool-patches@freelists.org

* Konstantin Belyavskiy [18/07/05 22:56]:
> Garbage collector do not delete xlog unless replica do not notify
> master with newer vclock. This can lead to running out of disk
> space error and this is not right behaviour since it will stop the
> master.
> Fix it by forcing gc to clean xlogs for replica with highest lag.
> Add an error injection and a test.

Please rebase this patch to the latest 1.10.

Please use relay_stop() as a callback to unregister the consumer.

> +void
> +gc_xdir_clean_notify()
> +{
> +	/*
> +	 * Compare the current time with the time of the last run.
> +	 * This is needed in case of multiple failures to prevent
> +	 * from deleting all replicas.
> +	 */
> +	static double prev_time = 0.;
> +	double cur_time = ev_monotonic_time();
> +	if (cur_time - prev_time < 1.)
> +		return;

This throttles gc, which is good. But we would still get a lot of
messages from the WAL thread. Maybe we should move the throttling
to the WAL side? This would spare us from creating the message as
well.

Ideally we should use a single statically allocated message from
the WAL for this purpose (but still throttle it as well).

Plus, eventually you're going to reach a state when kicking off
replicas doesn't help with space. In this case you're going to
have a lot of messages, and they are all going to be useless.
This also suggests that throttling should be done on the WAL side.

> +	prev_time = cur_time;
> +	struct gc_consumer *leftmost =
> +	    gc_tree_first(&gc.consumers);
> +	/*
> +	 * Exit if no consumers left or if this consumer is
> +	 * not associated with replica (backup for example).
> +	 */
> +	if (leftmost == NULL || leftmost->replica == NULL)
> +		return;
> +	/*
> +	 * We have to maintain @checkpoint_count oldest snapshots,
> +	 * plus we can't remove snapshots that are still in use.
> +	 * So if leftmost replica has signature greater or equel
> +	 * then the oldest checkpoint that must be preserved,
> +	 * nothing to do.
> +	 */

This comment is useful, but the search in the checkpoint array is
not.

What about other possible types of consumers that cannot be
dispensed with anyway, e.g. backups? What if they are holding a
reference as well?
Apparently this check is taking care of the problem:

> +	if (leftmost == NULL || leftmost->replica == NULL)
> +		return;

Could you write a test with two "abandoned" replicas, each holding
an xlog file?

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
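
P.S. To make the WAL-side throttling suggestion above more
concrete, here is a rough sketch of one statically allocated
message combined with a time-based throttle. All names in it
(enospc_msg, enospc_notifier, wal_notify_enospc, tx_enospc_done,
ENOSPC_NOTIFY_INTERVAL) are made up for illustration and do not
exist in the patch or in the current WAL code; the in_flight flag
only stands in for whatever the real message delivery would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: minimal interval between two notifications, in seconds. */
#define ENOSPC_NOTIFY_INTERVAL 1.0

/* Hypothetical stand-in for the single, reusable WAL -> tx message. */
struct enospc_msg {
	/* Set by WAL when the message is sent, cleared by tx when handled. */
	bool in_flight;
};

/* Hypothetical WAL-side state: one message plus throttling bookkeeping. */
struct enospc_notifier {
	struct enospc_msg msg;
	/* Monotonic time of the last send. */
	double last_sent;
	/* False until the first message has been sent. */
	bool sent_once;
};

/*
 * Called by WAL on ENOSPC. Returns true if the message should be
 * pushed to tx now, false if it is suppressed (already in flight
 * or sent less than ENOSPC_NOTIFY_INTERVAL seconds ago).
 */
bool
wal_notify_enospc(struct enospc_notifier *n, double now)
{
	assert(n != NULL);
	if (n->msg.in_flight)
		return false;
	if (n->sent_once && now - n->last_sent < ENOSPC_NOTIFY_INTERVAL)
		return false;
	n->msg.in_flight = true;
	n->last_sent = now;
	n->sent_once = true;
	return true;
}

/* Called once tx has processed the message; makes it reusable. */
void
tx_enospc_done(struct enospc_notifier *n)
{
	assert(n != NULL);
	n->msg.in_flight = false;
}
```

With something like this the WAL write path would call
wal_notify_enospc(&notifier, ev_monotonic_time()) on ENOSPC and
push the message only when it returns true, so tx sees at most one
message per interval even if kicking off replicas frees no space.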