Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Konstantin Belyavskiy <k.belyavskiy@tarantool.org>
Cc: georgy@tarantool.org, tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] [PATCH] replication: force gc to clean xdir on ENOSPC err
Date: Mon, 18 Jun 2018 15:35:23 +0300	[thread overview]
Message-ID: <20180618123523.tsxxaedwm5262qj3@esperanza> (raw)
In-Reply-To: <20180609140510.81377-1-k.belyavskiy@tarantool.org>

On Sat, Jun 09, 2018 at 05:05:10PM +0300, Konstantin Belyavskiy wrote:
> Garbage collector do not delete xlog unless replica do not notify
> master with newer vclock. This can lead to running out of disk
> space error and this is not right behaviour since it will stop the
> master.

AFAIU this is the second iteration of this patch. If so, you should
have appended the version number to the subject prefix (i.e. PATCH v2)
according to our guideline:

  https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines.html#how-to-submit-a-patch-for-review

> 
> List of changes:
> - Promoting error from wal_thread to tx via cpipe.
> - Introduce states for gc_consumers (OK and OFF). GC does not
> take into account consumers with state OFF.
> - Add an error injection and a test.

The list of changes between different versions of the same patch should
go after the link to the branch and the ticket. For example see

  https://www.freelists.org/post/tarantool-patches/PATCH-v2-box-make-gc-info-public

> 
> Closes #3397
> ---
> ticket: https://github.com/tarantool/tarantool/issues/3397
> branch: https://github.com/tarantool/tarantool/tree/kbelyavs/gh-3397-force-del-logs-on-no-disk-space
> 
>  src/box/gc.c                                      |  52 ++++++++++-
>  src/box/gc.h                                      |  13 +++
>  src/box/wal.cc                                    |  56 +++++++++--
>  src/errinj.h                                      |   1 +
>  src/fio.c                                         |   7 ++
>  test/box/errinj.result                            |   2 +
>  test/replication/force_gc_on_err_nospace.result   | 107 ++++++++++++++++++++++
>  test/replication/force_gc_on_err_nospace.test.lua |  38 ++++++++
>  8 files changed, 265 insertions(+), 11 deletions(-)
>  create mode 100644 test/replication/force_gc_on_err_nospace.result
>  create mode 100644 test/replication/force_gc_on_err_nospace.test.lua
> 
> diff --git a/src/box/gc.c b/src/box/gc.c
> index 12e68f3dc..efe033aaf 100644
> --- a/src/box/gc.c
> +++ b/src/box/gc.c
> @@ -59,6 +59,8 @@ struct gc_consumer {
>  	gc_node_t node;
>  	/** Human-readable name. */
>  	char *name;
> +	/** Dead or alive. GC ignores dead consumers. */
> +	enum gc_consumer_state state;

Keeping dead consumers around is pointless. A consumer must be removed
from the tree as soon as it is evicted.

> +void
> +gc_xdir_clean_notify(bool force_clean)
> +{
> +	if (force_clean == gc.force_clean_flag)
> +		return; // nothing to do
> +	else if (force_clean) { // xdir clean required
> +		gc.force_clean_flag = true;

We don't use C++ style comments in the code.

> +		/**
> +		 * Mark consumer with least recent vclock as "dead" and
> +		 * invoke garbage collection. If nothing to delete find
> +		 * next alive consumer etc. Originally created for
> +		 * cases with running out of disk space because of
> +		 * disconnected replica.
> +		 */
> +		struct gc_consumer *leftmost =
> +		    gc_tree_first_alive(&gc.consumers);
> +		if (leftmost == NULL)
> +			return; // do nothing
> +		int64_t signature = leftmost->signature;
> +		while (true) {
> +			leftmost->state = CONSUMER_OFF;
> +			leftmost = gc_tree_first_alive(&gc.consumers);
> +			if (leftmost == NULL ||
> +			    leftmost->signature > signature) {
> +				gc_run();
> +				return;

A consumer may correspond to a backup procedure, in which case it must
not be evicted, even on ENOSPC.

> +			}
> +		}
> +	} else
> +		gc.force_clean_flag = false;
> +}

> diff --git a/src/box/wal.cc b/src/box/wal.cc
> index 099c70caa..9dd8347d5 100644
> --- a/src/box/wal.cc
> +++ b/src/box/wal.cc
> @@ -59,6 +62,11 @@ struct wal_thread {
>  	struct cord cord;
>  	/** A pipe from 'tx' thread to 'wal' */
>  	struct cpipe wal_pipe;
> +	/**
> +	 * Return pipe from 'wal' to tx'. This is a
> +	 * priority pipe and DOES NOT support yield.
> +	 */
> +	struct cpipe tx_prio_pipe;
>  	/** Return pipe from 'wal' to tx' */
>  	struct cpipe tx_pipe;
>  };
> @@ -154,7 +166,7 @@ static void
>  tx_schedule_commit(struct cmsg *msg);
>  
>  static struct cmsg_hop wal_request_route[] = {
> -	{wal_write_to_disk, &wal_thread.tx_pipe},
> +	{wal_write_to_disk, &wal_thread.tx_prio_pipe},

AFAICS you rename wal_thread.tx_pipe to tx_prio_pipe and create a new
pipe called wal_thread.tx_pipe to be used for garbage collection, all in
one patch. This is very difficult to review. Provided you really need
it, renaming should be done in a separate patch with a good explanation
why you're doing what you're doing. Currently, I fail to see a reason.

> +static void
> +gc_status_update(struct cmsg *msg)
> +{
> +	msg->route = NULL;
> +	say_info("QQQ: gc_status_update");

What is this?

> +	gc_xdir_clean_notify(static_cast<gc_msg*>(msg)->io_err);
> +}
> +
>  static void
>  wal_write_to_disk(struct cmsg *msg)
>  {
> @@ -647,11 +667,29 @@ wal_write_to_disk(struct cmsg *msg)
>  	last_committed = stailq_last(&wal_msg->commit);
>  
>  done:
> +	bool need_send = false;
>  	struct error *error = diag_last_error(diag_get());
>  	if (error) {
>  		/* Until we can pass the error to tx, log it and clear. */
>  		error_log(error);
>  		diag_clear(diag_get());
> +		if (errno == ENOSPC) {
> +			err_nospace = true;
> +			need_send = true;
> +		}
> +	} else if (err_nospace) {
> +		/** Clear flag and send message to gc. */
> +		err_nospace = false;
> +		need_send = true;
> +	}
> +	if (need_send) {

This joggling with flags looks confusing. What are you trying to achieve
here? Avoid invoking garbage collector twice on ENOSPC error? At the
very least, this code should be accompanied by a comment.

Also, AFAICS you fail a transaction that hit ENOSPC and triggered
garbage collection. I don't think it's good. I see two ways of
rectifying this: either retry WAL write after invoking garbage
collection or use fallocate to reserve some space beyond the end of
the file before writing to WAL.

> +		struct gc_msg msg;
> +		static const struct cmsg_hop route[] = {
> +			{gc_status_update, NULL}
> +		};
> +		cmsg_init(&msg, route);
> +		msg.io_err = err_nospace;
> +		cpipe_push(&wal_thread.tx_pipe, &msg);
>  	}

Pushing a message defined on stack to a pipe and not waiting for it to
be delivered doesn't look right.

  reply	other threads:[~2018-06-18 12:35 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-09 14:05 Konstantin Belyavskiy
2018-06-18 12:35 ` Vladimir Davydov [this message]
2018-06-27 16:11   ` Re[2]: [tarantool-patches] [PATCH v3] " Konstantin Belyavskiy
2018-06-28  8:17     ` Vladimir Davydov
2018-06-28 12:53     ` [tarantool-patches] Re[2]: " Konstantin Osipov
  -- strict thread matches above, loose matches on Subject: below --
2018-06-06  9:27 [tarantool-patches] [PATCH] " Konstantin Belyavskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180618123523.tsxxaedwm5262qj3@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=georgy@tarantool.org \
    --cc=k.belyavskiy@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH] replication: force gc to clean xdir on ENOSPC err' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox