Please take a look at updated patch
ticket: https://github.com/tarantool/tarantool/issues/3397
branch: https://github.com/tarantool/tarantool/tree/kbelyavs/gh-3397-force-del-logs-on-no-disk-space
Понедельник, 18 июня 2018, 15:35 +03:00 от Vladimir Davydov <vdavydov.dev@gmail.com>:FixedOn Sat, Jun 09, 2018 at 05:05:10PM +0300, Konstantin Belyavskiy wrote:
> Garbage collector do not delete xlog unless replica do not notify
> master with newer vclock. This can lead to running out of disk
> space error and this is not right behaviour since it will stop the
> master.
AFAIU this is the second iteration of this patch. If so, you should
have appended the version number to the subject prefix (i.e. PATCH v2)
according to our guideline:
https://tarantool.io/en/doc/1.9/dev_guide/developer_guidelines.html#how-to-submit-a-patch-for-review
>
> List of changes:
> - Promoting error from wal_thread to tx via cpipe.
> - Introduce states for gc_consumers (OK and OFF). GC does not
> take into account consumers with state OFF.
> - Add an error injection and a test.
The list of changes between different versions of the same patch should
go after the link to the branch and the ticket. For example see
https://www.freelists.org/post/tarantool-patches/PATCH-v2-box-make-gc-info-public
>
> Closes #3397
> ---
> ticket: https://github.com/tarantool/tarantool/issues/3397
> branch: https://github.com/tarantool/tarantool/tree/kbelyavs/gh-3397-force-del-logs-on-no-disk-space
>
> src/box/gc.c | 52 ++++++++++-
> src/box/gc.h | 13 +++
> src/box/wal.cc | 56 +++++++++--
> src/errinj.h | 1 +
> src/fio.c | 7 ++
> test/box/errinj.result | 2 +
> test/replication/force_gc_on_err_nospace.result | 107 ++++++++++++++++++++++
> test/replication/force_gc_on_err_nospace.test.lua | 38 ++++++++
> 8 files changed, 265 insertions(+), 11 deletions(-)
> create mode 100644 test/replication/force_gc_on_err_nospace.result
> create mode 100644 test/replication/force_gc_on_err_nospace.test.lua
>
> diff --git a/src/box/gc.c b/src/box/gc.c
> index 12e68f3dc..efe033aaf 100644
> --- a/src/box/gc.c
> +++ b/src/box/gc.c
> @@ -59,6 +59,8 @@ struct gc_consumer {
> gc_node_t node;
> /** Human-readable name. */
> char *name;
> + /** Dead or alive. GC ignores dead consumers. */
> + enum gc_consumer_state state;
Keeping dead consumers around is pointless. A consumer must be removed
from the tree as soon as it is evicted.
Fixed
> +void
> +gc_xdir_clean_notify(bool force_clean)
> +{
> + if (force_clean == gc.force_clean_flag)
> + return; // nothing to do
> + else if (force_clean) { // xdir clean required
> + gc.force_clean_flag = true;
We don't use C++ style comments in the code.
Fixed
> + /**
> + * Mark consumer with least recent vclock as "dead" and
> + * invoke garbage collection. If nothing to delete find
> + * next alive consumer etc. Originally created for
> + * cases with running out of disk space because of
> + * disconnected replica.
> + */
> + struct gc_consumer *leftmost =
> + gc_tree_first_alive(&gc.consumers);
> + if (leftmost == NULL)
> + return; // do nothing
> + int64_t signature = leftmost->signature;
> + while (true) {
> + leftmost->state = CONSUMER_OFF;
> + leftmost = gc_tree_first_alive(&gc.consumers);
> + if (leftmost == NULL ||
> + leftmost->signature > signature) {
> + gc_run();
> + return;
A consumer may correspond to a backup procedure, in which case it must
not be evicted, even on ENOSPC.
Sorry, my fault (use to find in logs).
> + }
> + }
> + } else
> + gc.force_clean_flag = false;
> +}
> diff --git a/src/box/wal.cc b/src/box/wal.cc
> index 099c70caa..9dd8347d5 100644
> --- a/src/box/wal.cc
> +++ b/src/box/wal.cc
> @@ -59,6 +62,11 @@ struct wal_thread {
> struct cord cord;
> /** A pipe from 'tx' thread to 'wal' */
> struct cpipe wal_pipe;
> + /**
> + * Return pipe from 'wal' to tx'. This is a
> + * priority pipe and DOES NOT support yield.
> + */
> + struct cpipe tx_prio_pipe;
> /** Return pipe from 'wal' to tx' */
> struct cpipe tx_pipe;
> };
> @@ -154,7 +166,7 @@ static void
> tx_schedule_commit(struct cmsg *msg);
>
> static struct cmsg_hop wal_request_route[] = {
> - {wal_write_to_disk, &wal_thread.tx_pipe},
> + {wal_write_to_disk, &wal_thread.tx_prio_pipe},
AFAICS you rename wal_thread.tx_pipe to tx_prio_pipe and create a new
pipe called wal_thread.tx_pipe to be used for garbage collection, all in
one patch. This is very difficult to review. Provided you really need
it, renaming should be done in a separate patch with a good explanation
why you're doing what you're doing. Currently, I fail to see a reason.
> +static void
> +gc_status_update(struct cmsg *msg)
> +{
> + msg->route = NULL;
> + say_info("QQQ: gc_status_update");
What is this?
> + gc_xdir_clean_notify(static_cast<gc_msg*>(msg)->io_err);
> +}
> +
> static void
> wal_write_to_disk(struct cmsg *msg)
> {
> @@ -647,11 +667,29 @@ wal_write_to_disk(struct cmsg *msg)
> last_committed = stailq_last(&wal_msg->commit);
>
> done:
> + bool need_send = false;
> struct error *error = diag_last_error(diag_get());
> if (error) {
> /* Until we can pass the error to tx, log it and clear. */
> error_log(error);
> diag_clear(diag_get());
> + if (errno == ENOSPC) {
> + err_nospace = true;
> + need_send = true;
> + }
> + } else if (err_nospace) {
> + /** Clear flag and send message to gc. */
> + err_nospace = false;
> + need_send = true;
> + }
> + if (need_send) {
This joggling with flags looks confusing. What are you trying to achieve
here? Avoid invoking garbage collector twice on ENOSPC error? At the
very least, this code should be accompanied by a comment.
Also, AFAICS you fail a transaction that hit ENOSPC and triggered
garbage collection. I don't think it's good. I see two ways of
rectifying this: either retry WAL write after invoking garbage
collection or use fallocate to reserve some space beyond the end of
the file before writing to WAL.
Fixed
> + struct gc_msg msg;
> + static const struct cmsg_hop route[] = {
> + {gc_status_update, NULL}
> + };
> + cmsg_init(&msg, route);
> + msg.io_err = err_nospace;
> + cpipe_push(&wal_thread.tx_pipe, &msg);
> }
Pushing a message defined on stack to a pipe and not waiting for it to
be delivered doesn't look right.