Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH] limbo: introduce request processing hooks
Date: Mon, 12 Jul 2021 10:54:02 +0300
Message-ID: <a67daae3-4817-50e0-e56f-80133b6d398c@tarantool.org> (raw)
In-Reply-To: <20210710222803.253251-1-gorcunov@gmail.com>



On 11.07.2021 01:28, Cyrill Gorcunov wrote:
> Guys, this is an early RFC since I would like to discuss the
> design first before going further. Currently we don't filter
> incoming syncro requests, which doesn't allow us to detect a cluster
> split-brain situation. As we discussed verbally, there are
> a number of signs to detect it, and we need to stop receiving data
> from obsolete nodes.
>
> The main problem though is that such filtering of incoming packets
> should happen at a point where we can still take a step back and
> inform the peer that the data has been declined, but currently our
> applier code processes syncro requests inside a WAL trigger, i.e. when
> the data is already applied or being rolled back.
>
> Thus we need to separate the "filter" and "apply" stages of processing.
> What is more interesting is that we filter incoming requests via an
> in-memory vclock and update it immediately. Thus the following situation
> is possible -- a promote request comes in, we remember it inside
> promote_term_map, but then the write to WAL fails and we never revert
> the promote_term_map variable; thus other peers won't be able to
> resend us this promote request because now we think that we've
> already applied the promote.
>
> To solve this I split processing routine into stages:
>
>   - filter stage: we inspect in-flight packets and remember
>     their terms in the @terms_infly vclock
>   - apply stage: the data is verified and we try to apply it
>     and write it to disk; once everything is fine
>     we update the @terms_applied vclock, which serves as a backup
>     of @terms_infly
>   - error stage: data application failed and we should revert
>     the @terms_infly vclock to its previous value
>
> The stages are processed by txn_limbo_apply routine which takes
> a mask of stages it should execute. Old txn_limbo_process is
> simply an inline wrapper with appropriate flags.
>
> Please note that I _didn't_ put complete conditions into
> limbo_op_filter yet, only moved the current code there.
>
> So take a look when time permits; maybe there is an easier
> approach to the code structure.
>
> branch gorcunov/gh-6036-rollback-confirm-notest
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---

Hi! Thanks for the patch!

The general idea looks good to me.
Please, find a couple of comments below.

>   src/box/applier.cc  |  13 +++-
>   src/box/box.cc      |   6 +-
>   src/box/txn_limbo.c | 149 ++++++++++++++++++++++++++++++++++++++------
>   src/box/txn_limbo.h |  97 ++++++++++++++++++++++------
>   4 files changed, 223 insertions(+), 42 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 978383e64..8a44bf1b2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
>   				struct synchro_request req;
>   				if (xrow_decode_synchro(&row, &req) != 0)
>   					diag_raise();
> -				txn_limbo_process(&txn_limbo, &req);
> +				if (txn_limbo_process(&txn_limbo, &req) != 0)
> +					diag_raise();
>   			} else if (iproto_type_is_raft_request(row.type)) {
>   				struct raft_request req;
>   				if (xrow_decode_raft(&row, &req, NULL) != 0)
> @@ -850,11 +851,16 @@ apply_synchro_row_cb(struct journal_entry *entry)
>   	assert(entry->complete_data != NULL);
>   	struct synchro_entry *synchro_entry =
>   		(struct synchro_entry *)entry->complete_data;
> +	struct synchro_request *req = synchro_entry->req;
> +
>   	if (entry->res < 0) {
> +		txn_limbo_apply(&txn_limbo, req,
> +				LIMBO_OP_ERROR | LIMBO_OP_ATTR_PANIC);

I don't like that you pass LIMBO_OP_ATTR_PANIC in a bitfield.
We usually don't do such things. It would be nice if you could
get rid of this flag or maybe pass it as a bool.

Your approach might be totally fine though, that's just my POV.

>   		applier_rollback_by_wal_io(entry->res);
>   	} else {
>   		replica_txn_wal_write_cb(synchro_entry->rcb);
> -		txn_limbo_process(&txn_limbo, synchro_entry->req);
> +		txn_limbo_apply(&txn_limbo, synchro_entry->req,
> +				LIMBO_OP_APPLY | LIMBO_OP_ATTR_PANIC);
>   		trigger_run(&replicaset.applier.on_wal_write, NULL);
>   	}
>   	fiber_wakeup(synchro_entry->owner);
> @@ -870,6 +876,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
>   	if (xrow_decode_synchro(row, &req) != 0)
>   		goto err;
>   
> +	if (txn_limbo_apply(&txn_limbo, &req, LIMBO_OP_FILTER) != 0)
> +		goto err;
> +

Maybe simply call txn_limbo_filter() here?
You always know which function to call at compile time, so there's no
need to pass OP_FILTER, OP_APPLY, etc. as a parameter, IMO.

Same in other places.

>   	struct replica_cb_data rcb_data;
>   	struct synchro_entry entry;
>   	/*


...


> +
> +static int
> +limbo_op_error(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> +	struct txn_limbo_promote *pmt = &limbo->promote;
> +	uint32_t replica_id = req->origin_id;
> +	/*
> +	 * Restore to the applied value in case of error,
> +	 * this will allow to reapply the entry when remote
> +	 * node get error and try to resend data.
> +	 */
> +	uint64_t v = vclock_get(&pmt->terms_applied, replica_id);
> +	vclock_reset(&pmt->terms_infly, replica_id, v);

Looks like the following situation is possible:

The same instance writes two consecutive promotes, for terms
i and i+1.

When some replica receives the promotes, they both pass the
filter stage, so vclock_infly[instance_id] = i + 1
and vclock_applied[instance_id] = i - 1.

Then the first promote succeeds and the second one fails.

Isn't it possible that vclock_infly would be reset to i - 1
instead of i?

AFAICS in the tx_complete_batch() code in wal.c, rollbacks are processed
earlier than commits, so the situation above is possible.
Am I wrong? If not, we need to handle this somehow. Probably by
remembering, in each request after the filter() stage, the previous
term to roll back to.


> +
> +	/*
> +	 * The max value has to be recalculated in a linear
> +	 * form, the errors should not happen frequently so
> +	 * this is not a hot path.
> +	 */
> +	int64_t maxv = 0;
> +	struct vclock_iterator it;
> +	vclock_iterator_init(&it, &pmt->terms_infly);
> +	vclock_foreach(&it, r)
> +		maxv = r.lsn > maxv ? r.lsn : maxv;
> +	pmt->term_max_infly = maxv;
> +	return 0;
> +}
> +
>

...

-- 
Serge Petrenko


