[Tarantool-patches] [PATCH] limbo: introduce request processing hooks
Serge Petrenko
sergepetrenko at tarantool.org
Mon Jul 12 10:54:02 MSK 2021
On 11.07.2021 01:28, Cyrill Gorcunov wrote:
> Guys, this is an early RFC since I would like to discuss the
> design first before going further. Currently we don't interrupt
> incoming syncro requests, which doesn't allow us to detect a cluster
> split-brain situation. As we were discussing verbally, there are
> a number of signs to detect it, and we need to stop receiving data
> from obsolete nodes.
>
> The main problem though is that such filtering of incoming packets
> should happen at a moment when we can still step back and
> inform the peer that the data has been declined, but currently our
> applier code processes syncro requests inside the WAL trigger, i.e.
> when the data is already applied or being rolled back.
>
> Thus we need to separate the "filter" and "apply" stages of processing.
> What is more interesting is that we filter incoming requests via an
> in-memory vclock and update it immediately. Thus the following situation
> is possible -- a promote request comes in, we remember it inside
> promote_term_map, but then the write to WAL fails and we never revert
> the promote_term_map variable, so the other peer won't be able to
> resend us this promote request because now we think that we've
> already applied the promote.
>
> To solve this I split the processing routine into stages:
>
> - filter stage: we inspect in-flight packets and remember
>   their terms in the @terms_infly vclock
> - apply stage: the data is verified and we try to apply it
>   and write it to disk; once everything is fine
>   we update the @terms_applied vclock, which serves as a backup
>   of @terms_infly
> - error stage: data application failed and we revert the
>   @terms_infly vclock to its previous value
>
> The stages are processed by the txn_limbo_apply routine, which takes
> a mask of the stages it should execute. The old txn_limbo_process is
> simply an inline wrapper with the appropriate flags.
>
> Please note that I _didn't_ put complete conditions into
> limbo_op_filter yet, I only moved the current code there.
>
> So take a look once time permits; maybe there is some easier
> approach to the code structure.
>
> branch gorcunov/gh-6036-rollback-confirm-notest
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
> ---
Hi! Thanks for the patch!
The general idea looks good to me.
Please find a couple of comments below.
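Before going into the details, let me restate how I read the proposed
API, just to make sure we're on the same page. The routine and flag
names below are taken from the patch, but the flag values and the
wrapper body are only my sketch of the idea:

    /*
     * Sketch of my reading of the patch, not the actual code:
     * txn_limbo_apply() runs only the stages requested via the
     * mask, and txn_limbo_process() remains a thin wrapper that
     * runs the full "filter + apply" pipeline.
     */
    enum {
            LIMBO_OP_FILTER     = 1 << 0,
            LIMBO_OP_APPLY      = 1 << 1,
            LIMBO_OP_ERROR      = 1 << 2,
            /* Modifier: panic instead of returning an error. */
            LIMBO_OP_ATTR_PANIC = 1 << 3,
    };

    int
    txn_limbo_apply(struct txn_limbo *limbo,
                    const struct synchro_request *req, unsigned flags);

    static inline int
    txn_limbo_process(struct txn_limbo *limbo,
                      const struct synchro_request *req)
    {
            return txn_limbo_apply(limbo, req,
                                   LIMBO_OP_FILTER | LIMBO_OP_APPLY);
    }

Correct me if I got any of that wrong.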
> src/box/applier.cc | 13 +++-
> src/box/box.cc | 6 +-
> src/box/txn_limbo.c | 149 ++++++++++++++++++++++++++++++++++++++------
> src/box/txn_limbo.h | 97 ++++++++++++++++++++++------
> 4 files changed, 223 insertions(+), 42 deletions(-)
>
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 978383e64..8a44bf1b2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -458,7 +458,8 @@ applier_wait_snapshot(struct applier *applier)
> struct synchro_request req;
> if (xrow_decode_synchro(&row, &req) != 0)
> diag_raise();
> - txn_limbo_process(&txn_limbo, &req);
> + if (txn_limbo_process(&txn_limbo, &req) != 0)
> + diag_raise();
> } else if (iproto_type_is_raft_request(row.type)) {
> struct raft_request req;
> if (xrow_decode_raft(&row, &req, NULL) != 0)
> @@ -850,11 +851,16 @@ apply_synchro_row_cb(struct journal_entry *entry)
> assert(entry->complete_data != NULL);
> struct synchro_entry *synchro_entry =
> (struct synchro_entry *)entry->complete_data;
> + struct synchro_request *req = synchro_entry->req;
> +
> if (entry->res < 0) {
> + txn_limbo_apply(&txn_limbo, req,
> + LIMBO_OP_ERROR | LIMBO_OP_ATTR_PANIC);
I don't like that you pass LIMBO_OP_ATTR_PANIC in a bitfield.
We usually don't do such things. It would be nice if you could
get rid of this flag or maybe pass it as a bool.
Your approach might be totally fine though, that's just my POV.
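For example (just a sketch of what I mean, the parameter name is
arbitrary):

    /*
     * Keep the mask for the stages themselves and pass the
     * "panic on failure" behaviour as a separate argument.
     */
    int
    txn_limbo_apply(struct txn_limbo *limbo,
                    const struct synchro_request *req,
                    unsigned flags, bool panic_on_fail);

    ...
            if (entry->res < 0) {
                    txn_limbo_apply(&txn_limbo, req, LIMBO_OP_ERROR,
                                    /*panic_on_fail=*/true);
                    applier_rollback_by_wal_io(entry->res);
            }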
> applier_rollback_by_wal_io(entry->res);
> } else {
> replica_txn_wal_write_cb(synchro_entry->rcb);
> - txn_limbo_process(&txn_limbo, synchro_entry->req);
> + txn_limbo_apply(&txn_limbo, synchro_entry->req,
> + LIMBO_OP_APPLY | LIMBO_OP_ATTR_PANIC);
> trigger_run(&replicaset.applier.on_wal_write, NULL);
> }
> fiber_wakeup(synchro_entry->owner);
> @@ -870,6 +876,9 @@ apply_synchro_row(uint32_t replica_id, struct xrow_header *row)
> if (xrow_decode_synchro(row, &req) != 0)
> goto err;
>
> + if (txn_limbo_apply(&txn_limbo, &req, LIMBO_OP_FILTER) != 0)
> + goto err;
> +
Maybe simply call txn_limbo_filter() here?
You always know which function to call at compile time, so there's no
need to pass OP_FILTER, OP_APPLY, etc. as a parameter, IMO.
Same in other places.
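For instance, something like this (a sketch; the exact
txn_limbo_filter() signature is my assumption):

    /* One entry point per stage, chosen at compile time. */
    int
    txn_limbo_filter(struct txn_limbo *limbo,
                     const struct synchro_request *req);

    ...
            if (txn_limbo_filter(&txn_limbo, &req) != 0)
                    goto err;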
> struct replica_cb_data rcb_data;
> struct synchro_entry entry;
> /*
...
> +
> +static int
> +limbo_op_error(struct txn_limbo *limbo, const struct synchro_request *req)
> +{
> + struct txn_limbo_promote *pmt = &limbo->promote;
> + uint32_t replica_id = req->origin_id;
> + /*
> + * Restore the in-flight value to the applied one in case of
> + * an error; this allows the entry to be reapplied when the
> + * remote node gets an error and tries to resend the data.
> + */
> + uint64_t v = vclock_get(&pmt->terms_applied, replica_id);
> + vclock_reset(&pmt->terms_infly, replica_id, v);
Looks like the following situation is possible:
the same instance writes two consecutive promotes for terms
i and i+1.
When some replica receives the promotes, they both pass the
filter stage, so vclock_infly[instance_id] = i+1
and vclock_applied[instance_id] = i - 1.
Then the first promote succeeds and the second one fails.
Isn't it possible that vclock_infly would be reset to i - 1
instead of i?
AFAICS, in the tx_complete_batch() code in wal.c, rollbacks are processed
earlier than commits, so the situation above is possible.
Am I wrong? If not, we need to handle this somehow. Probably remember the
previous term to roll back to in each request after the filter() stage,
see the sketch below.
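Something along these lines, maybe (just a sketch; the prev_term field
is made up by me):

    /*
     * Remember at filter time which term this particular request
     * must roll back to, so that the error stage doesn't depend on
     * the completion order of other in-flight requests.
     */
    struct synchro_entry {
            ...
            /* Term to restore in terms_infly on WAL write failure. */
            uint64_t prev_term;
    };

    /* Filter stage: */
            entry->prev_term = vclock_get(&pmt->terms_infly,
                                          req->origin_id);

    /* Error stage: */
            vclock_reset(&pmt->terms_infly, req->origin_id,
                         entry->prev_term);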
> +
> + /*
> + * The max value has to be recalculated in a linear
> + * form, the errors should not happen frequently so
> + * this is not a hot path.
> + */
> + int64_t maxv = 0;
> + struct vclock_iterator it;
> + vclock_iterator_init(&it, &pmt->terms_infly);
> + vclock_foreach(&it, r)
> + maxv = r.lsn > maxv ? r.lsn : maxv;
> + pmt->term_max_infly = maxv;
> + return 0;
> +}
> +
>
...
--
Serge Petrenko