[Tarantool-patches] [PATCH] limbo: introduce request processing hooks

Sun Jul 11 21:22:37 MSK 2021

On Sun, Jul 11, 2021 at 04:00:35PM +0200, Vladislav Shpilevoy wrote:
> > 
> > Thus we need to separate "filer" and "apply" stages of processing.
> > What is more interesting is that we filter incomings via in memory
> > vclock and update them immediately. Thus the following situation
> > is possible -- a promote request comes in, we remember it inside
> > promote_term_map but then write to WAL fails and we never revert
> > the promote_term_map variable, thus other peer won't be able to
> > resend us this promote request because now we think that we've
> > alreday applied the promote.
> 
> Well, I still don't understand what the issue is. We discussed it
> privately already. You simply should not apply anything until WAL
> write is done. And it is not happening now on the master. The
> terms vclock is updated only **after** WAL write.

Currently we don't reject the data but silently ignore unexpected
requests after we've written them. In new approach we should decide
if incomng packet can be applied and get written to the WAL or we
should filter them out and return an error to the client. In case
if we filter out request we must not modify our WAL content at all.

> Why do you need all these new vclocks if you should not apply
> anything before WAL write in the first place?
> 
> > write to WAL fails and we never revert
> > the promote_term_map variable
> 
> This simply should not be possible. The term map is updated only
> after WAL write is done. At least this is how it works now, doesn't
> it? Why did you change it? (In case you did, because I am not sure,
> I didn't review the code throughly).

Look, I think this maybe not explicitly seen from the patch since
diff is big enough. But here is a key moment. Current code is called
after WAL is updated

void
txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
{
	uint64_t term = req->term;
	uint32_t origin = req->origin_id;
	if (txn_limbo_replica_term(limbo, origin) < term) {
		vclock_follow(&limbo->promote_term_map, origin, term);
		if (term > limbo->promote_greatest_term) {
			limbo->promote_greatest_term = term;
		} else if (req->type == IPROTO_PROMOTE) {
			/*
			 * PROMOTE for outdated term. Ignore.
			 */
-->			return;
		}
	}

we exit early without any error.

In new approach we start reporting an error if this situation is
happening, 'cause it is a split brain. But we shall not write this
term into our WAL file, thus we should validate incoming packet
earlier.

Now imagine the following: we validated the incoming packet and
remember its term in promote_term_map, then we start writting
this packet into our WAL and write procedure failed. In result
our promote_term_map remains updated but real data on disk
won't match it. What is worse is that such request may have
updated @promote_greatest_term as well. In result our
test for obsolete replicas will be false positive

static inline bool
txn_limbo_is_replica_outdated(const struct txn_limbo *limbo,
			      uint32_t replica_id)
{
	return txn_limbo_replica_term(limbo, replica_id) <
	       limbo->promote_greatest_term;
}

because promote_greatest_term is already updated but
not on WAL level.

So I had to split our txn_limbo_process into two stages:
"filter" and "application". If application stage fails
we restore original max term so the replica will be able
to resend us the promote request and we try to write it
to the WAL again.