[Tarantool-patches] [RFC] qsync: overall design - Cyrill Gorcunov via Tarantool-patches

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Serge Petrenko <sergepetrenko@tarantool.org>,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tml <tarantool-patches@dev.tarantool.org>
Subject: [Tarantool-patches] [RFC] qsync: overall design
Date: Mon, 13 Sep 2021 17:20:45 +0300	[thread overview]
Message-ID: <YT9ePZXvVuUZvW0f@grain> (raw)
In-Reply-To: <17e39694-2f8b-da02-6009-c97ec46e8609@tarantool.org>

On Mon, Sep 13, 2021 at 11:52:39AM +0300, Serge Petrenko wrote:
> > 
> > Or maybe better via email. Serge could you please write the details here?
> 
> It would be easier to discuss this verbally, I think.

Verbal meetings are good indeed, but maybe I could summarize all the
problems found so far and imprint them here. Guys, please comment,
I would really appreciate.

Terms accessing ordering
------------------------

We've found that fibers can read old terms which are already updated
but not yet written into WAL. To address this we order term reading
so appliers will wait until write to WAL is complete. While everyone
is agree that there is an issue and ordering solves it we're not yet
complete clear about internal design.

I proposed to use explicit locking via txn_limbo_term_lock/unlock calls.
The calls are used inside applier's code

apply_synchro_row
  txn_limbo_term_lock
    journal_write
  txn_limbo_term_unlock

the key moment is journal_write() call which queues completion to run and the
completion code is called from inside of sched() fiber, ie not the fiber which
took the lock (and such lock migration is prohibited in our latch-lock engine).

The propose was to hide the locking mechanism inside limbo internals code
completely, so the callers won't know about it. When I tried to make so I hit
the problem with lock context migration and had to step back to use explicit
locks as in code above.

Still Vlad's question remains

 | 2. As for your complaint about the begin/commit/rollback API
 | being not working because you can't unlock from a non-owner
 | fiber - well, it works in your patch somehow, doesn't it?

I already explained why it works with explicit locking.
https://lists.tarantool.org/tarantool-patches/YT8tZ0CuIDKwzcC4@grain/
In short - we take and release the lock in same context.

 |
 | Why do you in your patch unlock here, but in the newly proposed
 | API you only tried to unlock in the trigger?

Because our commit/rollback are called from inside of sched() fiber,
and we will have to provide some helper like completion of completion
where second completion will be called from inside of applier context
to unlock terms. As to me this is a way more messy than explicit locking
scheme.

 |
 | You could call commit/rollback from this function, like you
 | do with unlock now.

This moment I don't understand. We already have commit/rollback helpers,
so I ask Vlad to write some pseudocode, to figure out what exactly you
have in mind.

Limbo's confirmed_lsn update upon CONFIRM request read
------------------------------------------------------

Currently we update limbo::confirmed_lsn when node _writes_ this
request into the WAL. This is done on limbo owner node only, ie
transaction initiator. In result when the node which has not been
leader at all takes limbo ownership it sends own "confirmed_lsn = 0"
inside PROMOTE request, and when this request reaches previous leader
node we don't allow to proceed (due to our filtration rules where
we require the LSN to be > than current confirmed_lsn). Also Serge
pointed out

 a)
 | Confirmed_lsn wasn't tracked during recovery and while
 | following a master. So, essentially, only the living master could
 | detect splitbrains by comparing confirmed_lsn to something else.

 b)
 | Say, a pair of CONFIRM requests is written, with lsns
 | N and N+1. So you first enter write_confirm(N), then
 | write_confirm(N+1). Now both fibers issuing the requests yield
 | waiting for the write to happen, and confirmed_lsn is N+1.
 |
 | Once the first CONFIRM (N) is written, you reset confirmed_lsn to N
 | right in read_confirm.
 |
 | So until the second CONFIRM (N+1) is written, there's a window
 | when confirmed_lsn is N, but it should be N+1.
 |
 | I think read_confirm should set confirmed_lsn on replica only.
 | On master this task is performed by write_confirm.
 | You may split read_confirm in two parts:
 |  - set confirmed lsn (used only on replica) and
 |  - apply_confirm (everything read_confirm did before your patch)

Thus I seems need to rework this aspect.

Update confirmed_lsn on first PROMOTE request arrival
-----------------------------------------------------

Detailed explanation what I've seen is there

https://lists.tarantool.org/tarantool-patches/YT5+YqCJuAh0HAQg@grain/

I must confess I don't like much this moment as well since
this is a bit vague point for me so we gonna look into it
soon on a meeting.

Filtration procedure itself (split detector)
-------------------------------------------

When CONFIRM or ROLLBACK packet comes in it is not enough to
test for limbo emptiness only. We should rather traverse the
queue and figure out if LSN inside the packet belongs to the
current queue.

So the *preliminary* conclusion is the following: when CONFIRM
or ROLLBACK is coming in
a) queue is empty -- then such request is invalid and we should
   exit with error
b) queue is not empty -- then LSN should belong to a range covered
   by the queue
c) it is unclear how to test this scenario

Filtration disabling for joining and local recovery
---------------------------------------------------

When joining or recovery happens the limbo is in empty state then
our filtration start triggering false positives. For example

> autobootstrap1.sock I> limbo: filter PROMOTE replica_id 0 origin_id 0
> term 0 lsn 0, queue owner_id 0 len 0 promote_greatest_term 0 confirmed_lsn 0

This is because we require the term to be nonzero when cluster is running.

	/*
	 * PROMOTE and DEMOTE packets must not have zero
	 * term supplied, otherwise it is a broken packet.
	 */
	if (req->term == 0) {
		say_error("%s. Zero term detected", reject_str(req));
		diag_set(ClientError, ER_CLUSTER_SPLIT,
			 "Request with zero term");
		return -1;
	}

If we want to not disable filtration at all then we need to introduce
some state machine which would cover initial -> working state. I think
better to start with simpler approach where we don't verify data on
join/recovery and then extend filtration if needed.

next prev parent reply	other threads:[~2021-09-13 14:20 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 15:29 [Tarantool-patches] [PATCH v14 0/6] qsync: implement packets filtering Cyrill Gorcunov via Tarantool-patches
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 1/6] qsync: track confirmed lsn number on reads Cyrill Gorcunov via Tarantool-patches
2021-09-12 15:44   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-12 22:18     ` Cyrill Gorcunov via Tarantool-patches
2021-09-13  8:33     ` Serge Petrenko via Tarantool-patches
2021-09-13  8:50   ` Serge Petrenko via Tarantool-patches
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 2/6] qsync: update confirmed lsn on initial promote request Cyrill Gorcunov via Tarantool-patches
2021-09-12 15:44   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-12 22:25     ` Cyrill Gorcunov via Tarantool-patches
2021-09-13  8:52       ` Serge Petrenko via Tarantool-patches
2021-09-13 14:20         ` Cyrill Gorcunov via Tarantool-patches [this message]
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 3/6] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 4/6] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-09-12 15:44   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-13 10:52     ` Cyrill Gorcunov via Tarantool-patches
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 5/6] qsync: filter incoming synchro requests Cyrill Gorcunov via Tarantool-patches
2021-09-12 15:44   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-14 19:41     ` Cyrill Gorcunov via Tarantool-patches
2021-09-10 15:29 ` [Tarantool-patches] [PATCH v14 6/6] test: add replication/gh-6036-rollback-confirm Cyrill Gorcunov via Tarantool-patches
2021-09-12 15:44   ` Vladislav Shpilevoy via Tarantool-patches
2021-09-12 15:43 ` [Tarantool-patches] [PATCH v14 0/6] qsync: implement packets filtering Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YT9ePZXvVuUZvW0f@grain \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [RFC] qsync: overall design' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox