Tarantool development patches archive
From: Sergey Ostanevich <sergos@tarantool.org>
To: Konstantin Osipov <kostja.osipov@gmail.com>,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Tue, 12 May 2020 19:40:48 +0300
Message-ID: <20200512164048.GM112@tarantool.org>
In-Reply-To: <20200507230112.GB14285@atlas>

On 08 May 02:01, Konstantin Osipov wrote:
> 
> 
> > ### Synchronous replication enabling.
> > 
> > Synchronous operation can be required for a set of spaces in the data
> > scheme. That means only transactions that contain data modification for
> > these spaces should require quorum. Such transactions named synchronous.
> > As soon as last operation of synchronous transaction appeared in leader's
> > WAL, it will cause all following transactions - no matter if they are
> > synchronous or not - wait for the quorum. In case quorum is not achieved
> > the 'rollback' operation will cause rollback of all transactions after
> > the synchronous one. It will ensure the consistent state of the data both
> > on leader and replicas. In case user doesn't require synchronous operation
> > for any space then no changes to the WAL generation and replication will
> > appear.
> 
> 1) It's unclear what happens here if async tx follows a sync tx.
>    Does it wait for the sync tx? This reduces availability for

Definitely yes, unless we keep the 'dirty read' behavior that memtx has
at the moment. This is the essence of the design, and it is temporary
until an MVCC similar to the vinyl machinery appears. I intentionally
didn't include that big task in this RFC.

MVCC will provide similar capabilities, although it will keep only
dependent transactions in the undo log. It also looks like it will fit
well into the machinery of this RFC.
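
To make the waiting rule concrete, here is a minimal sketch in Python. It
is only an illustration of the behavior described in the RFC text quoted
above, not Tarantool code: the class PendingQueue and its methods are
hypothetical names, and the model assumes a single leader that keeps
unconfirmed transactions in WAL order.

```python
from collections import deque


class PendingQueue:
    """Hypothetical model of the leader's queue of unconfirmed transactions."""

    def __init__(self):
        self.pending = deque()   # (tx_id, is_sync) pairs, in WAL order
        self.committed = []
        self.rolled_back = []

    def submit(self, tx_id, is_sync):
        # An async tx commits at once only if nothing is waiting ahead of it.
        if not is_sync and not self.pending:
            self.committed.append(tx_id)
        else:
            self.pending.append((tx_id, is_sync))

    def confirm(self, tx_id):
        # Quorum reached for the oldest pending tx: commit it, then release
        # the async txs queued behind it, up to the next sync tx.
        if not self.pending or self.pending[0][0] != tx_id:
            raise ValueError("only the oldest pending tx can be confirmed")
        self.committed.append(self.pending.popleft()[0])
        while self.pending and not self.pending[0][1]:
            self.committed.append(self.pending.popleft()[0])

    def rollback(self, tx_id):
        # Quorum not reached: roll back tx_id and everything after it,
        # newest first.
        ids = [pending_id for pending_id, _ in self.pending]
        if tx_id not in ids:
            raise ValueError("tx is not pending")
        keep = ids.index(tx_id)
        while len(self.pending) > keep:
            self.rolled_back.append(self.pending.pop()[0])
```

In this model an async transaction submitted after an unconfirmed sync one
sits in the queue and shares its fate, which is exactly the availability
cost raised in point 1.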

>    async txs - so it's hardly acceptable. Besides, with
>    group=local spaces, one can quickly run out of memory for undo.
>    
> 
> Then it should be allowed to proceed and commit.
> 
> Then mixing sync and async tables in a single transaction
> shouldn't be allowed.
> 
> Imagine t1 is sync and t2 is async. tx1 changes t1 and t2, tx2
> changes t2. tx1 is not confirmed and must be rolled back. But it can
> not revert changes of tx2.
> 
> The spec should clarify that.
> 
> 2) First candidates to "sync" spaces are system spaces, especially
>    _schema (to fix box.once()) and _cluster (to fix parallel join
>    of multiple replicas).
> 
> I can't imagine it's possible to make system spaces synchronous
> with an external coordinator - the coordinator may not be
> available during box.cfg{}.

If the coordinator is not available, there is no coordination, which
means the server can't start. Again, we're not trying to design a
self-driven cluster at this moment; we rely on external coordination.
> 
> 3) One can quickly run out of memory for undo. Any sync
>    transaction should be capped with a timeout to avoid OOMs. I
>    don't know how many times I should repeat it. The only good
>    solution for load control is in-memory WAL, which will allow to
>    rollback all transactions as soon as network partitioning is
>    detected.

How can an in-memory WAL help save on _undo_ memory?
To roll back any number of transactions, one needs to store the undo for
each of them.
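
A rough Python sketch of what I mean, under the assumption that undo is
kept as per-transaction lists of old values. UndoStore and its methods
are made-up names for illustration and do not correspond to any real
memtx structure.

```python
_MISSING = object()  # marks a key that did not exist before the write


class UndoStore:
    """Hypothetical key-value store that keeps undo for unconfirmed txs."""

    def __init__(self):
        self.data = {}
        # One entry per unconfirmed tx: (tx_id, [(key, old_value), ...]).
        self.undo = []

    def apply(self, tx_id, writes):
        # Save the old value of every key before overwriting it.
        saved = []
        for key, value in writes.items():
            saved.append((key, self.data.get(key, _MISSING)))
            self.data[key] = value
        self.undo.append((tx_id, saved))

    def confirm_oldest(self):
        # Undo for a confirmed tx is no longer needed and can be freed.
        self.undo.pop(0)

    def rollback_all(self):
        # Revert unconfirmed txs newest first, restoring the saved values.
        while self.undo:
            _, saved = self.undo.pop()
            for key, old in reversed(saved):
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
```

The undo list grows with every transaction applied before a confirm
arrives, wherever the WAL itself is kept.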

> 
> -- 
> Konstantin Osipov, Moscow, Russia
