Tarantool development patches archive
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Sergey Ostanevich <sergos@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Thu, 23 Apr 2020 14:43:25 +0300
Message-ID: <20200423114325.GA19129@atlas>
In-Reply-To: <20200423112702.GC112@tarantool.org>

* Sergey Ostanevich <sergos@tarantool.org> [20/04/23 14:29]:
> Hi!
> 
> Thanks for review!
> 
> On 23 Apr 12:14, Konstantin Osipov wrote:
> > * Konstantin Osipov <kostja.osipov@gmail.com> [20/04/23 09:58]:
> > > > > To my understanding - it's up to user. I was considering a cluster that
> > > > > has no WAL at all - relying on synchronous replication and a sufficient number
> > > > > of replicas. Everyone who I asked about it told me I'm nuts. To my great
> > > > > surprise Alexander Lyapunov brought exactly the same idea to discuss. 
> > > > 
> > > > I didn't see an RFC on that, and this can become easily possible, when
> > > > in-memory relay is implemented. If it is implemented in a clean way. We
> > > > just can turn off the disk backoff, and it will work from memory-only.
> > > 
> > > Sync replication must work from in-memory relay only. It works as
> > > a natural failure detector: a replica which is slow or unavailable
> > > is first removed from the subscribers of in-memory relay, and only 
> > > then (possibly much much later) is marked as down.
> > > 
> > > By looking at the in-memory relay you have a clear idea what peers
> > > are available and can abort a transaction if a cluster is in the
> > > downgraded state right away. You never wait for impossible events. 
> > > 
> > > If you do have to wait, and say your wait timeout is 1 second, you
> > > quickly run out of any fibers in the fiber pool for any work,
> > > because all of them will be waiting on the sync transactions they
> > > picked up from iproto to finish. The system will lose its
> > > throttling capability. 
> > 
> There's no need to explain it to the customer: sync replication is not
> expected to be as fast as pure in-memory. By no means. We have network
> communication, disk operation, multiple entities quorum - all of these
> can't be as fast. No need to try to cram in more than the network can push
> through, obviously.

This expected performance overhead is not a license to run out of
memory or available fibers on a node failure or network partitioning.
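
A minimal sketch of the fail-fast check argued for here, in Python for
brevity. Everything in it is hypothetical and not a Tarantool API: the
`Relay` class, its `subscribers` set, and the assumption that the leader
counts itself towards the quorum. The point is only that the leader can
reject a synchronous transaction immediately when the peers still attached
to the in-memory relay cannot form a quorum, instead of parking a fiber
until a timeout fires.

```python
from dataclasses import dataclass, field


class QuorumUnavailable(Exception):
    """Raised when a sync transaction cannot possibly gather a quorum."""


@dataclass
class Relay:
    # Hypothetical model: ids of replicas currently fed from the in-memory relay.
    subscribers: set = field(default_factory=set)

    def drop(self, replica_id):
        # A slow or unreachable replica is detached from the in-memory relay.
        self.subscribers.discard(replica_id)


def check_quorum_reachable(relay, quorum):
    # Assumption: the leader itself counts as one acknowledgement.
    reachable = 1 + len(relay.subscribers)
    if reachable < quorum:
        # Abort right away; no fiber waits for an impossible event.
        raise QuorumUnavailable(
            f"only {reachable} of {quorum} required nodes reachable"
        )
    return reachable
```

Called before a transaction is queued for replication, such a check keeps
the fiber pool free for other work while the cluster is degraded.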

> The quality one buys for this price: consistency of data in multiple
> instances distributed across different locations. 

The spec should demonstrate that consistency is guaranteed: right
now it can easily be violated during a leader change, and this is
left out of scope of the spec.

My take is that any implementation which is not close enough to a
TLA+ proven spec is not trustworthy, so I would not claim it myself
or trust anyone else's claims that it is consistent. At best this
RFC could achieve durability, by ensuring that no transaction is
committed unless it is delivered to a majority of replicas.
Consistency requires implementing the Raft spec in full and showing
that leader changes preserve the linearizability of the write-ahead log.
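
To make the durability half concrete, here is a sketch of the majority
rule only, in Python. The function names and the idea of tracking one
acknowledged LSN per node are assumptions for illustration, not the RFC's
design. It says nothing about leader changes, which is exactly the part
that needs Raft.

```python
def majority(cluster_size):
    # Smallest number of nodes that is more than half of the cluster.
    return cluster_size // 2 + 1


def committed_lsn(acked_lsn_by_node, cluster_size):
    """Return the highest LSN acknowledged by a majority of the cluster.

    acked_lsn_by_node maps a node id (the leader included) to the highest
    LSN that node has acknowledged. Returns 0 when fewer than a majority
    of nodes have reported anything.
    """
    quorum = majority(cluster_size)
    acked = sorted(acked_lsn_by_node.values(), reverse=True)
    if len(acked) < quorum:
        return 0
    # The quorum-th highest value is held by at least `quorum` nodes.
    return acked[quorum - 1]
```

For a three-node cluster where the leader is at LSN 10 and the replicas
have acknowledged 7 and 4, only transactions up to LSN 7 may be reported
as committed.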

> > The other issue is that if your replicas are alive but
> > slow/lagging behind, you can't let too many undo records
> > pile up unacknowledged in tx thread.
> > The in-memory relay solves this nicely too, because it kicks out
> > replicas from memory to file mode if they are unable to keep up
> > with the speed of change.
> > 
> That is the same problem - resources of leader, so natural limit for
> throughput. I bet Tarantool faces similar limitations even now,
> although different ones. 
> 
> The in-memory relay is supposed to keep the same interface, so we expect to
> hop easily to this new shiny express as soon as it appears. This will be
> an optimization and we're trying to implement something and then speed
> it up.

It is pretty clear that the implementation will be different. 

-- 
Konstantin Osipov, Moscow, Russia
