Tarantool development patches archive
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Thu, 23 Apr 2020 09:58:09 +0300
Message-ID: <20200423065809.GA4528@atlas>
In-Reply-To: <dd4d703e-6918-ccd9-ac5e-76fb54fff0f9@tarantool.org>

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/04/22 01:21]:
> > To my understanding - it's up to user. I was considering a cluster that
> > has no WAL at all - relying on synchro replication and sufficient number
> > of replicas. Everyone who I asked about it told me I'm nuts. To my great
> > surprise Alexander Lyapunov brought exactly the same idea to discuss. 
> 
> I didn't see an RFC on that, and this can become easily possible, when
> in-memory relay is implemented. If it is implemented in a clean way. We
> just can turn off the disk backoff, and it will work from memory-only.

Sync replication must work from in-memory relay only. It works as
a natural failure detector: a replica which is slow or unavailable
is first removed from the subscribers of in-memory relay, and only 
then (possibly much much later) is marked as down.

By looking at the in-memory relay you have a clear idea what peers
are available and can abort a transaction if a cluster is in the
downgraded state right away. You never wait for impossible events. 
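
A minimal sketch of that fail-fast check, in Python for brevity. The
names InMemoryRelay, subscribe, unsubscribe and begin_sync_txn are
hypothetical, not Tarantool APIs; the only assumption is that the relay
knows which replicas it currently feeds from memory.

```python
class QuorumUnavailable(Exception):
    """Raised when too few replicas follow the in-memory relay."""


class InMemoryRelay:
    """Hypothetical stand-in: tracks replicas currently fed from memory."""

    def __init__(self):
        self.subscribers = set()

    def subscribe(self, replica_id):
        self.subscribers.add(replica_id)

    def unsubscribe(self, replica_id):
        # A slow or unreachable replica drops out here first,
        # possibly long before it is marked as down.
        self.subscribers.discard(replica_id)


def begin_sync_txn(relay, quorum):
    """Abort right away if the quorum cannot be reached.

    Assumption: the leader counts as one acknowledgement, so
    quorum - 1 subscribed replicas are needed.
    """
    live = len(relay.subscribers) + 1
    if live < quorum:
        raise QuorumUnavailable(f"{live} of {quorum} nodes available")
    return live
```

The transaction is rejected before it occupies a fiber, instead of
waiting for acknowledgements that cannot arrive.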

If you do have to wait, and say your wait timeout is 1 second, you
quickly run out of any fibers in the fiber pool for any work,
because all of them will be waiting on the sync transactions they
picked up from iproto to finish. The system will lose its
throttling capability. 
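
A rough back-of-the-envelope version of that argument. The pool size
and request rate below are made-up illustration values, not Tarantool
defaults, and the model assumes every fiber blocks for the full
timeout.

```python
def time_to_exhaust_s(pool_size, incoming_tps):
    """Seconds until every fiber is parked on a sync transaction,
    assuming none of them completes in the meantime."""
    return pool_size / incoming_tps


def max_blocked_tps(pool_size, wait_timeout_s):
    """Upper bound on requests per second once each fiber is held
    for the whole wait timeout."""
    return pool_size / wait_timeout_s


# Hypothetical numbers: 1000 fibers, 10000 requests/s, 1 s timeout.
POOL_SIZE = 1000
exhaust_after = time_to_exhaust_s(POOL_SIZE, incoming_tps=10000)
ceiling = max_blocked_tps(POOL_SIZE, wait_timeout_s=1.0)
```

Under these assumptions the pool is empty after a tenth of a second,
and from then on the node serves at most 1000 requests per second of
any kind, sync or not.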

There are other reasons, too: the protocol will eventually be
quite tricky and the logic has to reside in a single place and not
require inter-thread communication. 
Committing a transaction anywhere outside WAL will require
inter-thread communication, which is costly and should be avoided.

I am surprised I have to explain this again and again - I never
assumed this spec is a precursor for a half-baked implementation,
only as a high-level description of the next steps after in-memory
relay is in.

> > All of this is for one resolution: I would keep it for user to decide.
> > Obviously, to speed up the processing leader can disable wal completely,
> > but to do so we have to re-work the relay to work from memory. Replicas
> > can use WAL in a way user wants: 2 replicas with slow HDD shouldn't wait
> > for fsync(), while super-fast Intel DCPMM one can enable it. Balancing
> > is up to user.
> 
> Possibility of omitting fsync means that it is possible, that all nodes
> write confirm, which is reported to the client, then the nodes restart,
> and the data is lost. I would say it somewhere.

Worse yet, you can elect a leader "based on WAL length" and then it
is no longer the leader, because it lost its long WAL after
restart. fsync() is mandatory during election; in other cases it
shouldn't impact durability.
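
A toy model of the election hazard. Node, written_lsn, synced_lsn and
longest_wal are hypothetical names, and the restart is assumed to be
the worst case (power loss), where everything not yet flushed by
fsync() is gone.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    written_lsn: int  # last LSN written to the WAL file
    synced_lsn: int   # last LSN known to be flushed by fsync()

    def restart(self):
        # Worst case: the unflushed tail of the WAL does not survive.
        self.written_lsn = self.synced_lsn


def longest_wal(nodes):
    """Pick the election winner by WAL length alone."""
    return max(nodes, key=lambda n: n.written_lsn)


a = Node("a", written_lsn=100, synced_lsn=90)
b = Node("b", written_lsn=95, synced_lsn=95)

winner_before = longest_wal([a, b]).name
a.restart()
winner_after = longest_wal([a, b]).name
```

Here "a" wins the vote on length 100, restarts, and comes back at 90,
behind "b". Had "a" been required to fsync() before voting, its
advertised length would have been one it could actually keep.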

-- 
Konstantin Osipov, Moscow, Russia
