Tarantool development patches archive
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Sergey Ostanevich <sergos@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Thu, 11 Jun 2020 17:17:29 +0200
Message-ID: <6678e2a7-9f95-ce3d-1c2f-1eaf20589005@tarantool.org>
In-Reply-To: <20200609161929.GF50@tarantool.org>

Hi! Thanks for the updates!

> ### Connection liveness
> 
> There is a timeout-based mechanism in Tarantool that controls the
> asynchronous replication, which uses the following config:
> ```
> * replication_connect_timeout  = 4
> * replication_sync_lag         = 10
> * replication_sync_timeout     = 300
> * replication_timeout          = 1
> ```
> For backward compatibility and to differentiate the async replication
> we should augment the configuration with the following:
> ```
> * synchro_replication_heartbeat = 4

Heartbeats are already being sent. I don't see any sense in adding a
second heartbeat option.

> * synchro_replication_quorum_timeout = 4

Since this is a replication option, its name should start with the
replication_ prefix.

> ```
> Leader should send a heartbeat every synchro_replication_heartbeat if
> there were no messages sent. Replicas should respond to the heartbeat
> just the same way as they do it now. As soon as Leader has no response
> for another heartbeat interval, it should consider the replica is lost.

All of that is already done by the regular heartbeats, which are neither
related nor bound to any synchronous activity, just as failure detection
should be.
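
To illustrate the point, here is a minimal sketch of failure detection
that depends only on when a peer was last heard from and knows nothing
about synchronous transactions. It is not Tarantool code: the class
PeerLiveness, its methods, and the timeout value are hypothetical names
chosen for the example.

```python
class PeerLiveness:
    """Tracks when each peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated (assumed value)
        self.last_seen = {}      # peer id -> timestamp of the last message

    def on_message(self, peer, now):
        # Any message refreshes liveness, whether a heartbeat or data.
        self.last_seen[peer] = now

    def alive(self, now):
        # A peer is alive if it was heard from within the timeout.
        return {p for p, t in self.last_seen.items()
                if now - t <= self.timeout}


liveness = PeerLiveness(timeout=1.0)
liveness.on_message("replica1", now=0.0)
liveness.on_message("replica2", now=0.0)
liveness.on_message("replica1", now=0.9)   # replica2 stays silent
alive_at_1_5 = liveness.alive(now=1.5)
```

The quorum logic could then just ask which peers are alive, without
needing a heartbeat option of its own.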

> As soon as leader appears in a situation it has not enough replicas
> to achieve quorum, it should stop accepting write requests. There's an
> option for leader to rollback to the latest transaction that has quorum:
> leader issues a 'rollback' message referring to the [LEADER_ID, LSN]
> where LSN is of the first transaction in the leader's undo log.

What is that option?

> The rollback message replicated to the available cluster will put it in a
> consistent state. After that configuration of the cluster can be
> updated to a new available quorum and leader can be switched back to
> write mode.
> 
> During the quorum collection it can happen that some of replicas become
> unavailable due to some reason, so leader should wait at most for
> synchro_replication_quorum_timeout after which it issues a Rollback
> pointing to the oldest TXN in the waiting list.
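
For reference, the quoted timeout rule could be sketched as below. This
is only an illustration under stated assumptions, not the RFC's
implementation: QuorumWaiter and its methods are hypothetical, the
leader's own write is assumed to count as one ack, and a rollback is
assumed to discard the oldest waiting transaction together with
everything after it.

```python
from collections import OrderedDict


class QuorumWaiter:
    """Leader-side waiting list of transactions, ordered by LSN (hypothetical)."""

    def __init__(self, quorum, timeout):
        self.quorum = quorum
        self.timeout = timeout
        self.pending = OrderedDict()   # lsn -> (submit time, set of ackers)

    def submit(self, lsn, now):
        # Assumption: the leader's own write counts as the first ack.
        self.pending[lsn] = (now, {"leader"})

    def ack(self, lsn, replica):
        if lsn in self.pending:
            self.pending[lsn][1].add(replica)

    def poll(self, now):
        """Return the confirm/rollback events that are due at time `now`."""
        events = []
        while self.pending:
            lsn, (submitted, acks) = next(iter(self.pending.items()))
            if len(acks) >= self.quorum:
                events.append(("confirm", lsn))
                del self.pending[lsn]
            elif now - submitted >= self.timeout:
                # The rollback points at the oldest waiting transaction;
                # assumption: all later ones are discarded with it.
                events.append(("rollback", lsn))
                self.pending.clear()
            else:
                break
        return events


waiter = QuorumWaiter(quorum=2, timeout=4.0)
waiter.submit(101, now=0.0)
waiter.submit(102, now=1.0)
waiter.submit(103, now=2.0)
waiter.ack(101, "replica1")
events_early = waiter.poll(now=3.0)   # 101 has quorum, 102 is still waiting
events_late = waiter.poll(now=5.0)    # 102 waited 4 seconds without quorum
```

Here the timeout is counted per transaction from the moment it enters
the waiting list, which is one possible reading of "wait at most".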

