Tarantool development patches archive
From: Konstantin Osipov <kostja.osipov@gmail.com>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Fri, 24 Apr 2020 01:28:37 +0300
Message-ID: <20200423222837.GC22011@atlas>
In-Reply-To: <c86ef610-f54e-524e-103a-324e7e572d2d@tarantool.org>

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/04/24 00:42]:

> It says, that once the quorum is collected, and 'confirm' is written
> to local leader's WAL, it is considered committed and is reported
> to the client as successful.
> 
> On the other hand it is said, that in case of leader change the
> new leader will rollback all not confirmed transactions. That leads
> to the following bug:
> 
> Assume we have 4 instances: i1, i2, i3, i4. Leader is i1. It
> writes a transaction with LSN1. The LSN1 is sent to other nodes,
> they apply it ok, and send acks to the leader. The leader sees
> i2-i4 all applied the transaction (propagated their LSNs to LSN1).
> It writes 'confirm' to its local WAL, reports it to the client as
> success, the client's request is over, it is returned back to
> some remote node, etc. The transaction is officially synchronously
> committed.
> 
> Then the leader's machine dies - disk is dead. The confirm was
> not sent to any of the other nodes. For example, it started having
> problems with network connection to the replicas recently before
> the death. Or it just didn't manage to hand the confirm out.
> 
> From now on if any of the other nodes i2-i4 becomes a leader, it
> will rollback the officially confirmed transaction, even if it
> has it, and all the other nodes too.
> 
> That basically means, this sync replication gives exactly the same
> guarantees as the async replication - 'confirm' on the leader tells
> nothing about replicas except that they *are able to apply the
> transaction*, but still may not apply it.
> 
> Am I missing something?

This video explains what a leader has to do after it's been elected:

https://www.youtube.com/watch?v=YbZ3zDzDnrw

In short, the transactions in the leader's WAL have to be committed,
not rolled back.

The Raft paper (https://raft.github.io/raft.pdf) has the answers in a
concise single-page summary.
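
A minimal sketch of that rule, applied to the i1-i4 scenario quoted above.
It assumes the behaviour described in the Raft paper: a new leader keeps
every entry already in its log, appends a no-op entry in its own term, and
once a majority holds that no-op, everything before it is committed as
well. The names Entry, Node, take_over and run_scenario are hypothetical
and are not taken from Tarantool or from the RFC; replication is reduced
to a follower adopting the leader's log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    term: int
    lsn: int
    payload: str


@dataclass
class Node:
    name: str
    log: List[Entry] = field(default_factory=list)
    commit_lsn: int = 0  # highest LSN known to be committed


def majority(cluster_size: int) -> int:
    return cluster_size // 2 + 1


def take_over(leader: Node, reachable: List[Node], term: int,
              cluster_size: int) -> int:
    """New leader keeps its log and commits it; nothing is rolled back."""
    # Append a no-op in the new term after whatever is already in the log.
    last_lsn = leader.log[-1].lsn if leader.log else 0
    noop = Entry(term=term, lsn=last_lsn + 1, payload="noop")
    leader.log.append(noop)

    acks = 1  # the leader's own copy
    for follower in reachable:
        # Simplification: the follower adopts the leader's log wholesale.
        follower.log = list(leader.log)
        acks += 1

    # A majority holding the new-term entry commits it and all entries
    # that precede it, including those written by the previous leader.
    if acks >= majority(cluster_size):
        leader.commit_lsn = noop.lsn
        for follower in reachable:
            follower.commit_lsn = noop.lsn
    return leader.commit_lsn


def run_scenario():
    # i1 is dead; i2-i4 all hold the transaction with LSN 1 but never
    # received the 'confirm' for it.
    txn = Entry(term=1, lsn=1, payload="txn")
    i2, i3, i4 = (Node(n, [txn]) for n in ("i2", "i3", "i4"))
    committed = take_over(i2, [i3, i4], term=2, cluster_size=4)
    return committed, [i2, i3, i4]
```

With three of the four nodes alive the majority (3) is still reachable, so
the transaction acknowledged to the client stays in every log and ends up
committed under the new leader.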

Why have this discussion at all? Any ambiguity or discrepancy
between this document and the Raft paper should be treated as a
mistake in this document. Or do you actually think it's possible
to invent a new consensus algorithm this way?

> Note for those who is concerned: this has nothing to do with
> in-memory relay. It has the same problems, which are in the protocol,
> not in the implementation.

No, the issues are distinct:
1) there may be cases where this paper doesn't follow Raft. It
   should be obvious to everyone that, with the exception of
   external leader election and failure detection, it has to
   follow Raft if correctness is of any concern, so it's simply
   a matter of fixing this doc to match Raft.

   As to the leader election, there are two alternatives: either
   spec out in this paper how the external election interacts
   with the cluster, including finishing up old transactions and
   neutralizing old leaders, or allow multi-master and forget
   about consistency for now.
2) an implementation based on triggers will be complicated and
   will have performance/stability implications. This is what I
   hope I was able to convey and in this case we can put the
   matter to rest. 
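
For the "neutralizing old leaders" part of point 1, a minimal sketch of
how Raft itself does it: every message from a leader carries the leader's
term, and a replica rejects anything whose term is older than the one it
has already seen. How an external election would hand out terms is not
specified anywhere in this thread, so that part is an assumption; the
Replica class and on_leader_message are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    current_term: int = 0

    def on_leader_message(self, leader_term: int) -> bool:
        """Return True if the message is accepted, False if it is fenced off."""
        if leader_term < self.current_term:
            # Sender is a deposed leader: ignore its writes and confirms.
            return False
        # Same or newer term: accept and remember the newest term seen.
        self.current_term = leader_term
        return True
```

A replica that has already seen term 2 rejects a late message from the
term-1 leader and accepts one from a term-3 leader.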
   
-- 
Konstantin Osipov, Moscow, Russia


