From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kostja.osipov@gmail.com>
Received: from mail-lf1-f68.google.com (mail-lf1-f68.google.com
 [209.85.167.68])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 56CB1469710
 for <tarantool-patches@dev.tarantool.org>;
 Thu, 14 May 2020 02:54:51 +0300 (MSK)
Received: by mail-lf1-f68.google.com with SMTP id r17so1010104lff.9
 for <tarantool-patches@dev.tarantool.org>;
 Wed, 13 May 2020 16:54:51 -0700 (PDT)
Date: Thu, 14 May 2020 02:54:48 +0300
From: Konstantin Osipov <kostja.osipov@gmail.com>
Message-ID: <20200513235448.GC5698@atlas>
References: <20200403210836.GB18283@tarantool.org>
 <c86ef610-f54e-524e-103a-324e7e572d2d@tarantool.org>
 <20200430145033.GF112@tarantool.org> <20200506085249.GA2842@atlas>
 <20200506163901.GH112@tarantool.org> <20200506184445.GB24913@atlas>
 <20200512155508.GJ112@tarantool.org>
 <78713377-806f-8cf6-efe0-5019f3d3e428@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <78713377-806f-8cf6-efe0-5019f3d3e428@tarantool.org>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/05/14 00:42]:
> > Sure, yes: if it restarted, then the lost connection can't go unnoticed
> > by anyone, be it the coordinator or the cluster.
> 
> Here comes another problem. Disconnect and restart have nothing to do with
> each other. The coordinator can lose the connection without the peer leader
> restarting, simply because it is a network. Anything can happen. Moreover,
> while the coordinator does not have a connection, the leader can restart
> multiple times.

yes.

> We can't tell the coordinator to rely on connectivity as a restart signal.

Well, we could demand that the leader always demote itself after
a restart. But the spec should be explicit about it and explain
how the election happens in this case, because the old leader may
still have the longest WAL (but with some junk in it, thanks to
lost confirms), so after restart it may need to reconcile its WAL
with the majority, fetching missing records back.
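
A rough sketch of what such a reconciliation could look like, in
Python for brevity. Everything here is hypothetical and not taken
from the RFC: the (lsn, payload) record layout, the reconcile()
name, and the assumption that the majority's log is available as
a plain list. It keeps the common prefix, drops the local tail
that the majority does not have (the "junk"), and fetches the
records the restarted leader is missing.

```python
from typing import List, Tuple

# Hypothetical record layout: (lsn, payload). Not the real WAL format.
Record = Tuple[int, str]


def reconcile(
    local: List[Record], majority: List[Record]
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Repair a restarted leader's log against the majority's log.

    Returns (repaired_log, dropped_local_tail, fetched_records).
    """
    # Length of the longest common prefix of the two logs.
    common = 0
    for mine, theirs in zip(local, majority):
        if mine != theirs:
            break
        common += 1

    dropped = local[common:]     # unconfirmed junk only the old leader has
    fetched = majority[common:]  # records the old leader must fetch back
    return local[:common] + fetched, dropped, fetched


# The old leader kept an unconfirmed record at lsn 3 and never saw lsn 4.
leader_log = [(1, "a"), (2, "b"), (3, "junk")]
majority_log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
repaired, dropped, fetched = reconcile(leader_log, majority_log)
```

The point of returning the dropped tail separately is that the
spec would have to say what happens to it: those are exactly the
transactions whose confirms were lost.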

Once again, Raft is very explicit about this. By default it
requires that the leader's commit log is durable, i.e.
wal_mode=fsync. This would kill performance. Implementations exist
which run in wal_mode=write (Cassandra is one of them), but they
know how to repair the log at the leader before proceeding with
the next transaction. The reason I brought this up is that it's
extremely tricky, and confusing as hell if the election is
external. I agree there should be an API. Or, better yet, abandon
the idea of external election: have no election at all for now,
assume the leader never changes, and provide only durability in a
multi-master config, with no consistency guarantees beyond
eventual consistency.
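
To make the durability point concrete, here is a toy model, again
in Python. The ToyWal class and its methods are invented for
illustration and do not mirror any real WAL code. It assumes that
in "write" mode a record is acknowledged while still sitting in a
buffer, and in "fsync" mode every append reaches disk before it
returns.

```python
class ToyWal:
    """Toy write-ahead log. Hypothetical model, not Tarantool's WAL."""

    def __init__(self, mode: str) -> None:
        if mode not in ("write", "fsync"):
            raise ValueError("mode must be 'write' or 'fsync'")
        self.mode = mode
        self.durable = []   # records that survive a crash
        self.buffered = []  # acknowledged, but not yet on disk

    def append(self, record: int) -> None:
        self.buffered.append(record)
        if self.mode == "fsync":
            # Durable mode: sync on every append.
            self.flush()

    def flush(self) -> None:
        self.durable.extend(self.buffered)
        self.buffered.clear()

    def crash_and_restart(self) -> list:
        # Whatever was only buffered is gone after the crash.
        self.buffered.clear()
        return list(self.durable)


fast = ToyWal("write")
fast.append(1)
fast.append(2)
fast.flush()   # a background sync happened to run here
fast.append(3)  # acknowledged, possibly already replicated, not on disk
after_fast = fast.crash_and_restart()

safe = ToyWal("fsync")
for lsn in (1, 2, 3):
    safe.append(lsn)
after_safe = safe.crash_and_restart()
```

In the "write" case the restarted leader comes back without
record 3 even though replicas may already have it, which is why
it has to repair its log before accepting the next transaction.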

> > How can a restart go unnoticed, if it causes disconnection?
> 
> Disconnection has nothing to do with restart. The coordinator itself may
> restart. Or it may lose the connection to the leader temporarily. Or the
> leader may lose it without any restarts.

and yes.

-- 
Konstantin Osipov, Moscow, Russia