From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp3.mail.ru (smtp3.mail.ru [94.100.179.58])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 85A2C4696C3
	for ; Thu, 23 Apr 2020 14:27:03 +0300 (MSK)
Date: Thu, 23 Apr 2020 14:27:02 +0300
From: Sergey Ostanevich
Message-ID: <20200423112702.GC112@tarantool.org>
References: <20200403210836.GB18283@tarantool.org>
	<20200421104918.GA112@tarantool.org>
	<20200423065809.GA4528@atlas>
	<20200423091436.GA14576@atlas>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20200423091436.GA14576@atlas>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
To: Konstantin Osipov, Vladislav Shpilevoy,
	tarantool-patches@dev.tarantool.org

Hi!

Thanks for the review!

On 23 Apr 12:14, Konstantin Osipov wrote:
> * Konstantin Osipov [20/04/23 09:58]:
> > > > To my understanding - it's up to user. I was considering a cluster that
> > > > has no WAL at all - relying on synchro replication and sufficient number
> > > > of replicas. Everyone who I asked about it told me I'm nuts. To my great
> > > > surprise Alexander Lyapunov brought exactly the same idea to discuss.
> > > >
> > > I didn't see an RFC on that, and this can become easily possible, when
> > > in-memory relay is implemented. If it is implemented in a clean way. We
> > > just can turn off the disk backoff, and it will work from memory-only.
> >
> > Sync replication must work from in-memory relay only. It works as
> > a natural failure detector: a replica which is slow or unavailable
> > is first removed from the subscribers of in-memory relay, and only
> > then (possibly much much later) is marked as down.
> >
> > By looking at the in-memory relay you have a clear idea what peers
> > are available and can abort a transaction if a cluster is in the
> > downgraded state right away. You never wait for impossible events.
> >
> > If you do have to wait, and say your wait timeout is 1 second, you
> > quickly run out of any fibers in the fiber pool for any work,
> > because all of them will be waiting on the sync transactions they
> > picked up from iproto to finish. The system will lose its
> > throttling capability.
>

There's no need to explain it to the customer: sync replication is not
expected to be as fast as pure in-memory. By no means. We have network
communication, disk operation, multiple entities quorum - all of these
can't be as fast. No need to try to cram in more than the network can
push through, obviously. The quality one buys for this price:
consistency of data in multiple instances distributed across different
locations.

> The other issue is that if your replicas are alive but
> slow/lagging behind, you can't let too many undo records to
> pile up unacknowledged in tx thread.
> The in-memory relay solves this nicely too, because it kicks out
> replicas from memory to file mode if they are unable to keep up
> with the speed of change.
>

That is the same problem - resources of the leader, so a natural limit
for throughput. I bet Tarantool faces similar limitations even now,
although different ones.

The in-memory relay is supposed to keep the same interface, so we expect
to hop easily to this new shiny express as soon as it appears. This will
be an optimization: we're trying to implement something first and then
speed it up.
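
To make the two points above concrete, here is a minimal sketch in
Python (not Tarantool code). Every name in it is hypothetical, and it
assumes the leader counts itself toward the quorum, which the thread
does not state. The first function shows the "abort right away" check
based on the number of in-memory relay subscribers; the second gives
the throughput ceiling when every fiber in the pool blocks for the full
wait timeout.

```python
from dataclasses import dataclass


@dataclass
class SyncTxnDecision:
    proceed: bool
    reason: str


def decide(relay_subscribers: int, quorum: int) -> SyncTxnDecision:
    """Decide whether a sync transaction is worth waiting for.

    Assumption (not from the RFC): the leader counts itself, so the
    number of reachable participants is 1 + in-memory relay subscribers.
    """
    reachable = 1 + relay_subscribers
    if reachable < quorum:
        # The quorum cannot be collected: fail fast instead of
        # holding a fiber until the timeout expires.
        return SyncTxnDecision(False, "quorum unreachable, abort right away")
    return SyncTxnDecision(True, "quorum reachable, wait for acks")


def max_blocked_rate(fiber_pool_size: int, wait_timeout_s: float) -> float:
    """Upper bound on requests per second when each request occupies
    one fiber for the whole wait timeout."""
    return fiber_pool_size / wait_timeout_s
```

With a hypothetical pool of 100 fibers and the 1 second timeout from
the example above, `max_blocked_rate(100, 1.0)` gives a ceiling of 100
sync requests per second while the cluster is degraded, and
`decide(1, 3)` refuses the transaction without waiting at all.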