From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 9EE4B469710
	for ; Thu, 14 May 2020 02:32:02 +0300 (MSK)
Received: by mail-lj1-f193.google.com with SMTP id o14so1481472ljp.4
	for ; Wed, 13 May 2020 16:32:02 -0700 (PDT)
Date: Thu, 14 May 2020 02:31:59 +0300
From: Konstantin Osipov
Message-ID: <20200513233159.GA5698@atlas>
References: <20200403210836.GB18283@tarantool.org>
	<20200430145033.GF112@tarantool.org>
	<20200507230112.GB14285@atlas>
	<20200512164048.GM112@tarantool.org>
	<20200512174741.GC2049@atlas>
	<29b2a0df-fe3e-332e-1d33-e9ee37353383@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <29b2a0df-fe3e-332e-1d33-e9ee37353383@tarantool.org>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

* Vladislav Shpilevoy [20/05/14 00:37]:
> >>> 3) One can quickly run out of memory for undo. Any sync
> >>> transaction should be capped with a timeout to avoid OOMs. I
> >>> don't know how many times I should repeat it. The only good
> >>> solution for load control is in-memory WAL, which will allow to
> >>> rollback all transactions as soon as network partitioning is
> >>> detected.
> >>
> >> How in-memory WAL can help save on _undo_ memory?
> >> To rollback whatever amount of transactions one need to store the undo.
> >
> > I wrote earlier that it works as a natural failure detector and
> > throttling mechanism. If there is no quorum, we can see it
> > immediately by looking at the number of active subscribers of
> > the in-memory WAL, so do not accumulate undo.
>
> Here we go again ...
>
> Talking of throttling. Without in-memory WAL no need for throttling. All is
> 'slow' by design already, as you think.

What is the limit on the number of transactions in the txn_limbo
list? How does this limit work? What about the fibers, which stay
pinned for as long as the transaction is not committed?

>
> Talking of failure detection - what??? I don't get it. This is something new.
> With in-memory relay or without you anyway can see if there is a quorum.

How do you "see" it? You write to the WAL and wait for acks. You
could add a wait timeout and assume there is no quorum if no acks
arrive within the timeout. This is not the best strategy, but
there is no other. The spec doesn't even say that; it simply says
that lack of quorum is somehow detected, but how it is detected is
not clear.

With an in-memory WAL you can afford to wait longer if you have
space in the ring buffer, and you know immediately that you
shouldn't wait when you see that the ring buffer is full and the
majority of subscribers are behind the start of the buffer.

> This is a matter of API of replication and transaction modules, and their
> interaction with each other, solved by txn_limbo in my branch.

How is it "solved"?

> But still, I don't see how knowing number of subscribers helps with the
> quorum. Subscriber presence does not add to quorums by itself. Anyway every
> transaction needs to be replicated before you can say that its quorum got
> +1 replica ack.

It helps to quickly see the absence of a quorum, not its presence.

-- 
Konstantin Osipov, Moscow, Russia
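
To make the ring buffer argument above concrete, here is a minimal
sketch in Python. It is not Tarantool code: the class InMemoryWal,
its methods, and the rule that the master counts as one vote towards
the quorum are all assumptions made for illustration. It only shows
how "buffer is full and too few subscribers are still inside the
buffer" can be checked without waiting for a timeout.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryWal:
    """Toy ring buffer of WAL rows identified by increasing LSNs (hypothetical)."""
    capacity: int          # maximum number of rows kept in memory
    first_lsn: int = 1     # oldest LSN still present in the buffer
    next_lsn: int = 1      # LSN that the next write will receive
    # replica id -> highest LSN that replica has acknowledged
    acked: dict = field(default_factory=dict)

    def is_full(self) -> bool:
        return self.next_lsn - self.first_lsn >= self.capacity

    def write(self) -> int:
        """Append a row, evicting the oldest row if the buffer is full."""
        if self.is_full():
            self.first_lsn += 1
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def ack(self, replica_id: str, lsn: int) -> None:
        """Record an acknowledgement; acks never move backwards."""
        self.acked[replica_id] = max(lsn, self.acked.get(replica_id, 0))

    def active_subscribers(self) -> int:
        """Count replicas whose next needed row is still in the buffer."""
        return sum(1 for lsn in self.acked.values()
                   if lsn + 1 >= self.first_lsn)

    def quorum_absent(self, quorum: int) -> bool:
        """True if the buffer is full and too few replicas keep up.

        Assumption: the master itself counts as one vote.
        """
        return self.is_full() and 1 + self.active_subscribers() < quorum
```

While the buffer has free space the check returns False, so the
master can keep waiting for acks. Once the buffer wraps and the
lagging replicas fall behind first_lsn, the check flips to True
immediately.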