Date: Fri, 8 May 2020 02:01:12 +0300
From: Konstantin Osipov
Message-ID: <20200507230112.GB14285@atlas>
References: <20200403210836.GB18283@tarantool.org> <20200430145033.GF112@tarantool.org>
In-Reply-To: <20200430145033.GF112@tarantool.org>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
To: Sergey Ostanevich
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy

> ### Synchronous replication enabling.
>
> Synchronous operation can be required for a set of spaces in the data
> scheme. That means only transactions that contain data modification for
> these spaces should require quorum. Such transactions named synchronous.
> As soon as last operation of synchronous transaction appeared in leader's
> WAL, it will cause all following transactions - no matter if they are
> synchronous or not - wait for the quorum. In case quorum is not achieved
> the 'rollback' operation will cause rollback of all transactions after
> the synchronous one. It will ensure the consistent state of the data both
> on leader and replicas. In case user doesn't require synchronous operation
> for any space then no changes to the WAL generation and replication will
> appear.

1) It's unclear what happens here if an async tx follows a sync tx.
Does it wait for the sync tx?
If it does, this reduces availability for async txs, so it's hardly
acceptable. Besides, with group=local spaces, one can quickly run out
of memory for undo. So the async tx should be allowed to proceed and
commit. But then mixing sync and async tables in a single transaction
shouldn't be allowed. Imagine t1 is sync and t2 is async. tx1 changes
t1 and t2, tx2 changes t2. tx1 is not confirmed and must be rolled
back, but it cannot revert the changes of tx2. The spec should clarify
that.

2) The first candidates for "sync" spaces are system spaces, especially
_schema (to fix box.once()) and _cluster (to fix parallel join of
multiple replicas). I can't imagine it's possible to make system spaces
synchronous with an external coordinator - the coordinator may not be
available during box.cfg{}.

3) One can quickly run out of memory for undo. Any sync transaction
should be capped with a timeout to avoid OOMs. I don't know how many
times I should repeat it. The only good solution for load control is an
in-memory WAL, which makes it possible to roll back all transactions as
soon as network partitioning is detected.

-- 
Konstantin Osipov, Moscow, Russia
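
For illustration, the tx1/tx2 case from point 1 can be sketched as a
toy model. This is only a sketch under stated assumptions, not Tarantool
code: plain Python dicts stand in for the spaces t1 (sync) and t2
(async), and apply_tx, rollback and MISSING are hypothetical helpers
invented for this example. It shows why an undo-log rollback of tx1 can
clobber a write that tx2 has already committed.

```python
# Toy model only: dicts stand in for spaces; nothing here is Tarantool API.
spaces = {"t1": {}, "t2": {}}   # assume t1 is "sync" and t2 is "async"
MISSING = object()              # marks "key did not exist before the write"


def apply_tx(writes):
    """Apply (space, key, value) writes; return an undo log of old values."""
    undo = []
    for space, key, value in writes:
        undo.append((space, key, spaces[space].get(key, MISSING)))
        spaces[space][key] = value
    return undo


def rollback(undo):
    """Restore the recorded old values in reverse order."""
    for space, key, old in reversed(undo):
        if old is MISSING:
            spaces[space].pop(key, None)
        else:
            spaces[space][key] = old


# tx1 touches both spaces and is still waiting for quorum.
tx1_undo = apply_tx([("t1", "a", 1), ("t2", "x", 10)])

# tx2 touches only the async space, so it commits without waiting.
# It builds on the value written by the unconfirmed tx1.
apply_tx([("t2", "x", spaces["t2"]["x"] + 1)])
state_after_tx2 = dict(spaces["t2"])        # already reported as committed

# Quorum is not reached, so tx1 is rolled back via its undo log.
rollback(tx1_undo)
state_after_rollback = dict(spaces["t2"])   # tx2's committed write is gone

print(state_after_tx2, state_after_rollback)
```

In this model tx2 was acknowledged as committed, yet its write to t2
disappears when tx1 is undone. That is the inconsistency that either
making async txs wait or forbidding mixed sync/async transactions would
have to prevent.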