From: Serge Petrenko
Message-ID: <81233187-ea05-1efc-b6f1-36b1833c7acb@tarantool.org>
Date: Thu, 24 Dec 2020 19:13:45 +0300
Subject: Re: [Tarantool-patches] [PATCH v2 6/6] txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master
List-Id: Tarantool development patches
To: Vladislav Shpilevoy, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org

On 23.12.2020 20:28, Vladislav Shpilevoy wrote:
> Thanks for the patch!
>
> On 23.12.2020 12:59, Serge Petrenko via Tarantool-patches wrote:
>> We designed the limbo so that it raises an error on receiving a CONFIRM
>> or ROLLBACK for another instance's data. Actually, this error is
>> pointless, and even harmful. Here's why:
>>
>> Imagine you have 3 instances, 1, 2 and 3.
>> First, 1 writes some synchronous transactions but dies before writing
>> CONFIRM.
>>
>> Now 2 has to write CONFIRM instead of 1 to take limbo ownership.
>> From now on, 2 is the limbo owner, and under high enough load it
>> constantly has some data in the limbo.
>>
>> Once 1 restarts, it first recovers its xlogs and fills its limbo with
>> its own unconfirmed transactions from the previous run. Then replication
>> between 1, 2 and 3 starts, and the first thing 1 sees is that 2 and 3
>> ack its old transactions. So 1 writes CONFIRM for its own transactions
>> even before the same CONFIRM written by 2 reaches it.
>>
>> Once the CONFIRM written by 1 is replicated to 2 and 3, they raise an
>> error and stop replication, since their limbos contain entries from 2,
>> not from 1. Actually, there is no need to raise an error, since it's
>> just a stale CONFIRM that has already been processed by both 2 and 3.
>>
>> So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner.
>>
>> The issue was discovered with the test replication/election_qsync_stress.
> The comment is good.

Thanks!

-- 
Serge Petrenko
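The rule from the quoted commit message (skip a CONFIRM/ROLLBACK whose origin is not the current limbo owner instead of failing) can be sketched in C as a toy model. This is a hedged illustration, not Tarantool's `txn_limbo` code: `struct limbo`, `struct sync_request`, `limbo_append()` and `limbo_process()` are hypothetical names, and the sketch assumes CONFIRM finalizes every pending entry with an LSN up to the given one, while ROLLBACK drops every entry from the given LSN onward.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LIMBO_CAPACITY 16

/* Hypothetical, simplified model of the synchronous-transaction limbo.
 * All names are illustrative; this is not Tarantool's real txn_limbo API. */
struct limbo {
	uint32_t owner_id;            /* replica id whose transactions are queued */
	int64_t lsns[LIMBO_CAPACITY]; /* LSNs of pending entries, ascending */
	int count;                    /* number of pending entries */
	int64_t confirmed_lsn;        /* highest LSN confirmed so far */
};

enum sync_type { SYNC_CONFIRM, SYNC_ROLLBACK };

struct sync_request {
	enum sync_type type;
	uint32_t origin_id; /* limbo owner the request refers to */
	int64_t lsn;
};

/* Queue a pending synchronous transaction; false if the queue is full. */
static bool
limbo_append(struct limbo *limbo, int64_t lsn)
{
	if (limbo->count == LIMBO_CAPACITY)
		return false;
	limbo->lsns[limbo->count++] = lsn;
	return true;
}

/* Apply a CONFIRM or ROLLBACK. Always returns 0 in this sketch: a request
 * for a foreign owner is skipped instead of being reported as an error. */
static int
limbo_process(struct limbo *limbo, const struct sync_request *req)
{
	if (req->origin_id != limbo->owner_id)
		return 0; /* stale request for a former owner's entries: ignore */

	if (req->type == SYNC_CONFIRM) {
		/* CONFIRM finalizes every entry with lsn <= req->lsn (queue head). */
		int n = 0;
		while (n < limbo->count && limbo->lsns[n] <= req->lsn)
			n++;
		memmove(limbo->lsns, limbo->lsns + n,
			(size_t)(limbo->count - n) * sizeof(limbo->lsns[0]));
		limbo->count -= n;
		if (req->lsn > limbo->confirmed_lsn)
			limbo->confirmed_lsn = req->lsn;
	} else {
		/* ROLLBACK drops every entry with lsn >= req->lsn (queue tail). */
		while (limbo->count > 0 &&
		       limbo->lsns[limbo->count - 1] >= req->lsn)
			limbo->count--;
	}
	return 0;
}
```

In the scenario from the thread, a limbo owned by instance 2 with pending LSNs 5, 6 and 7 is left untouched by a stale CONFIRM from instance 1, whereas a CONFIRM from instance 2 for LSN 6 removes the first two entries and replication keeps running.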