From: Vladislav Shpilevoy
Date: Wed, 23 Dec 2020 18:28:06 +0100
Subject: Re: [Tarantool-patches] [PATCH v2 6/6] txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master
To: Serge Petrenko, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org

Thanks for the patch!

On 23.12.2020 12:59, Serge Petrenko via Tarantool-patches wrote:
> We designed limbo so that it errors on receiving a CONFIRM or ROLLBACK
> for other instance's data. Actually, this error is pointless, and even
> harmful. Here's why:
>
> Imagine you have 3 instances, 1, 2 and 3.
> First 1 writes some synchronous transactions, but dies before writing CONFIRM.
>
> Now 2 has to write CONFIRM instead of 1 to take limbo ownership.
> From now on 2 is the limbo owner and in case of high enough load it constantly
> has some data in the limbo.
>
> Once 1 restarts, it first recovers its xlogs, and fills its limbo with
> its own unconfirmed transactions from the previous run. Now replication
> between 1, 2 and 3 is started and the first thing 1 sees is that 2 and 3
> ack its old transactions. So 1 writes CONFIRM for its own transactions
> even before the same CONFIRM written by 2 reaches it.
> Once the CONFIRM written by 1 is replicated to 2 and 3 they error and
> stop replication, since their limbo contains entries from 2, not from 1.
> Actually, there's no need to error, since it's just a really old CONFIRM
> which's already processed by both 2 and 3.
>
> So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner.
>
> The issue was discovered with test replication/election_qsync_stress.

The comment is good.
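
To make the behaviour described in the commit message concrete, here is a
minimal sketch in C. It is not the actual txn_limbo code: the types and names
(sketch_limbo, sketch_req, sketch_limbo_process) and the two LSN fields are
hypothetical and exist only to illustrate the owner check. The assumption is
that a CONFIRM/ROLLBACK carries the id of the instance whose transactions it
refers to, and that a mismatch with the current limbo owner means the request
is stale and can be skipped instead of raising an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real txn_limbo structures. */
enum sketch_req_type { SKETCH_CONFIRM, SKETCH_ROLLBACK };

struct sketch_limbo {
	/* Id of the instance whose transactions currently sit in the limbo. */
	uint32_t owner_id;
	/* Highest LSN confirmed so far for the current owner. */
	int64_t confirmed_lsn;
	/* LSN of the last applied ROLLBACK, or -1 if there was none. */
	int64_t rollback_lsn;
};

struct sketch_req {
	enum sketch_req_type type;
	/* Id of the instance whose transactions the request refers to. */
	uint32_t replica_id;
	int64_t lsn;
};

/*
 * Apply a CONFIRM/ROLLBACK request to the limbo.
 * Returns true if the request was applied, false if it was ignored
 * because it refers to a foreign (former) limbo owner.
 */
static bool
sketch_limbo_process(struct sketch_limbo *limbo, const struct sketch_req *req)
{
	assert(limbo != NULL && req != NULL);
	if (req->replica_id != limbo->owner_id) {
		/* Stale request from a former owner: skip it, do not error. */
		return false;
	}
	switch (req->type) {
	case SKETCH_CONFIRM:
		if (req->lsn > limbo->confirmed_lsn)
			limbo->confirmed_lsn = req->lsn;
		break;
	case SKETCH_ROLLBACK:
		limbo->rollback_lsn = req->lsn;
		break;
	}
	return true;
}
```

In the scenario from the commit message, instance 2 owns the limbo, so a late
CONFIRM that refers to instance 1 returns false and leaves the limbo state
untouched, and replication keeps running.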