From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 9281D469710 for ; Mon, 1 Jun 2020 19:02:26 +0300 (MSK) Date: Mon, 1 Jun 2020 19:02:24 +0300 From: Sergey Ostanevich Message-ID: <20200601160224.GB50@tarantool.org> References: <266e9768-27ea-e924-e562-846e74df7760@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Hi! On 01 июн 15:40, Vladislav Shpilevoy wrote: > On 29/05/2020 13:42, Serge Petrenko wrote: > > > > 29.05.2020 01:54, Vladislav Shpilevoy пишет: > >> Thanks   for   the   patch, > >> But   I   have  a  comment! > >> It   is   a   nice  crutch, > >> Yet we need one more moment. > >> > >> The patch basically sacrifices transaction rows order > >> and WAL correctness for the sake of replication. It > >> does not look right. Why can't we leave WAL as is, and > >> tweak all these things in relay? It looks really wrong > >> to change statements order. Especially taking into > >> account this is needed *only* for replication. For example, > >> consider FKs. > >> > >> A local space has a FOREIGN KEY reference to a global > >> space. To make it work, we need to insert into the global > >> space first, and then into the local space. When you change > >> the order, the local insert goes first, and violates the > >> foreign key. So if we will check FKs on recovery (when we > >> will have FKs in box), this patch will break them. > >> > >> Alternative to relay - append a dummy NOP statement in the > >> end of the transaction, which would be global. But this is > >> also a crutch. I think the TSNs figuring out should be done > >> in relay. It could keep track of the current transaction, > >> change TSNs and is_commit when necessary. > > > > Hi! Thanks for the review! > > > > I understand this is a crutch, but it's the best solution I could come > > up with.  Appending dummy  NOPs will increase instance LSN by one, > > which  also looks  ugly. The correct solution is, indeed, to collect  a tx > > in relay and mangle with it in any means  we need before sending, however, > > I faced some problems with this approach. See more in v1 of this patchset > > (letter [PATCH 2/2] replication: make relay send txs in batches). > > > > By the way, your txn_limbo implementation assigns the tx an lsn of the last > > tx row, which doesn't work in case the last row is a local one. So, looks like > > we either need to reorder the rows or insert a NOP at the tx end. Or just > > assign the tx an lsn of the last global row. > > > > Still, it's up to you which solution you find a better one, and I'll > > implement it. > > I guess the current solution is ok then. Feel free to add > Reviewed-by: Vladislav Shpilevoy . First version > looks better, but seems it will take too much time to rework xlog > streaming API. > > Although did you try? I don't know xlog code well, but perhaps you could > just forbid to reuse ibuf for sequential rows until explicit reset() or > something. > > I also have a thought about is_commit necessity. Why do we need it? Why isn't > tsn enough? I saw in xrow code (xrow_header_encode()) that there are 2 > combinations: > > - No tsn, and no is_commit for single statement transactions. Is_commit > is assumed implicitly; > > - tsn is encoded, and the last statement has is_commit. > > But in fact the is_commit flag always can be determined implicitly. Consider > all the possible cases: > > - row1: tsn1, row2: tsn2. Then all rows [..., row1] belong to one > transaction, all rows [row2, ...] belong to a next transaction; > > - row1: no tsn, row2: tsn2. Then row1 is first single-statement > transaction, [row2, ...] - next transaction. > > - row1: tsn1, row2: no tsn. Then the same as previous. [..., row1] - > first transaction, row2 - second single-statement transaction. > > - row1: no tsn, row2: no tsn. Two single statement transactions. > > It means, transaction border can be determined by tsn difference > between two sequential rows. This is allowed because transactions > are never mixed. Tsn difference is border. And since is_rollback > does not exist (it will in sync replication, but as a separate entry, > for multiple transactions), and is_commit *already* is assumed > implicitly for single statement transactions, why not to calculate it > implicitly for all transactions? Shall we need a readahead then? Consider you have a bactch arrived and you have to guess if the last op in it actually is_commit? What if you have no following ops for a long time? Will you hold the txn until the next one arrived? Sergos