From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Thu, 31 Jan 2019 17:54:54 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH v2 1/2] Journal transaction boundaries Message-ID: <20190131145454.hhuwbgff2hznsrkx@esperanza> References: <3544285.z36fh00o7k@home.lan> <20190131081951.e5bijlb7wcqzcqiv@esperanza> <5221637.HvkEVp0ifG@home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5221637.HvkEVp0ifG@home.lan> To: =?utf-8?B?0JPQtdC+0YDQs9C40Lkg0JrQuNGA0LjRh9C10L3QutC+?= Cc: tarantool-patches@freelists.org List-ID: On Thu, Jan 31, 2019 at 05:25:24PM +0300, Георгий Кириченко wrote: > On Thursday, January 31, 2019 11:19:51 AM MSK Vladimir Davydov wrote: > > On Thu, Jan 31, 2019 at 10:34:43AM +0300, Георгий Кириченко wrote: > > > On Tuesday, January 29, 2019 2:00:10 PM MSK Vladimir Davydov wrote: > > > > On Tue, Jan 29, 2019 at 01:09:50PM +0300, Георгий Кириченко wrote: > > > > > On Monday, January 28, 2019 3:58:59 PM MSK Vladimir Davydov wrote: > > > > > > On Tue, Jan 22, 2019 at 01:57:36PM +0300, Georgy Kirichenko wrote: > > > > > > > diff --git a/src/box/iproto_constants.h > > > > > > > b/src/box/iproto_constants.h > > > > > > > index 728514297..d01cdf840 100644 > > > > > > > --- a/src/box/iproto_constants.h > > > > > > > +++ b/src/box/iproto_constants.h > > > > > > > @@ -60,6 +60,9 @@ enum iproto_key { > > > > > > > > > > > > > > IPROTO_SCHEMA_VERSION = 0x05, > > > > > > > IPROTO_SERVER_VERSION = 0x06, > > > > > > > IPROTO_GROUP_ID = 0x07, > > > > > > > > > > > > > > + IPROTO_TXN_ID = 0x08, > > > > > > > + IPROTO_TXN_REPLICA_ID = 0x09, > > > > > > > > > > > > Do we really need to introduce both TXN_ID and TXN_REPLICA_ID? I > > > > > > would > > > > > > expect TXN_ID to be enough - we could use REPLICA_ID to make sure > > > > > > that > > > > > > transaction identifiers from different instances don't overlap, > > > > > > because > > > > > > there couldn't be a "multi-instance" transaction, could there? > > > > > > txn_replica_id might be differ from replica_id in case when transaction > > > would be finished on other node, e.g. after recovery. > > > > I'm kinda out of scope here. I think you need to write an RFC or > > something explaining all the design decisions you made when you > > introduced transaction boundaries. Or we should get together with > > Kostja and discuss it. Or both. Neither the commit message nor > > comments shed the light on the big picture. > > > > > > > > > + IPROTO_TXN_LAST = 0x0a, > > > > > > > > > > > > I think we should instead introduce BEGIN and COMMIT commands, because: > > > > > I completely do not like any auto-commit logic in a xlog file. You > > > > > suggestion breaks backward compatibility because previous logs do not > > > > > have any BEGIN/ > > > > > > > > It wouldn't break backward compatibility. It might break forward > > > > compatibility, which is fine by me (we do it all the time). > > > > > > Suggested xrow encoding/decoding rules means that any xrow without txn_id, > > > txn_replica_id, txn_last should be processed as a single statement > > > transaction as it was before. > > > If we would require explicit begin/commit then previous logs turns into an > > > invalid stream without autocommit semantic. > > > > Yeah, we would have to add autocommit. > I am disagreed with autocommit. There are my key points: > 1. We could not interleave autocommit transactions with others, or we would > have to have special mark for this transaction but this is no autocommit after > all. Do we need to support interleaving transactions in xlog? I'm not sure. May be, for sync replication we do. Anyway, we could add BEGIN/COMMIT *in addition* to TXN_ID stored in each row. They would serve as a header and footer. This would give us a place holder to store transaction-wide information if we ever need it. Besides, it would make the stream more robust: if we loose BEGIN, we will detect it while if we loose the first transaction row in your case we won't. Yeah, explicit BEGIN/COMMIT will add a few bytes to each transaction, but IMO it's not critical. Let's discuss it f2f when we can, because we seem to repeat ourselves without moving forward. > 2. Applier would not able to check transaction boundaries. > 3. There is no difference between an autocommit transaction and a single-row > transaction. > 4. Autocommit is about interface behavior but not about log format. > > Also let me explain xrow encoding rules: > If we have only one row in transaction, then we do not encode txn_id and > txn_last. So single-row transaction (or autocommit if you wish) does not > change its' representation. And then while decoding we set txn_id = lsn and > txn_last if txn_id was not set. In that case each row from previous xlog acts > as one single-row transaction. > For multi-row transaction we have only penalty in transaction identifiers for > each row and one commit flag for the last. > > > > > > > > @@ -917,6 +918,39 @@ wal_assign_lsn(struct vclock *vclock, struct > > > > > > > xrow_header **row,> > > > > > > > > > > > > > > vclock_follow_xrow(vclock, *row); > > > > > > > > > > > > > > } > > > > > > > > > > > > > > } > > > > > > > > > > > > > > + if ((*begin)->replica_id != instance_id) { > > > > > > > + /* > > > > > > > + * Move all local changes to the end of rows array and > > > > > > > + * a fake local transaction (like an autonomous transaction) > > > > > > > + * because we could not replicate the transaction back. > > > > > > > + */ > > > > > > > + struct xrow_header **row = end - 1; > > > > > > > + while (row >= begin) { > > > > > > > + if (row[0]->replica_id != instance_id) { > > > > > > > + --row; > > > > > > > + continue; > > > > > > > + } > > > > > > > + /* Local row, move it back. */ > > > > > > > + struct xrow_header **local_row = row; > > > > > > > + while (local_row < end - 1 && > > > > > > > + local_row[1]->replica_id != instance_id) { > > > > > > > + struct xrow_header *tmp = local_row[0]; > > > > > > > + local_row[0] = local_row[1]; > > > > > > > + local_row[1] = tmp; > > > > > > > > > > > > There's SWAP for this. > > > > > > > > > > > > > + } > > > > > > > + --row; > > > > > > > + } > > > > > > > + while (begin < end && begin[0]->replica_id != instance_id) > > > > > > > + ++begin; > > > > > > > + } > > > > > > > > > > > > I don't understand why we need to move rows generated locally (by > > > > > > an on_replace trigger I surmise) to the end of a transaction. We > > > > > > have TXN_ID attached to each row so we could leave the transactions > > > > > > interleaved, couldn't we? > > > > > > > > > > You are right, but in that case applier should track a lot of > > > > > transaction > > > > > simultaneously. Also it complicates recovery too. I hope it will be > > > > > fixed > > > > > while parallel applier implementing. > > > > > > > > May be, we should implement parallel applier in the scope of this issue > > > > then? Anyway, without it, sync replication won't scale with the number > > > > of parallel transactions. > > > > > > I do not think so. Parallel applier depends on that feature, but it is > > > completely different issue. > > > > But as I said, sync replication won't scale without it. > > > > > Also parallel applier is ignorant on a xlog format. > > > > Depends on how you implement it. For example, MySQL marks all > > transactions that were committed in parallel on the master so that > > they can be committed in parallel on replicas. > In case of vinyl it is not true. In case of vinyl we can mark transactions that overlapped in time on the master. In fact, that's what MySQL actually does. I guess I misleaded you by saying "committed in parallel". I meant "executed in parallel".