From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 29 Jan 2019 14:00:10 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH v2 1/2] Journal transaction boundaries Message-ID: <20190129110010.qav5cedodjfs4ju4@esperanza> References: <7b72afb3f1a3479f1239fcbed936cc151b55f23c.1548153546.git.georgy@tarantool.org> <20190128125859.grj7d3euvdsekd5a@esperanza> <3229189.u9recYUF4a@home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3229189.u9recYUF4a@home.lan> To: =?utf-8?B?0JPQtdC+0YDQs9C40Lkg0JrQuNGA0LjRh9C10L3QutC+?= Cc: tarantool-patches@freelists.org List-ID: On Tue, Jan 29, 2019 at 01:09:50PM +0300, Георгий Кириченко wrote: > On Monday, January 28, 2019 3:58:59 PM MSK Vladimir Davydov wrote: > > On Tue, Jan 22, 2019 at 01:57:36PM +0300, Georgy Kirichenko wrote: > > > Append txn_id, txn_replica_id and txn_last to xrow_header structure. > > > txn_replica_id identifies replica where transaction was started and > > > txn_id identifies transaction id on that replica. As transaction id > > > a lsn of the first row in this transaction is used. > > > txn_last set to true if it is the last row in a transaction, so we > > > could commit transaction with last row or use additional NOP requests > > > with txn_last = true ans valid txn_id and txn_replica_id. > > > For replication all local changes moved to xrows array tail to form > > > a separate transaction (like autonomous transaction) because it is not > > > possible to replicate such transaction back to it's creator. > > > > > > As encoding/deconding rules assumed: > > > 1. txn_replica_id is encoded only if it is not equal with replica > > > > > > id. This might have point because of replication trigger > > > > > > 2. txn_id and txn_last are encoded only for multi-row transaction. > > > > > > So if we do not have txn_id in a xstream then this means that it > > > is a single row transaction. > > > > > > This rules provides compatibility with previous xlog handling. > > > > > > Needed for: 2798 > > > --- > > > > > > src/box/iproto_constants.h | 3 ++ > > > > Looks like you forgot to update iproto_constants.c. > > > > > src/box/wal.c | 36 +++++++++++++++- > > > src/box/xrow.c | 38 +++++++++++++++++ > > > src/box/xrow.h | 5 ++- > > > test/unit/xrow.cc | 3 ++ > > > test/vinyl/errinj_stat.result | 8 ++-- > > > test/vinyl/layout.result | 24 +++++------ > > > test/vinyl/stat.result | 78 +++++++++++++++++------------------ > > > 8 files changed, 138 insertions(+), 57 deletions(-) > > > > > > diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h > > > index 728514297..d01cdf840 100644 > > > --- a/src/box/iproto_constants.h > > > +++ b/src/box/iproto_constants.h > > > @@ -60,6 +60,9 @@ enum iproto_key { > > > > > > IPROTO_SCHEMA_VERSION = 0x05, > > > IPROTO_SERVER_VERSION = 0x06, > > > IPROTO_GROUP_ID = 0x07, > > > > > > + IPROTO_TXN_ID = 0x08, > > > + IPROTO_TXN_REPLICA_ID = 0x09, > > > > Do we really need to introduce both TXN_ID and TXN_REPLICA_ID? I would > > expect TXN_ID to be enough - we could use REPLICA_ID to make sure that > > transaction identifiers from different instances don't overlap, because > > there couldn't be a "multi-instance" transaction, could there? > > > > > + IPROTO_TXN_LAST = 0x0a, > > > > I think we should instead introduce BEGIN and COMMIT commands, because: > I completely do not like any auto-commit logic in a xlog file. You suggestion > breaks backward compatibility because previous logs do not have any BEGIN/ It wouldn't break backward compatibility. It might break forward compatibility, which is fine by me (we do it all the time). > COMMIT. Also separate BEGIN and COMMIT messages increase transaction size. I doubt that after compression you'll see a difference. > It is worth noting, that IPROTO_NOP with a new txn_id or txn_last emulates > BEGIN or COMMIT. Yeah, but that would look weird. > > > > > - We might need to attach some extra information to each transaction, > > e.g. mark transactions that were committed in parallel on the master > > so that they can be committed in parallel on a replica. Attaching > > such information to each row would be excessive. > The patch is not about this. But we have to think about that in advance, don't we? > > > > - We will need BEGIN and COMMIT for IPROTO transactions. It would be > > nice if we could share the code with them. > The biggest issue we could not know transaction identifier in case of IPROTO. > Iproto is single stream proto, but wal might be not as it is multiplexing a > lot of transactions in a one output, so it might be bad be in paradigm of > universally format for both IPROTO and WAL. OK. I think we need to discuss the options with Kostja. > > > > - Having separate BEGIN/COMMIT rows would make tarantoolctl-cat output > > easier to read. > > > > > /* Leave a gap for other keys in the header. */ > > > IPROTO_SPACE_ID = 0x10, > > > IPROTO_INDEX_ID = 0x11, > > > > > > diff --git a/src/box/wal.c b/src/box/wal.c > > > index 4c3537672..17ead08e7 100644 > > > --- a/src/box/wal.c > > > +++ b/src/box/wal.c > > > @@ -905,9 +905,10 @@ wal_writer_begin_rollback(struct wal_writer *writer) > > > > > > } > > > > > > static void > > > > > > -wal_assign_lsn(struct vclock *vclock, struct xrow_header **row, > > > +wal_assign_lsn(struct vclock *vclock, struct xrow_header **begin, > > > > > > struct xrow_header **end) > > > > > > { > > > > > > + struct xrow_header **row = begin; > > > > > > /** Assign LSN to all local rows. */ > > > for ( ; row < end; row++) { > > > > > > if ((*row)->replica_id == 0) { > > > > > > @@ -917,6 +918,39 @@ wal_assign_lsn(struct vclock *vclock, struct > > > xrow_header **row,> > > > vclock_follow_xrow(vclock, *row); > > > > > > } > > > > > > } > > > > > > + if ((*begin)->replica_id != instance_id) { > > > + /* > > > + * Move all local changes to the end of rows array and > > > + * a fake local transaction (like an autonomous transaction) > > > + * because we could not replicate the transaction back. > > > + */ > > > + struct xrow_header **row = end - 1; > > > + while (row >= begin) { > > > + if (row[0]->replica_id != instance_id) { > > > + --row; > > > + continue; > > > + } > > > + /* Local row, move it back. */ > > > + struct xrow_header **local_row = row; > > > + while (local_row < end - 1 && > > > + local_row[1]->replica_id != instance_id) { > > > + struct xrow_header *tmp = local_row[0]; > > > + local_row[0] = local_row[1]; > > > + local_row[1] = tmp; > > > > There's SWAP for this. > > > > > + } > > > + --row; > > > + } > > > + while (begin < end && begin[0]->replica_id != instance_id) > > > + ++begin; > > > + } > > > > I don't understand why we need to move rows generated locally (by > > an on_replace trigger I surmise) to the end of a transaction. We > > have TXN_ID attached to each row so we could leave the transactions > > interleaved, couldn't we? > You are right, but in that case applier should track a lot of transaction > simultaneously. Also it complicates recovery too. I hope it will be fixed while > parallel applier implementing. May be, we should implement parallel applier in the scope of this issue then? Anyway, without it, sync replication won't scale with the number of parallel transactions. > > > > > + /* Setup txn_id and tnx_replica_id for localy generated rows. */ > > > + row = begin; > > > + while (row < end) { > > > + row[0]->txn_id = begin[0]->lsn; > > > + row[0]->txn_replica_id = instance_id; > > > + row[0]->txn_last = row == end - 1 ? 1 : 0; > > > + ++row; > > > + } > > > > I think we better use txn->id for TXN_ID rather than LSN. > > Why do you think LSN should be used? I don't see any rationale > > for that anywhere in the comments. Also, setting TXN_ID looks > > like a job that should be done by txn_add_redo... > > > > > } > > > > > > static void > > > > > > diff --git a/src/box/xrow.c b/src/box/xrow.c > > > index ef3f81add..db524b3c8 100644 > > > --- a/src/box/xrow.c > > > +++ b/src/box/xrow.c > > > > > > @@ -133,12 +133,32 @@ error: > > > case IPROTO_SCHEMA_VERSION: > > > header->schema_version = mp_decode_uint(pos); > > > break; > > > > > > + case IPROTO_TXN_ID: > > > + header->txn_id = mp_decode_uint(pos); > > > + break; > > > + case IPROTO_TXN_REPLICA_ID: > > > + header->txn_replica_id = mp_decode_uint(pos); > > > + break; > > > + case IPROTO_TXN_LAST: > > > + header->txn_last = mp_decode_uint(pos); > > > > Should be bool? > > > > > + break; > > > > > > default: > > > /* unknown header */ > > > mp_next(pos); > > > > > > } > > > > > > } > > > assert(*pos <= end); > > > > > > + if (header->txn_id == 0) { > > > + /* > > > + * Transaction id is not set so it is a single statement > > > + * transaction. > > > + */ > > > + header->txn_id = header->lsn; > > > + header->txn_last = true; > > > + } > > > + if (header->txn_replica_id == 0) > > > + header->txn_replica_id = header->replica_id; > > > + > > > > > > /* Nop requests aren't supposed to have a body. */ > > > if (*pos < end && header->type != IPROTO_NOP) { > > > > > > const char *body = *pos; > > > > > > @@ -223,6 +243,24 @@ xrow_header_encode(const struct xrow_header *header, > > > uint64_t sync,> > > > d = mp_encode_double(d, header->tm); > > > map_size++; > > > > > > } > > > > > > + if (header->txn_id != header->lsn || header->txn_last == 0) { > > > + /* Encode txn id for multi row transaction members. */ > > > + d = mp_encode_uint(d, IPROTO_TXN_ID); > > > + d = mp_encode_uint(d, header->txn_id); > > > + map_size++; > > > + } > > > + if (header->txn_replica_id != header->replica_id) { > > > + d = mp_encode_uint(d, IPROTO_TXN_REPLICA_ID); > > > + d = mp_encode_uint(d, header->txn_replica_id); > > > + map_size++; > > > + } > > > + if (header->txn_last && !(header->txn_id == header->lsn && > > > + header->txn_replica_id == header->replica_id)) > { > > > + /* Set last row for multi row transaction. */ > > > + d = mp_encode_uint(d, IPROTO_TXN_LAST); > > > + d = mp_encode_uint(d, header->txn_last); > > > + map_size++; > > > + } > > > > > > assert(d <= data + XROW_HEADER_LEN_MAX); > > > mp_encode_map(data, map_size); > > > out->iov_len = d - (char *) out->iov_base; > > > > > > diff --git a/src/box/xrow.h b/src/box/xrow.h > > > index 6bab0a1fd..4acd84d56 100644 > > > --- a/src/box/xrow.h > > > +++ b/src/box/xrow.h > > > @@ -47,7 +47,7 @@ enum { > > > > > > XROW_HEADER_IOVMAX = 1, > > > XROW_BODY_IOVMAX = 2, > > > XROW_IOVMAX = XROW_HEADER_IOVMAX + XROW_BODY_IOVMAX, > > > > > > - XROW_HEADER_LEN_MAX = 40, > > > + XROW_HEADER_LEN_MAX = 60, > > > > > > XROW_BODY_LEN_MAX = 128, > > > IPROTO_HEADER_LEN = 28, > > > /** 7 = sizeof(iproto_body_bin). */ > > > > > > @@ -69,6 +69,9 @@ struct xrow_header { > > > > > > uint64_t sync; > > > int64_t lsn; /* LSN must be signed for correct comparison */ > > > double tm; > > > > > > + int64_t txn_id; > > > + uint32_t txn_replica_id; > > > + uint32_t txn_last; > > > > > > int bodycnt; > > > uint32_t schema_version; > > > > > > diff --git a/test/unit/xrow.cc b/test/unit/xrow.cc > > > index 165a543cf..0b796d728 100644 > > > --- a/test/unit/xrow.cc > > > +++ b/test/unit/xrow.cc > > > @@ -215,6 +215,9 @@ test_xrow_header_encode_decode() > > > > > > header.lsn = 400; > > > header.tm = 123.456; > > > header.bodycnt = 0; > > > > > > + header.txn_id = header.lsn; > > > + header.txn_replica_id = header.replica_id; > > > + header.txn_last = true; > > > > > > uint64_t sync = 100500; > > > struct iovec vec[1]; > > > is(1, xrow_header_encode(&header, sync, vec, 200), "encode"); > > > > > > diff --git a/test/vinyl/errinj_stat.result b/test/vinyl/errinj_stat.result > > > index 08801dbc6..361ddf5db 100644 > > > --- a/test/vinyl/errinj_stat.result > > > +++ b/test/vinyl/errinj_stat.result > > > @@ -69,7 +69,7 @@ i:stat().disk.compaction.queue -- 30 statements > > > > > > - bytes_compressed: > > > > > > pages: 3 > > > rows: 30 > > > > > > - bytes: 411 > > > + bytes: 477 > > > > > > ... > > > i:stat().disk.compaction.queue.bytes == > > > box.stat.vinyl().scheduler.compaction_queue --- > > > > > > @@ -83,7 +83,7 @@ i:stat().disk.compaction.queue -- 40 statements > > > > > > - bytes_compressed: > > > > > > pages: 4 > > > rows: 40 > > > > > > - bytes: 548 > > > + bytes: 636 > > > > I wouldn't expect vinyl stats to be changed by this patch. > > Why did it happen? > Because if rows were written in an one entry, then wal creates transaction. But those are vinyl files (run, index). They shouldn't be affected by this, should they?