Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: "Георгий Кириченко" <georgy@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: Re: [tarantool-patches] [PATCH v2 1/2] Journal transaction boundaries
Date: Tue, 29 Jan 2019 14:00:10 +0300	[thread overview]
Message-ID: <20190129110010.qav5cedodjfs4ju4@esperanza> (raw)
In-Reply-To: <3229189.u9recYUF4a@home.lan>

On Tue, Jan 29, 2019 at 01:09:50PM +0300, Георгий Кириченко wrote:
> On Monday, January 28, 2019 3:58:59 PM MSK Vladimir Davydov wrote:
> > On Tue, Jan 22, 2019 at 01:57:36PM +0300, Georgy Kirichenko wrote:
> > > Append txn_id, txn_replica_id and txn_last to xrow_header structure.
> > > txn_replica_id identifies replica where transaction was started and
> > > txn_id identifies transaction id on that replica. As transaction id
> > > a lsn of the first row in this transaction is used.
> > > txn_last set to true if it is the last row in a transaction, so we
> > > could commit transaction with last row or use additional NOP requests
> > > with txn_last = true ans valid txn_id and txn_replica_id.
> > > For replication all local changes moved to xrows array tail to form
> > > a separate transaction (like autonomous transaction) because it is not
> > > possible to replicate such transaction back to it's creator.
> > > 
> > > As encoding/deconding rules assumed:
> > >  1. txn_replica_id is encoded only if it is not equal with replica
> > >  
> > >     id. This might have point because of replication trigger
> > >  
> > >  2. txn_id and txn_last are encoded only for multi-row transaction.
> > >  
> > >     So if we do not have txn_id in a xstream then this means that it
> > >     is a single row transaction.
> > > 
> > > This rules provides compatibility with previous xlog handling.
> > > 
> > > Needed for: 2798
> > > ---
> > > 
> > >  src/box/iproto_constants.h    |  3 ++
> > 
> > Looks like you forgot to update iproto_constants.c.
> > 
> > >  src/box/wal.c                 | 36 +++++++++++++++-
> > >  src/box/xrow.c                | 38 +++++++++++++++++
> > >  src/box/xrow.h                |  5 ++-
> > >  test/unit/xrow.cc             |  3 ++
> > >  test/vinyl/errinj_stat.result |  8 ++--
> > >  test/vinyl/layout.result      | 24 +++++------
> > >  test/vinyl/stat.result        | 78 +++++++++++++++++------------------
> > >  8 files changed, 138 insertions(+), 57 deletions(-)
> > > 
> > > diff --git a/src/box/iproto_constants.h b/src/box/iproto_constants.h
> > > index 728514297..d01cdf840 100644
> > > --- a/src/box/iproto_constants.h
> > > +++ b/src/box/iproto_constants.h
> > > @@ -60,6 +60,9 @@ enum iproto_key {
> > > 
> > >  	IPROTO_SCHEMA_VERSION = 0x05,
> > >  	IPROTO_SERVER_VERSION = 0x06,
> > >  	IPROTO_GROUP_ID = 0x07,
> > > 
> > > +	IPROTO_TXN_ID = 0x08,
> > > +	IPROTO_TXN_REPLICA_ID = 0x09,
> > 
> > Do we really need to introduce both TXN_ID and TXN_REPLICA_ID? I would
> > expect TXN_ID to be enough - we could use REPLICA_ID to make sure that
> > transaction identifiers from different instances don't overlap, because
> > there couldn't be a "multi-instance" transaction, could there?
> > 
> > > +	IPROTO_TXN_LAST = 0x0a,
> > 
> > I think we should instead introduce BEGIN and COMMIT commands, because:
> I completely do not like any auto-commit logic in a xlog file. You suggestion 
> breaks backward compatibility because previous logs do not have any BEGIN/

It wouldn't break backward compatibility. It might break forward
compatibility, which is fine by me (we do it all the time).

> COMMIT. Also separate BEGIN and COMMIT messages increase transaction size.

I doubt that after compression you'll see a difference.

> It is worth noting, that IPROTO_NOP with a new txn_id or txn_last emulates
> BEGIN or COMMIT.

Yeah, but that would look weird.

> 
> > 
> >  - We might need to attach some extra information to each transaction,
> >    e.g. mark transactions that were committed in parallel on the master
> >    so that they can be committed in parallel on a replica. Attaching
> >    such information to each row would be excessive.
> The patch is not about this.

But we have to think about that in advance, don't we?

> > 
> >  - We will need BEGIN and COMMIT for IPROTO transactions. It would be
> >    nice if we could share the code with them.
> The biggest issue we could not know transaction identifier in case of IPROTO.
> Iproto is single stream proto, but wal might be not as it is multiplexing a 
> lot of transactions in a one output, so it might be bad be in paradigm of 
> universally format for both IPROTO and WAL.

OK. I think we need to discuss the options with Kostja.

> > 
> >  - Having separate BEGIN/COMMIT rows would make tarantoolctl-cat output
> >    easier to read.
> > 
> > >  	/* Leave a gap for other keys in the header. */
> > >  	IPROTO_SPACE_ID = 0x10,
> > >  	IPROTO_INDEX_ID = 0x11,
> > > 
> > > diff --git a/src/box/wal.c b/src/box/wal.c
> > > index 4c3537672..17ead08e7 100644
> > > --- a/src/box/wal.c
> > > +++ b/src/box/wal.c
> > > @@ -905,9 +905,10 @@ wal_writer_begin_rollback(struct wal_writer *writer)
> > > 
> > >  }
> > >  
> > >  static void
> > > 
> > > -wal_assign_lsn(struct vclock *vclock, struct xrow_header **row,
> > > +wal_assign_lsn(struct vclock *vclock, struct xrow_header **begin,
> > > 
> > >  	       struct xrow_header **end)
> > >  
> > >  {
> > > 
> > > +	struct xrow_header **row = begin;
> > > 
> > >  	/** Assign LSN to all local rows. */
> > >  	for ( ; row < end; row++) {
> > >  	
> > >  		if ((*row)->replica_id == 0) {
> > > 
> > > @@ -917,6 +918,39 @@ wal_assign_lsn(struct vclock *vclock, struct
> > > xrow_header **row,> 
> > >  			vclock_follow_xrow(vclock, *row);
> > >  		
> > >  		}
> > >  	
> > >  	}
> > > 
> > > +	if ((*begin)->replica_id != instance_id) {
> > > +		/*
> > > +		 * Move all local changes to the end of rows array and
> > > +		 * a fake local transaction (like an autonomous transaction)
> > > +		 * because we could not replicate the transaction back.
> > > +		 */
> > > +		struct xrow_header **row = end - 1;
> > > +		while (row >= begin) {
> > > +			if (row[0]->replica_id != instance_id) {
> > > +				--row;
> > > +				continue;
> > > +			}
> > > +			/* Local row, move it back. */
> > > +			struct xrow_header **local_row = row;
> > > +			while (local_row < end - 1 &&
> > > +			       local_row[1]->replica_id != instance_id) {
> > > +				struct xrow_header *tmp = local_row[0];
> > > +				local_row[0] = local_row[1];
> > > +				local_row[1] = tmp;
> > 
> > There's SWAP for this.
> > 
> > > +			}
> > > +			--row;
> > > +		}
> > > +		while (begin < end && begin[0]->replica_id != instance_id)
> > > +			++begin;
> > > +	}
> > 
> > I don't understand why we need to move rows generated locally (by
> > an on_replace trigger I surmise) to the end of a transaction. We
> > have TXN_ID attached to each row so we could leave the transactions
> > interleaved, couldn't we?
> You are right, but in that case applier should track a lot of transaction 
> simultaneously. Also it complicates recovery too. I hope it will be fixed while 
> parallel applier implementing.

May be, we should implement parallel applier in the scope of this issue
then? Anyway, without it, sync replication won't scale with the number
of parallel transactions.

> > 
> > > +	/* Setup txn_id and tnx_replica_id for localy generated rows. */
> > > +	row = begin;
> > > +	while (row < end) {
> > > +		row[0]->txn_id = begin[0]->lsn;
> > > +		row[0]->txn_replica_id = instance_id;
> > > +		row[0]->txn_last = row == end - 1 ? 1 : 0;
> > > +		++row;
> > > +	}
> > 
> > I think we better use txn->id for TXN_ID rather than LSN.
> > Why do you think LSN should be used? I don't see any rationale
> > for that anywhere in the comments. Also, setting TXN_ID looks
> > like a job that should be done by txn_add_redo...
> > 
> > >  }
> > >  
> > >  static void
> > > 
> > > diff --git a/src/box/xrow.c b/src/box/xrow.c
> > > index ef3f81add..db524b3c8 100644
> > > --- a/src/box/xrow.c
> > > +++ b/src/box/xrow.c
> > > 
> > > @@ -133,12 +133,32 @@ error:
> > >  		case IPROTO_SCHEMA_VERSION:
> > >  			header->schema_version = mp_decode_uint(pos);
> > >  			break;
> > > 
> > > +		case IPROTO_TXN_ID:
> > > +			header->txn_id = mp_decode_uint(pos);
> > > +			break;
> > > +		case IPROTO_TXN_REPLICA_ID:
> > > +			header->txn_replica_id = mp_decode_uint(pos);
> > > +			break;
> > > +		case IPROTO_TXN_LAST:
> > > +			header->txn_last = mp_decode_uint(pos);
> > 
> > Should be bool?
> > 
> > > +			break;
> > > 
> > >  		default:
> > >  			/* unknown header */
> > >  			mp_next(pos);
> > >  		
> > >  		}
> > >  	
> > >  	}
> > >  	assert(*pos <= end);
> > > 
> > > +	if (header->txn_id == 0) {
> > > +		/*
> > > +		 * Transaction id is not set so it is a single statement
> > > +		 * transaction.
> > > +		 */
> > > +		header->txn_id = header->lsn;
> > > +		header->txn_last = true;
> > > +	}
> > > +	if (header->txn_replica_id == 0)
> > > +		header->txn_replica_id = header->replica_id;
> > > +
> > > 
> > >  	/* Nop requests aren't supposed to have a body. */
> > >  	if (*pos < end && header->type != IPROTO_NOP) {
> > >  	
> > >  		const char *body = *pos;
> > > 
> > > @@ -223,6 +243,24 @@ xrow_header_encode(const struct xrow_header *header,
> > > uint64_t sync,> 
> > >  		d = mp_encode_double(d, header->tm);
> > >  		map_size++;
> > >  	
> > >  	}
> > > 
> > > +	if (header->txn_id != header->lsn || header->txn_last == 0) {
> > > +		/* Encode txn id for multi row transaction members. */
> > > +		d = mp_encode_uint(d, IPROTO_TXN_ID);
> > > +		d = mp_encode_uint(d, header->txn_id);
> > > +		map_size++;
> > > +	}
> > > +	if (header->txn_replica_id != header->replica_id) {
> > > +		d = mp_encode_uint(d, IPROTO_TXN_REPLICA_ID);
> > > +		d = mp_encode_uint(d, header->txn_replica_id);
> > > +		map_size++;
> > > +	}
> > > +	if (header->txn_last && !(header->txn_id == header->lsn &&
> > > +				  header->txn_replica_id == header->replica_id)) 
> {
> > > +		/* Set last row for multi row transaction. */
> > > +		d = mp_encode_uint(d, IPROTO_TXN_LAST);
> > > +		d = mp_encode_uint(d, header->txn_last);
> > > +		map_size++;
> > > +	}
> > > 
> > >  	assert(d <= data + XROW_HEADER_LEN_MAX);
> > >  	mp_encode_map(data, map_size);
> > >  	out->iov_len = d - (char *) out->iov_base;
> > > 
> > > diff --git a/src/box/xrow.h b/src/box/xrow.h
> > > index 6bab0a1fd..4acd84d56 100644
> > > --- a/src/box/xrow.h
> > > +++ b/src/box/xrow.h
> > > @@ -47,7 +47,7 @@ enum {
> > > 
> > >  	XROW_HEADER_IOVMAX = 1,
> > >  	XROW_BODY_IOVMAX = 2,
> > >  	XROW_IOVMAX = XROW_HEADER_IOVMAX + XROW_BODY_IOVMAX,
> > > 
> > > -	XROW_HEADER_LEN_MAX = 40,
> > > +	XROW_HEADER_LEN_MAX = 60,
> > > 
> > >  	XROW_BODY_LEN_MAX = 128,
> > >  	IPROTO_HEADER_LEN = 28,
> > >  	/** 7 = sizeof(iproto_body_bin). */
> > > 
> > > @@ -69,6 +69,9 @@ struct xrow_header {
> > > 
> > >  	uint64_t sync;
> > >  	int64_t lsn; /* LSN must be signed for correct comparison */
> > >  	double tm;
> > > 
> > > +	int64_t txn_id;
> > > +	uint32_t txn_replica_id;
> > > +	uint32_t txn_last;
> > > 
> > >  	int bodycnt;
> > >  	uint32_t schema_version;
> > > 
> > > diff --git a/test/unit/xrow.cc b/test/unit/xrow.cc
> > > index 165a543cf..0b796d728 100644
> > > --- a/test/unit/xrow.cc
> > > +++ b/test/unit/xrow.cc
> > > @@ -215,6 +215,9 @@ test_xrow_header_encode_decode()
> > > 
> > >  	header.lsn = 400;
> > >  	header.tm = 123.456;
> > >  	header.bodycnt = 0;
> > > 
> > > +	header.txn_id = header.lsn;
> > > +	header.txn_replica_id = header.replica_id;
> > > +	header.txn_last = true;
> > > 
> > >  	uint64_t sync = 100500;
> > >  	struct iovec vec[1];
> > >  	is(1, xrow_header_encode(&header, sync, vec, 200), "encode");
> > > 
> > > diff --git a/test/vinyl/errinj_stat.result b/test/vinyl/errinj_stat.result
> > > index 08801dbc6..361ddf5db 100644
> > > --- a/test/vinyl/errinj_stat.result
> > > +++ b/test/vinyl/errinj_stat.result
> > > @@ -69,7 +69,7 @@ i:stat().disk.compaction.queue -- 30 statements
> > > 
> > >  - bytes_compressed: <bytes_compressed>
> > >  
> > >    pages: 3
> > >    rows: 30
> > > 
> > > -  bytes: 411
> > > +  bytes: 477
> > > 
> > >  ...
> > >  i:stat().disk.compaction.queue.bytes ==
> > >  box.stat.vinyl().scheduler.compaction_queue ---
> > > 
> > > @@ -83,7 +83,7 @@ i:stat().disk.compaction.queue -- 40 statements
> > > 
> > >  - bytes_compressed: <bytes_compressed>
> > >  
> > >    pages: 4
> > >    rows: 40
> > > 
> > > -  bytes: 548
> > > +  bytes: 636
> > 
> > I wouldn't expect vinyl stats to be changed by this patch.
> > Why did it happen?
> Because if rows were written in an one entry, then wal creates transaction.

But those are vinyl files (run, index). They shouldn't be affected by
this, should they?

  reply	other threads:[~2019-01-29 11:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-22 10:57 [tarantool-patches] [PATCH v2 0/2] Transaction boundaries in replication protocol Georgy Kirichenko
2019-01-22 10:57 ` [tarantool-patches] [PATCH v2 1/2] Journal transaction boundaries Georgy Kirichenko
2019-01-28 12:58   ` Vladimir Davydov
2019-01-29 10:09     ` Георгий Кириченко
2019-01-29 11:00       ` Vladimir Davydov [this message]
2019-01-31  7:34         ` Георгий Кириченко
2019-01-31  8:19           ` Vladimir Davydov
2019-01-31 14:25             ` Георгий Кириченко
2019-01-31 14:54               ` Vladimir Davydov
2019-02-01  9:31                 ` Георгий Кириченко
2019-01-28 13:00   ` Vladimir Davydov
2019-01-28 13:08     ` [tarantool-patches] " Vladislav Shpilevoy
2019-02-08 16:56   ` Konstantin Osipov
2019-01-22 10:57 ` [tarantool-patches] [PATCH v2 2/2] Transaction support for applier Georgy Kirichenko
2019-01-28 13:35   ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190129110010.qav5cedodjfs4ju4@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=georgy@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH v2 1/2] Journal transaction boundaries' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox