Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework
@ 2020-05-25 10:58 Serge Petrenko
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries Serge Petrenko
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-25 10:58 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov, kostja.osipov; +Cc: tarantool-patches

Changes in v2:
  - Pick a different approach for fixing is_commit
    flag delivery. Instead of making relay send txs
    in batches, simply reorder rows before writing
    them to WAL so that a tx always ends on a global
    row (if there is such a row at all).

Serge Petrenko (2):
  wal: fix tx boundaries
  wal: reorder tx rows so that a tx ends on a global row

 src/box/wal.c                                 |  49 ++++++-
 test/replication/gh-4928-tx-boundaries.result | 138 ++++++++++++++++++
 .../gh-4928-tx-boundaries.test.lua            |  61 ++++++++
 test/replication/suite.cfg                    |   1 +
 4 files changed, 243 insertions(+), 6 deletions(-)
 create mode 100644 test/replication/gh-4928-tx-boundaries.result
 create mode 100644 test/replication/gh-4928-tx-boundaries.test.lua

-- 
2.24.2 (Apple Git-127)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries
  2020-05-25 10:58 [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Serge Petrenko
@ 2020-05-25 10:58 ` Serge Petrenko
  2020-05-28 22:53   ` Vladislav Shpilevoy
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
  2020-05-28 22:53 ` [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Vladislav Shpilevoy
  2 siblings, 1 reply; 26+ messages in thread
From: Serge Petrenko @ 2020-05-25 10:58 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov, kostja.osipov; +Cc: tarantool-patches

In order to preserve transaction boundaries in replication protocol, wal
assigns each tx row a transaction sequence number (tsn). Tsn is equal to
the lsn of the first transaction row.

Starting with commit 7eb4650eecf1ac382119d0038076c19b2708f4a1, local
space requests are assigned a special replica id, 0, and have their own
lsns. These operations are not replicated.

If a transaction starting with a local space operation ends up in the
WAL, it gets a tsn equal to the lsn of the local space request. Then,
during replication, when such a transaction is replicated, the local
space request is omitted, and replica receives a global part of the
transaction with a seemingly random tsn, yielding an ER_PROTOCOL error:
"Transaction id must be equal to LSN of the first row in the transaction".

Assign tsn as equal to the lsn of the first global row in the
transaction to fix the problem, and assign tsn as before for fully local
transactions.

Follow-up #4114
Part-of #4928

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/wal.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/box/wal.c b/src/box/wal.c
index b979244e3..ef4d84920 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -956,25 +956,37 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 	       struct xrow_header **end)
 {
 	int64_t tsn = 0;
+	struct xrow_header **start = row;
+	struct xrow_header **first_glob_row = row;
 	/** Assign LSN to all local rows. */
 	for ( ; row < end; row++) {
 		if ((*row)->replica_id == 0) {
 			/*
 			 * All rows representing local space data
-			 * manipulations are signed wth a zero
+			 * manipulations are signed with a zero
 			 * instance id. This is also true for
 			 * anonymous replicas, since they are
 			 * only capable of writing to local and
 			 * temporary spaces.
 			 */
-			if ((*row)->group_id != GROUP_LOCAL)
+			if ((*row)->group_id != GROUP_LOCAL) {
 				(*row)->replica_id = instance_id;
+			}
 
 			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
 				      vclock_get(base, (*row)->replica_id);
-			/* Use lsn of the first local row as transaction id. */
-			tsn = tsn == 0 ? (*row)->lsn : tsn;
-			(*row)->tsn = tsn;
+			/*
+			 * Use lsn of the first global row as
+			 * transaction id.
+			 */
+			if ((*row)->group_id != GROUP_LOCAL && tsn == 0) {
+				tsn = (*row)->lsn;
+				/*
+				 * Remember the tail being processed.
+				 */
+				first_glob_row = row;
+			}
+			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
 			(*row)->is_commit = row == end - 1;
 		} else {
 			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
@@ -993,6 +1005,14 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			}
 		}
 	}
+
+	/*
+	 * Fill transaction id for all the local rows preceding
+	 * the first global row. tsn was yet unknown when those
+	 * rows were processed.
+	 */
+	for (row = start; row < first_glob_row; row++)
+		(*row)->tsn = tsn;
 }
 
 static void
-- 
2.24.2 (Apple Git-127)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 10:58 [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Serge Petrenko
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries Serge Petrenko
@ 2020-05-25 10:58 ` Serge Petrenko
  2020-05-25 15:13   ` Cyrill Gorcunov
                     ` (2 more replies)
  2020-05-28 22:53 ` [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Vladislav Shpilevoy
  2 siblings, 3 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-25 10:58 UTC (permalink / raw)
  To: v.shpilevoy, gorcunov, kostja.osipov; +Cc: tarantool-patches

Since we stopped sending local space operations in replication, the last
tx row has to be global in order to preserve tx boundaries on replica.
If the last row happens to be a local one, replica will never receive
the tx end marker, yielding the following errors:
`ER_UNSUPPORTED: replication does not support interleaving
transactions`.

In order to fix the problem reorder the rows before writing them to WAL
so that the last tx row is a global one.
Do nothing if the transaction is fully local. It isn't replicated so the
tx borders aren't an issue here.

Such reordering doesn't break anything, since only the relative positions
between some local rows and one global row are changed. Local and global
rows do not share a common lsn counter: local rows use a zero instance
id. So each row gets the same lsn it would get without the reordering,
the transaction looks exactly the same when replicated, and recovery
also isn't an issue, since local and global space operations can be
reordered freely.

Follow-up #4114
Closes #4928
---
 src/box/wal.c                                 |  19 ++-
 test/replication/gh-4928-tx-boundaries.result | 138 ++++++++++++++++++
 .../gh-4928-tx-boundaries.test.lua            |  61 ++++++++
 test/replication/suite.cfg                    |   1 +
 4 files changed, 218 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-4928-tx-boundaries.result
 create mode 100644 test/replication/gh-4928-tx-boundaries.test.lua

diff --git a/src/box/wal.c b/src/box/wal.c
index ef4d84920..0921c7261 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -958,6 +958,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 	int64_t tsn = 0;
 	struct xrow_header **start = row;
 	struct xrow_header **first_glob_row = row;
+	struct xrow_header **last_glob_row = end - 1;
 	/** Assign LSN to all local rows. */
 	for ( ; row < end; row++) {
 		if ((*row)->replica_id == 0) {
@@ -971,6 +972,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			 */
 			if ((*row)->group_id != GROUP_LOCAL) {
 				(*row)->replica_id = instance_id;
+				last_glob_row = row;
 			}
 
 			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
@@ -987,7 +989,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 				first_glob_row = row;
 			}
 			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
-			(*row)->is_commit = row == end - 1;
+			(*row)->is_commit = false;
 		} else {
 			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
 			if (diff <= vclock_get(vclock_diff,
@@ -1013,6 +1015,21 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 	 */
 	for (row = start; row < first_glob_row; row++)
 		(*row)->tsn = tsn;
+
+	/*
+	 * If a mixed transaction ends on a local row, float up
+	 * the last global row, so that the upper tx boundary is
+	 * delivered to the replica.
+	 */
+	if (last_glob_row < end - 1) {
+		struct xrow_header *tmp = *last_glob_row;
+		memmove(last_glob_row, last_glob_row + 1,
+			(end - 1 - last_glob_row) *
+			sizeof(struct xrow_header *));
+		*(end - 1) = tmp;
+	}
+
+	(*(end - 1))->is_commit = true;
 }
 
 static void
diff --git a/test/replication/gh-4928-tx-boundaries.result b/test/replication/gh-4928-tx-boundaries.result
new file mode 100644
index 000000000..969bd8438
--- /dev/null
+++ b/test/replication/gh-4928-tx-boundaries.result
@@ -0,0 +1,138 @@
+-- test-run result file version 2
+-- gh-4928. Test that transactions mixing local and global
+-- space operations are replicated correctly.
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+bit = require('bit')
+ | ---
+ | ...
+
+-- Init.
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+_ = box.schema.space.create('glob')
+ | ---
+ | ...
+_ = box.schema.space.create('loc', {is_local=true})
+ | ---
+ | ...
+_ = box.space.glob:create_index('pk')
+ | ---
+ | ...
+_ = box.space.loc:create_index('pk')
+ | ---
+ | ...
+
+function gen_mixed_tx(i)\
+    box.begin()\
+    if bit.band(i, 1) ~= 0 then\
+        box.space.glob:insert{10 * i + 1}\
+    else\
+        box.space.loc:insert{10 * i + 1}\
+    end\
+    if bit.band(i, 2) ~= 0 then\
+        box.space.glob:insert{10 * i + 2}\
+    else\
+        box.space.loc:insert{10 * i + 2}\
+    end\
+    if bit.band(i, 4) ~= 0 then\
+        box.space.glob:insert{10 * i + 3}\
+    else\
+        box.space.loc:insert{10 * i + 3}\
+    end\
+    box.commit()\
+end
+ | ---
+ | ...
+
+test_run:cmd("create server replica with rpl_master=default,\
+             script='replication/replica.lua'")
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_downstream(2, {status='follow'})
+ | ---
+ | - true
+ | ...
+
+for i = 0, 7 do gen_mixed_tx(i) end
+ | ---
+ | ...
+
+box.info.replication[2].status
+ | ---
+ | - null
+ | ...
+
+vclock = box.info.vclock
+ | ---
+ | ...
+vclock[0] = nil
+ | ---
+ | ...
+test_run:wait_vclock("replica", vclock)
+ | ---
+ | ...
+
+test_run:cmd('switch replica')
+ | ---
+ | - true
+ | ...
+
+box.info.status
+ | ---
+ | - running
+ | ...
+box.info.replication[1].upstream.status
+ | ---
+ | - follow
+ | ...
+
+box.space.glob:select{}
+ | ---
+ | - - [11]
+ |   - [22]
+ |   - [31]
+ |   - [32]
+ |   - [43]
+ |   - [51]
+ |   - [53]
+ |   - [62]
+ |   - [63]
+ |   - [71]
+ |   - [72]
+ |   - [73]
+ | ...
+
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+
+-- Cleanup.
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.space.loc:drop()
+ | ---
+ | ...
+box.space.glob:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-4928-tx-boundaries.test.lua b/test/replication/gh-4928-tx-boundaries.test.lua
new file mode 100644
index 000000000..92526fc51
--- /dev/null
+++ b/test/replication/gh-4928-tx-boundaries.test.lua
@@ -0,0 +1,61 @@
+-- gh-4928. Test that transactions mixing local and global
+-- space operations are replicated correctly.
+env = require('test_run')
+test_run = env.new()
+bit = require('bit')
+
+-- Init.
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('glob')
+_ = box.schema.space.create('loc', {is_local=true})
+_ = box.space.glob:create_index('pk')
+_ = box.space.loc:create_index('pk')
+
+function gen_mixed_tx(i)\
+    box.begin()\
+    if bit.band(i, 1) ~= 0 then\
+        box.space.glob:insert{10 * i + 1}\
+    else\
+        box.space.loc:insert{10 * i + 1}\
+    end\
+    if bit.band(i, 2) ~= 0 then\
+        box.space.glob:insert{10 * i + 2}\
+    else\
+        box.space.loc:insert{10 * i + 2}\
+    end\
+    if bit.band(i, 4) ~= 0 then\
+        box.space.glob:insert{10 * i + 3}\
+    else\
+        box.space.loc:insert{10 * i + 3}\
+    end\
+    box.commit()\
+end
+
+test_run:cmd("create server replica with rpl_master=default,\
+             script='replication/replica.lua'")
+test_run:cmd('start server replica')
+test_run:wait_downstream(2, {status='follow'})
+
+for i = 0, 7 do gen_mixed_tx(i) end
+
+box.info.replication[2].status
+
+vclock = box.info.vclock
+vclock[0] = nil
+test_run:wait_vclock("replica", vclock)
+
+test_run:cmd('switch replica')
+
+box.info.status
+box.info.replication[1].upstream.status
+
+box.space.glob:select{}
+
+test_run:cmd('switch default')
+
+-- Cleanup.
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+box.schema.user.revoke('guest', 'replication')
+box.space.loc:drop()
+box.space.glob:drop()
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index d2743b5ed..f357b07da 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -18,6 +18,7 @@
     "gh-4606-admin-creds.test.lua": {},
     "gh-4739-vclock-assert.test.lua": {},
     "gh-4730-applier-rollback.test.lua": {},
+    "gh-4928-tx-boundaries.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
         "vinyl": {"engine": "vinyl"}
-- 
2.24.2 (Apple Git-127)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
@ 2020-05-25 15:13   ` Cyrill Gorcunov
  2020-05-25 16:34   ` Konstantin Osipov
  2020-05-28 22:54   ` Vladislav Shpilevoy
  2 siblings, 0 replies; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-05-25 15:13 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

On Mon, May 25, 2020 at 01:58:56PM +0300, Serge Petrenko wrote:
...
> diff --git a/src/box/wal.c b/src/box/wal.c
> index ef4d84920..0921c7261 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -958,6 +958,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  	int64_t tsn = 0;
>  	struct xrow_header **start = row;
>  	struct xrow_header **first_glob_row = row;
> +	struct xrow_header **last_glob_row = end - 1;
>  	/** Assign LSN to all local rows. */
>  	for ( ; row < end; row++) {
>  		if ((*row)->replica_id == 0) {
> @@ -971,6 +972,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  			 */
>  			if ((*row)->group_id != GROUP_LOCAL) {
>  				(*row)->replica_id = instance_id;
> +				last_glob_row = row;
>  			}
>  
>  			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
> @@ -987,7 +989,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  				first_glob_row = row;
>  			}
>  			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
> -			(*row)->is_commit = row == end - 1;
> +			(*row)->is_commit = false;
>  		} else {
>  			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>  			if (diff <= vclock_get(vclock_diff,
> @@ -1013,6 +1015,21 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  	 */
>  	for (row = start; row < first_glob_row; row++)
>  		(*row)->tsn = tsn;
> +
> +	/*
> +	 * If a mixed transaction ends on a local row, float up
> +	 * the last global row, so that the upper tx boundary is
> +	 * delivered to the replica.
> +	 */
> +	if (last_glob_row < end - 1) {
> +		struct xrow_header *tmp = *last_glob_row;
> +		memmove(last_glob_row, last_glob_row + 1,
> +			(end - 1 - last_glob_row) *
> +			sizeof(struct xrow_header *));
> +		*(end - 1) = tmp;
> +	}

Sergey, I see no errors here still this memove call somehow
worries me, can't explain why, some gut feeling. Maybe because
we're mangling arguments.

That said while it fixes the issue (and we really need this
fix asap)

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>

but maybe we should come back to this code a bit later
and rethink it.

> +
> +	(*(end - 1))->is_commit = true;
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
  2020-05-25 15:13   ` Cyrill Gorcunov
@ 2020-05-25 16:34   ` Konstantin Osipov
  2020-05-25 18:35     ` Cyrill Gorcunov
  2020-05-26  9:41     ` Serge Petrenko
  2020-05-28 22:54   ` Vladislav Shpilevoy
  2 siblings, 2 replies; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-25 16:34 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:

While I think it's an acceptable strategy for a bugfix, I still
think relaying has to be fixed to be transactional, the current
xstream api is a huge legacy we're stuck with since Tarantool 1.3!

It saddens me a great deal the relay patch may be shuffled into
the desk once the problem is not as urgent.

> Since we stopped sending local space operations in replication, the last
> tx row has to be global in order to preserve tx boundaries on replica.
> If the last row happens to be a local one, replica will never receive
> the tx end marker, yielding the following errors:
> `ER_UNSUPPORTED: replication does not support interleaving
> transactions`.
> 
> In order to fix the problem reorder the rows before writing them to WAL
> so that the last tx row is a global one.
> Do nothing if the transaction is fully local. It isn't replicated so the
> tx borders aren't an issue here.
> 
> Such reordering doesn't break anything, since only the relative positions
> between some local rows and one global row are changed. Local and global
> rows do not share a common lsn counter: local rows use a zero instance
> id. So each row gets the same lsn it would get without the reordering,
> the transaction looks exactly the same when replicated, and recovery
> also isn't an issue, since local and global space operations can be
> reordered freely.
> 
> Follow-up #4114
> Closes #4928
> ---
>  src/box/wal.c                                 |  19 ++-
>  test/replication/gh-4928-tx-boundaries.result | 138 ++++++++++++++++++
>  .../gh-4928-tx-boundaries.test.lua            |  61 ++++++++
>  test/replication/suite.cfg                    |   1 +
>  4 files changed, 218 insertions(+), 1 deletion(-)
>  create mode 100644 test/replication/gh-4928-tx-boundaries.result
>  create mode 100644 test/replication/gh-4928-tx-boundaries.test.lua
> 
> diff --git a/src/box/wal.c b/src/box/wal.c
> index ef4d84920..0921c7261 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -958,6 +958,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  	int64_t tsn = 0;
>  	struct xrow_header **start = row;
>  	struct xrow_header **first_glob_row = row;
> +	struct xrow_header **last_glob_row = end - 1;
>  	/** Assign LSN to all local rows. */
>  	for ( ; row < end; row++) {
>  		if ((*row)->replica_id == 0) {
> @@ -971,6 +972,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  			 */
>  			if ((*row)->group_id != GROUP_LOCAL) {
>  				(*row)->replica_id = instance_id;
> +				last_glob_row = row;
>  			}
>  
>  			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
> @@ -987,7 +989,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  				first_glob_row = row;
>  			}
>  			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
> -			(*row)->is_commit = row == end - 1;
> +			(*row)->is_commit = false;
>  		} else {
>  			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>  			if (diff <= vclock_get(vclock_diff,
> @@ -1013,6 +1015,21 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  	 */
>  	for (row = start; row < first_glob_row; row++)
>  		(*row)->tsn = tsn;
> +
> +	/*
> +	 * If a mixed transaction ends on a local row, float up
> +	 * the last global row, so that the upper tx boundary is
> +	 * delivered to the replica.
> +	 */
> +	if (last_glob_row < end - 1) {
> +		struct xrow_header *tmp = *last_glob_row;
> +		memmove(last_glob_row, last_glob_row + 1,
> +			(end - 1 - last_glob_row) *
> +			sizeof(struct xrow_header *));
> +		*(end - 1) = tmp;
> +	}
> +
> +	(*(end - 1))->is_commit = true;
>  }
>  
>  static void
> diff --git a/test/replication/gh-4928-tx-boundaries.result b/test/replication/gh-4928-tx-boundaries.result
> new file mode 100644
> index 000000000..969bd8438
> --- /dev/null
> +++ b/test/replication/gh-4928-tx-boundaries.result
> @@ -0,0 +1,138 @@
> +-- test-run result file version 2
> +-- gh-4928. Test that transactions mixing local and global
> +-- space operations are replicated correctly.
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +bit = require('bit')
> + | ---
> + | ...
> +
> +-- Init.
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +_ = box.schema.space.create('glob')
> + | ---
> + | ...
> +_ = box.schema.space.create('loc', {is_local=true})
> + | ---
> + | ...
> +_ = box.space.glob:create_index('pk')
> + | ---
> + | ...
> +_ = box.space.loc:create_index('pk')
> + | ---
> + | ...
> +
> +function gen_mixed_tx(i)\
> +    box.begin()\
> +    if bit.band(i, 1) ~= 0 then\
> +        box.space.glob:insert{10 * i + 1}\
> +    else\
> +        box.space.loc:insert{10 * i + 1}\
> +    end\
> +    if bit.band(i, 2) ~= 0 then\
> +        box.space.glob:insert{10 * i + 2}\
> +    else\
> +        box.space.loc:insert{10 * i + 2}\
> +    end\
> +    if bit.band(i, 4) ~= 0 then\
> +        box.space.glob:insert{10 * i + 3}\
> +    else\
> +        box.space.loc:insert{10 * i + 3}\
> +    end\
> +    box.commit()\
> +end
> + | ---
> + | ...
> +
> +test_run:cmd("create server replica with rpl_master=default,\
> +             script='replication/replica.lua'")
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica')
> + | ---
> + | - true
> + | ...
> +test_run:wait_downstream(2, {status='follow'})
> + | ---
> + | - true
> + | ...
> +
> +for i = 0, 7 do gen_mixed_tx(i) end
> + | ---
> + | ...
> +
> +box.info.replication[2].status
> + | ---
> + | - null
> + | ...
> +
> +vclock = box.info.vclock
> + | ---
> + | ...
> +vclock[0] = nil
> + | ---
> + | ...
> +test_run:wait_vclock("replica", vclock)
> + | ---
> + | ...
> +
> +test_run:cmd('switch replica')
> + | ---
> + | - true
> + | ...
> +
> +box.info.status
> + | ---
> + | - running
> + | ...
> +box.info.replication[1].upstream.status
> + | ---
> + | - follow
> + | ...
> +
> +box.space.glob:select{}
> + | ---
> + | - - [11]
> + |   - [22]
> + |   - [31]
> + |   - [32]
> + |   - [43]
> + |   - [51]
> + |   - [53]
> + |   - [62]
> + |   - [63]
> + |   - [71]
> + |   - [72]
> + |   - [73]
> + | ...
> +
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +
> +-- Cleanup.
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
> +box.schema.user.revoke('guest', 'replication')
> + | ---
> + | ...
> +box.space.loc:drop()
> + | ---
> + | ...
> +box.space.glob:drop()
> + | ---
> + | ...
> diff --git a/test/replication/gh-4928-tx-boundaries.test.lua b/test/replication/gh-4928-tx-boundaries.test.lua
> new file mode 100644
> index 000000000..92526fc51
> --- /dev/null
> +++ b/test/replication/gh-4928-tx-boundaries.test.lua
> @@ -0,0 +1,61 @@
> +-- gh-4928. Test that transactions mixing local and global
> +-- space operations are replicated correctly.
> +env = require('test_run')
> +test_run = env.new()
> +bit = require('bit')
> +
> +-- Init.
> +box.schema.user.grant('guest', 'replication')
> +_ = box.schema.space.create('glob')
> +_ = box.schema.space.create('loc', {is_local=true})
> +_ = box.space.glob:create_index('pk')
> +_ = box.space.loc:create_index('pk')
> +
> +function gen_mixed_tx(i)\
> +    box.begin()\
> +    if bit.band(i, 1) ~= 0 then\
> +        box.space.glob:insert{10 * i + 1}\
> +    else\
> +        box.space.loc:insert{10 * i + 1}\
> +    end\
> +    if bit.band(i, 2) ~= 0 then\
> +        box.space.glob:insert{10 * i + 2}\
> +    else\
> +        box.space.loc:insert{10 * i + 2}\
> +    end\
> +    if bit.band(i, 4) ~= 0 then\
> +        box.space.glob:insert{10 * i + 3}\
> +    else\
> +        box.space.loc:insert{10 * i + 3}\
> +    end\
> +    box.commit()\
> +end
> +
> +test_run:cmd("create server replica with rpl_master=default,\
> +             script='replication/replica.lua'")
> +test_run:cmd('start server replica')
> +test_run:wait_downstream(2, {status='follow'})
> +
> +for i = 0, 7 do gen_mixed_tx(i) end
> +
> +box.info.replication[2].status
> +
> +vclock = box.info.vclock
> +vclock[0] = nil
> +test_run:wait_vclock("replica", vclock)
> +
> +test_run:cmd('switch replica')
> +
> +box.info.status
> +box.info.replication[1].upstream.status
> +
> +box.space.glob:select{}
> +
> +test_run:cmd('switch default')
> +
> +-- Cleanup.
> +test_run:cmd('stop server replica')
> +test_run:cmd('delete server replica')
> +box.schema.user.revoke('guest', 'replication')
> +box.space.loc:drop()
> +box.space.glob:drop()
> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
> index d2743b5ed..f357b07da 100644
> --- a/test/replication/suite.cfg
> +++ b/test/replication/suite.cfg
> @@ -18,6 +18,7 @@
>      "gh-4606-admin-creds.test.lua": {},
>      "gh-4739-vclock-assert.test.lua": {},
>      "gh-4730-applier-rollback.test.lua": {},
> +    "gh-4928-tx-boundaries.test.lua": {},
>      "*": {
>          "memtx": {"engine": "memtx"},
>          "vinyl": {"engine": "vinyl"}
> -- 
> 2.24.2 (Apple Git-127)
> 

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 16:34   ` Konstantin Osipov
@ 2020-05-25 18:35     ` Cyrill Gorcunov
  2020-05-25 20:42       ` Konstantin Osipov
  2020-05-26  9:41     ` Serge Petrenko
  1 sibling, 1 reply; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-05-25 18:35 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, v.shpilevoy

On Mon, May 25, 2020 at 07:34:09PM +0300, Konstantin Osipov wrote:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:
> 
> While I think it's an acceptable strategy for a bugfix, I still
> think relaying has to be fixed to be transactional, the current
> xstream api is a huge legacy we're stuck with since Tarantool 1.3!
> 
> It saddens me a great deal the relay patch may be shuffled into
> the desk once the problem is not as urgent.

Kostya, could you please describe xstream api here, so I would
file a bug and will assign it to myself/someone and we won't
forget. (yes, I remember you've said me about rework in f2f
conversation but i might be missing some details so please
write it here, in email)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 18:35     ` Cyrill Gorcunov
@ 2020-05-25 20:42       ` Konstantin Osipov
  0 siblings, 0 replies; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-25 20:42 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tarantool-patches, v.shpilevoy

* Cyrill Gorcunov <gorcunov@gmail.com> [20/05/25 21:38]:
> On Mon, May 25, 2020 at 07:34:09PM +0300, Konstantin Osipov wrote:
> > * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:
> > 
> > While I think it's an acceptable strategy for a bugfix, I still
> > think relaying has to be fixed to be transactional, the current
> > xstream api is a huge legacy we're stuck with since Tarantool 1.3!
> > 
> > It saddens me a great deal the relay patch may be shuffled into
> > the desk once the problem is not as urgent.
> 
> Kostya, could you please describe xstream api here, so I would
> file a bug and will assign it to myself/someone and we won't
> forget. (yes, I remember you've said me about rework in f2f
> conversation but i might be missing some details so please
> write it here, in email)

The simplest change would be to switch from returning xrow * 
to returning xrow **, so that xstream can return an entire
transaction. Alternatively, applier is already using a stailq
of objects called applier_tx_row, these are nothing 
but stailq_entry object, this data structure could be reused in xsteam
api, so that it returns a stailq, similar to applier_read_tx.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 16:34   ` Konstantin Osipov
  2020-05-25 18:35     ` Cyrill Gorcunov
@ 2020-05-26  9:41     ` Serge Petrenko
  2020-05-26 11:41       ` Konstantin Osipov
  1 sibling, 1 reply; 26+ messages in thread
From: Serge Petrenko @ 2020-05-26  9:41 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, v.shpilevoy


25.05.2020 19:34, Konstantin Osipov пишет:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:
>
> While I think it's an acceptable strategy for a bugfix, I still
> think relaying has to be fixed to be transactional, the current
> xstream api is a huge legacy we're stuck with since Tarantool 1.3!
>
> It saddens me a great deal the relay patch may be shuffled into
> the desk once the problem is not as urgent.
Hi! Thanks for your  answer.
I understand your concern with transactional relay rework. However, this
approach faces some non-trivial problems we discussed verbally. That's why
I chose a more simple solution.


I have a question regarding this patch. Do you think I should reorder 
the tx rows
so that the last row is a global one? Maybe it'd be better to just set 
is_commit
flag on the last global row? This breaks tx boundaries in local WAL, but 
preserves
them as is in replication and in other instances' WALs. I don't think 
we'll need tx
boundariesin local WAL anyway, but still breaking the invariant that the 
last tx
row has an is_commit flagset to true is scary.
>
>> Since we stopped sending local space operations in replication, the last
>> tx row has to be global in order to preserve tx boundaries on replica.
>> If the last row happens to be a local one, replica will never receive
>> the tx end marker, yielding the following errors:
>> `ER_UNSUPPORTED: replication does not support interleaving
>> transactions`.
>>
>> In order to fix the problem reorder the rows before writing them to WAL
>> so that the last tx row is a global one.
>> Do nothing if the transaction is fully local. It isn't replicated so the
>> tx borders aren't an issue here.
>>
>> Such reordering doesn't break anything, since only the relative positions
>> between some local rows and one global row are changed. Local and global
>> rows do not share a common lsn counter: local rows use a zero instance
>> id. So each row gets the same lsn it would get without the reordering,
>> the transaction looks exactly the same when replicated, and recovery
>> also isn't an issue, since local and global space operations can be
>> reordered freely.
>>
>> Follow-up #4114
>> Closes #4928
>> ---
>>   src/box/wal.c                                 |  19 ++-
>>   test/replication/gh-4928-tx-boundaries.result | 138 ++++++++++++++++++
>>   .../gh-4928-tx-boundaries.test.lua            |  61 ++++++++
>>   test/replication/suite.cfg                    |   1 +
>>   4 files changed, 218 insertions(+), 1 deletion(-)
>>   create mode 100644 test/replication/gh-4928-tx-boundaries.result
>>   create mode 100644 test/replication/gh-4928-tx-boundaries.test.lua
>>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index ef4d84920..0921c7261 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -958,6 +958,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>   	int64_t tsn = 0;
>>   	struct xrow_header **start = row;
>>   	struct xrow_header **first_glob_row = row;
>> +	struct xrow_header **last_glob_row = end - 1;
>>   	/** Assign LSN to all local rows. */
>>   	for ( ; row < end; row++) {
>>   		if ((*row)->replica_id == 0) {
>> @@ -971,6 +972,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>   			 */
>>   			if ((*row)->group_id != GROUP_LOCAL) {
>>   				(*row)->replica_id = instance_id;
>> +				last_glob_row = row;
>>   			}
>>   
>>   			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
>> @@ -987,7 +989,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>   				first_glob_row = row;
>>   			}
>>   			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
>> -			(*row)->is_commit = row == end - 1;
>> +			(*row)->is_commit = false;
>>   		} else {
>>   			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>>   			if (diff <= vclock_get(vclock_diff,
>> @@ -1013,6 +1015,21 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>   	 */
>>   	for (row = start; row < first_glob_row; row++)
>>   		(*row)->tsn = tsn;
>> +
>> +	/*
>> +	 * If a mixed transaction ends on a local row, float up
>> +	 * the last global row, so that the upper tx boundary is
>> +	 * delivered to the replica.
>> +	 */
>> +	if (last_glob_row < end - 1) {
>> +		struct xrow_header *tmp = *last_glob_row;
>> +		memmove(last_glob_row, last_glob_row + 1,
>> +			(end - 1 - last_glob_row) *
>> +			sizeof(struct xrow_header *));
>> +		*(end - 1) = tmp;
>> +	}
>> +
>> +	(*(end - 1))->is_commit = true;
>>   }
>>   
>>   static void
>> diff --git a/test/replication/gh-4928-tx-boundaries.result b/test/replication/gh-4928-tx-boundaries.result
>> new file mode 100644
>> index 000000000..969bd8438
>> --- /dev/null
>> +++ b/test/replication/gh-4928-tx-boundaries.result
>> @@ -0,0 +1,138 @@
>> +-- test-run result file version 2
>> +-- gh-4928. Test that transactions mixing local and global
>> +-- space operations are replicated correctly.
>> +env = require('test_run')
>> + | ---
>> + | ...
>> +test_run = env.new()
>> + | ---
>> + | ...
>> +bit = require('bit')
>> + | ---
>> + | ...
>> +
>> +-- Init.
>> +box.schema.user.grant('guest', 'replication')
>> + | ---
>> + | ...
>> +_ = box.schema.space.create('glob')
>> + | ---
>> + | ...
>> +_ = box.schema.space.create('loc', {is_local=true})
>> + | ---
>> + | ...
>> +_ = box.space.glob:create_index('pk')
>> + | ---
>> + | ...
>> +_ = box.space.loc:create_index('pk')
>> + | ---
>> + | ...
>> +
>> +function gen_mixed_tx(i)\
>> +    box.begin()\
>> +    if bit.band(i, 1) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 1}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 1}\
>> +    end\
>> +    if bit.band(i, 2) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 2}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 2}\
>> +    end\
>> +    if bit.band(i, 4) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 3}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 3}\
>> +    end\
>> +    box.commit()\
>> +end
>> + | ---
>> + | ...
>> +
>> +test_run:cmd("create server replica with rpl_master=default,\
>> +             script='replication/replica.lua'")
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('start server replica')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_downstream(2, {status='follow'})
>> + | ---
>> + | - true
>> + | ...
>> +
>> +for i = 0, 7 do gen_mixed_tx(i) end
>> + | ---
>> + | ...
>> +
>> +box.info.replication[2].status
>> + | ---
>> + | - null
>> + | ...
>> +
>> +vclock = box.info.vclock
>> + | ---
>> + | ...
>> +vclock[0] = nil
>> + | ---
>> + | ...
>> +test_run:wait_vclock("replica", vclock)
>> + | ---
>> + | ...
>> +
>> +test_run:cmd('switch replica')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +box.info.status
>> + | ---
>> + | - running
>> + | ...
>> +box.info.replication[1].upstream.status
>> + | ---
>> + | - follow
>> + | ...
>> +
>> +box.space.glob:select{}
>> + | ---
>> + | - - [11]
>> + |   - [22]
>> + |   - [31]
>> + |   - [32]
>> + |   - [43]
>> + |   - [51]
>> + |   - [53]
>> + |   - [62]
>> + |   - [63]
>> + |   - [71]
>> + |   - [72]
>> + |   - [73]
>> + | ...
>> +
>> +test_run:cmd('switch default')
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Cleanup.
>> +test_run:cmd('stop server replica')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:cmd('delete server replica')
>> + | ---
>> + | - true
>> + | ...
>> +box.schema.user.revoke('guest', 'replication')
>> + | ---
>> + | ...
>> +box.space.loc:drop()
>> + | ---
>> + | ...
>> +box.space.glob:drop()
>> + | ---
>> + | ...
>> diff --git a/test/replication/gh-4928-tx-boundaries.test.lua b/test/replication/gh-4928-tx-boundaries.test.lua
>> new file mode 100644
>> index 000000000..92526fc51
>> --- /dev/null
>> +++ b/test/replication/gh-4928-tx-boundaries.test.lua
>> @@ -0,0 +1,61 @@
>> +-- gh-4928. Test that transactions mixing local and global
>> +-- space operations are replicated correctly.
>> +env = require('test_run')
>> +test_run = env.new()
>> +bit = require('bit')
>> +
>> +-- Init.
>> +box.schema.user.grant('guest', 'replication')
>> +_ = box.schema.space.create('glob')
>> +_ = box.schema.space.create('loc', {is_local=true})
>> +_ = box.space.glob:create_index('pk')
>> +_ = box.space.loc:create_index('pk')
>> +
>> +function gen_mixed_tx(i)\
>> +    box.begin()\
>> +    if bit.band(i, 1) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 1}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 1}\
>> +    end\
>> +    if bit.band(i, 2) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 2}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 2}\
>> +    end\
>> +    if bit.band(i, 4) ~= 0 then\
>> +        box.space.glob:insert{10 * i + 3}\
>> +    else\
>> +        box.space.loc:insert{10 * i + 3}\
>> +    end\
>> +    box.commit()\
>> +end
>> +
>> +test_run:cmd("create server replica with rpl_master=default,\
>> +             script='replication/replica.lua'")
>> +test_run:cmd('start server replica')
>> +test_run:wait_downstream(2, {status='follow'})
>> +
>> +for i = 0, 7 do gen_mixed_tx(i) end
>> +
>> +box.info.replication[2].status
>> +
>> +vclock = box.info.vclock
>> +vclock[0] = nil
>> +test_run:wait_vclock("replica", vclock)
>> +
>> +test_run:cmd('switch replica')
>> +
>> +box.info.status
>> +box.info.replication[1].upstream.status
>> +
>> +box.space.glob:select{}
>> +
>> +test_run:cmd('switch default')
>> +
>> +-- Cleanup.
>> +test_run:cmd('stop server replica')
>> +test_run:cmd('delete server replica')
>> +box.schema.user.revoke('guest', 'replication')
>> +box.space.loc:drop()
>> +box.space.glob:drop()
>> diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
>> index d2743b5ed..f357b07da 100644
>> --- a/test/replication/suite.cfg
>> +++ b/test/replication/suite.cfg
>> @@ -18,6 +18,7 @@
>>       "gh-4606-admin-creds.test.lua": {},
>>       "gh-4739-vclock-assert.test.lua": {},
>>       "gh-4730-applier-rollback.test.lua": {},
>> +    "gh-4928-tx-boundaries.test.lua": {},
>>       "*": {
>>           "memtx": {"engine": "memtx"},
>>           "vinyl": {"engine": "vinyl"}
>> -- 
>> 2.24.2 (Apple Git-127)
>>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-26  9:41     ` Serge Petrenko
@ 2020-05-26 11:41       ` Konstantin Osipov
  2020-05-26 12:08         ` Serge Petrenko
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-26 11:41 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, v.shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/05/26 12:42]:
> 
> 25.05.2020 19:34, Konstantin Osipov пишет:
> > * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:
> > 
> > While I think it's an acceptable strategy for a bugfix, I still
> > think relaying has to be fixed to be transactional, the current
> > xstream api is a huge legacy we're stuck with since Tarantool 1.3!
> > 
> > It saddens me a great deal the relay patch may be shuffled into
> > the desk once the problem is not as urgent.
> Hi! Thanks for your  answer.
> I understand your concern with transactional relay rework. However, this
> approach faces some non-trivial problems we discussed verbally. That's why
> I chose a more simple solution.
> 
> 
> I have a question regarding this patch. Do you think I should reorder the tx
> rows
> so that the last row is a global one? Maybe it'd be better to just set
> is_commit
> flag on the last global row? This breaks tx boundaries in local WAL, but
> preserves
> them as is in replication and in other instances' WALs. I don't think we'll
> need tx
> boundariesin local WAL anyway, but still breaking the invariant that the
> last tx
> row has an is_commit flagset to true is scary.

We will eventually need tx boundaries everywhere, it's just a
rakes waiting to be stepped on that we added transaction
boundaries but did not enforce them everywhere.

If you set is_commit flag on the last global row, you will exclude
local rows from the transaction locally. 
Reordering, but more specifically, simply moving the last global
xrow to the end of the transaction list, is safer IMHO.


-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-26 11:41       ` Konstantin Osipov
@ 2020-05-26 12:08         ` Serge Petrenko
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-26 12:08 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, v.shpilevoy


26.05.2020 14:41, Konstantin Osipov пишет:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/26 12:42]:
>> 25.05.2020 19:34, Konstantin Osipov пишет:
>>> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/25 14:00]:
>>>
>>> While I think it's an acceptable strategy for a bugfix, I still
>>> think relaying has to be fixed to be transactional, the current
>>> xstream api is a huge legacy we're stuck with since Tarantool 1.3!
>>>
>>> It saddens me a great deal the relay patch may be shuffled into
>>> the desk once the problem is not as urgent.
>> Hi! Thanks for your  answer.
>> I understand your concern with transactional relay rework. However, this
>> approach faces some non-trivial problems we discussed verbally. That's why
>> I chose a more simple solution.
>>
>>
>> I have a question regarding this patch. Do you think I should reorder the tx
>> rows
>> so that the last row is a global one? Maybe it'd be better to just set
>> is_commit
>> flag on the last global row? This breaks tx boundaries in local WAL, but
>> preserves
>> them as is in replication and in other instances' WALs. I don't think we'll
>> need tx
>> boundariesin local WAL anyway, but still breaking the invariant that the
>> last tx
>> row has an is_commit flagset to true is scary.
> We will eventually need tx boundaries everywhere, it's just a
> rakes waiting to be stepped on that we added transaction
> boundaries but did not enforce them everywhere.
>
> If you set is_commit flag on the last global row, you will exclude
> local rows from the transaction locally.
> Reordering, but more specifically, simply moving the last global
> xrow to the end of the transaction list, is safer IMHO.
>
Ok, thanks.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework
  2020-05-25 10:58 [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Serge Petrenko
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries Serge Petrenko
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
@ 2020-05-28 22:53 ` Vladislav Shpilevoy
  2020-05-29 11:03   ` Serge Petrenko
  2 siblings, 1 reply; 26+ messages in thread
From: Vladislav Shpilevoy @ 2020-05-28 22:53 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov, kostja.osipov; +Cc: tarantool-patches

Hi! Thanks for the patchset!

I am guessing the patchset is on the branch sp/gh-4928-tx-boundary-fix-alt.
Is it correct?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries Serge Petrenko
@ 2020-05-28 22:53   ` Vladislav Shpilevoy
  2020-05-29 11:09     ` Serge Petrenko
  0 siblings, 1 reply; 26+ messages in thread
From: Vladislav Shpilevoy @ 2020-05-28 22:53 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov, kostja.osipov; +Cc: tarantool-patches

Thanks for the patch!

> diff --git a/src/box/wal.c b/src/box/wal.c
> index b979244e3..ef4d84920 100644
> --- a/src/box/wal.c
> +++ b/src/box/wal.c
> @@ -956,25 +956,37 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>  	       struct xrow_header **end)
>  {
>  	int64_t tsn = 0;
> +	struct xrow_header **start = row;
> +	struct xrow_header **first_glob_row = row;
>  	/** Assign LSN to all local rows. */
>  	for ( ; row < end; row++) {
>  		if ((*row)->replica_id == 0) {
>  			/*
>  			 * All rows representing local space data
> -			 * manipulations are signed wth a zero
> +			 * manipulations are signed with a zero
>  			 * instance id. This is also true for
>  			 * anonymous replicas, since they are
>  			 * only capable of writing to local and
>  			 * temporary spaces.
>  			 */
> -			if ((*row)->group_id != GROUP_LOCAL)
> +			if ((*row)->group_id != GROUP_LOCAL) {
>  				(*row)->replica_id = instance_id;
> +			}

Seems like accidental diff.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
  2020-05-25 15:13   ` Cyrill Gorcunov
  2020-05-25 16:34   ` Konstantin Osipov
@ 2020-05-28 22:54   ` Vladislav Shpilevoy
  2020-05-29  8:13     ` Konstantin Osipov
  2020-05-29 11:42     ` Serge Petrenko
  2 siblings, 2 replies; 26+ messages in thread
From: Vladislav Shpilevoy @ 2020-05-28 22:54 UTC (permalink / raw)
  To: Serge Petrenko, gorcunov, kostja.osipov; +Cc: tarantool-patches

Thanks   for   the   patch,
But   I   have  a  comment!
It   is   a   nice  crutch,
Yet we need one more moment.

The patch basically sacrifices transaction rows order
and WAL correctness for the sake of replication. It
does not look right. Why can't we leave WAL as is, and
tweak all these things in relay? It looks really wrong
to change statements order. Especially taking into
account this is needed *only* for replication. For example,
consider FKs.

A local space has a FOREIGN KEY reference to a global
space. To make it work, we need to insert into the global
space first, and then into the local space. When you change
the order, the local insert goes first, and violates the
foreign key. So if we will check FKs on recovery (when we
will have FKs in box), this patch will break them.

Alternative to relay - append a dummy NOP statement in the
end of the transaction, which would be global. But this is
also a crutch. I think the TSNs figuring out should be done
in relay. It could keep track of the current transaction,
change TSNs and is_commit when necessary.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-28 22:54   ` Vladislav Shpilevoy
@ 2020-05-29  8:13     ` Konstantin Osipov
  2020-05-29 11:42     ` Serge Petrenko
  1 sibling, 0 replies; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-29  8:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [20/05/29 01:55]:
> Alternative to relay - append a dummy NOP statement in the
> end of the transaction, which would be global. But this is
> also a crutch. I think the TSNs figuring out should be done
> in relay. It could keep track of the current transaction,
> change TSNs and is_commit when necessary.

I think it's a good idea actually.

Tweaking the relay is also a crutch.

After all I wonder, what's so difficult in changing xstream api? 
Why do we need to invent these crutches?

Thanks for pointing out the foreign key problem.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework
  2020-05-28 22:53 ` [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Vladislav Shpilevoy
@ 2020-05-29 11:03   ` Serge Petrenko
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-29 11:03 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches


29.05.2020 01:53, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patchset!
>
> I am guessing the patchset is on the branch sp/gh-4928-tx-boundary-fix-alt.
> Is it correct?
Yes, you're correct. I forgot to attach the branch, sorry.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries
  2020-05-28 22:53   ` Vladislav Shpilevoy
@ 2020-05-29 11:09     ` Serge Petrenko
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-29 11:09 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches


29.05.2020 01:53, Vladislav Shpilevoy пишет:
> Thanks for the patch!
>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index b979244e3..ef4d84920 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -956,25 +956,37 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>   	       struct xrow_header **end)
>>   {
>>   	int64_t tsn = 0;
>> +	struct xrow_header **start = row;
>> +	struct xrow_header **first_glob_row = row;
>>   	/** Assign LSN to all local rows. */
>>   	for ( ; row < end; row++) {
>>   		if ((*row)->replica_id == 0) {
>>   			/*
>>   			 * All rows representing local space data
>> -			 * manipulations are signed wth a zero
>> +			 * manipulations are signed with a zero
>>   			 * instance id. This is also true for
>>   			 * anonymous replicas, since they are
>>   			 * only capable of writing to local and
>>   			 * temporary spaces.
>>   			 */
>> -			if ((*row)->group_id != GROUP_LOCAL)
>> +			if ((*row)->group_id != GROUP_LOCAL) {
>>   				(*row)->replica_id = instance_id;
>> +			}
> Seems like accidental diff.

Thanks! Fixed.

diff --git a/src/box/wal.c b/src/box/wal.c
index ef4d84920..59587510a 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -969,9 +969,8 @@ wal_assign_lsn(struct vclock *vclock_diff, struct 
vclock *base,
                          * only capable of writing to local and
                          * temporary spaces.
                          */
-                       if ((*row)->group_id != GROUP_LOCAL) {
+                       if ((*row)->group_id != GROUP_LOCAL)
                                 (*row)->replica_id = instance_id;
-                       }

                         (*row)->lsn = vclock_inc(vclock_diff, 
(*row)->replica_id) +
                                       vclock_get(base, (*row)->replica_id);

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-28 22:54   ` Vladislav Shpilevoy
  2020-05-29  8:13     ` Konstantin Osipov
@ 2020-05-29 11:42     ` Serge Petrenko
  2020-05-29 11:51       ` Konstantin Osipov
  2020-06-01 13:40       ` Vladislav Shpilevoy
  1 sibling, 2 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-29 11:42 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches


29.05.2020 01:54, Vladislav Shpilevoy пишет:
> Thanks   for   the   patch,
> But   I   have  a  comment!
> It   is   a   nice  crutch,
> Yet we need one more moment.
>
> The patch basically sacrifices transaction rows order
> and WAL correctness for the sake of replication. It
> does not look right. Why can't we leave WAL as is, and
> tweak all these things in relay? It looks really wrong
> to change statements order. Especially taking into
> account this is needed *only* for replication. For example,
> consider FKs.
>
> A local space has a FOREIGN KEY reference to a global
> space. To make it work, we need to insert into the global
> space first, and then into the local space. When you change
> the order, the local insert goes first, and violates the
> foreign key. So if we will check FKs on recovery (when we
> will have FKs in box), this patch will break them.
>
> Alternative to relay - append a dummy NOP statement in the
> end of the transaction, which would be global. But this is
> also a crutch. I think the TSNs figuring out should be done
> in relay. It could keep track of the current transaction,
> change TSNs and is_commit when necessary.

Hi! Thanks for the review!

I understand this is a crutch, but it's the best solution I could come
up with.  Appending dummy  NOPs will increase instance LSN by one,
which  also looks  ugly. The correct solution is, indeed, to collect  a tx
in relay and mangle with it in any means  we need before sending, however,
I faced some problems with this approach. See more in v1 of this patchset
(letter [PATCH 2/2] replication: make relay send txs in batches).

By the way, your txn_limbo implementation assigns the tx an lsn of the last
tx row, which doesn't work in case the last row is a local one. So, 
looks like
we either need to reorder the rows or insert a NOP at the tx end. Or just
assign the tx an lsn of the last global row.

Still, it's up to you which solution you find a better one, and I'll
implement it.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 11:42     ` Serge Petrenko
@ 2020-05-29 11:51       ` Konstantin Osipov
  2020-05-29 12:07         ` Cyrill Gorcunov
  2020-05-29 12:15         ` Serge Petrenko
  2020-06-01 13:40       ` Vladislav Shpilevoy
  1 sibling, 2 replies; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-29 11:51 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/05/29 14:44]:
> I understand this is a crutch, but it's the best solution I could come
> up with.  Appending dummy  NOPs will increase instance LSN by one,
> which  also looks  ugly. The correct solution is, indeed, to collect  a tx
> in relay and mangle with it in any means  we need before sending, however,
> I faced some problems with this approach. See more in v1 of this patchset
> (letter [PATCH 2/2] replication: make relay send txs in batches).
> 

What were the issues changing xstream api apart from never trying?


-- 
Konstantin Osipov, Moscow, Russia
https://scylladb.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 11:51       ` Konstantin Osipov
@ 2020-05-29 12:07         ` Cyrill Gorcunov
  2020-05-29 12:07           ` Cyrill Gorcunov
  2020-05-29 12:15         ` Serge Petrenko
  1 sibling, 1 reply; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-05-29 12:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

On Fri, May 29, 2020 at 02:51:54PM +0300, Konstantin Osipov wrote:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/29 14:44]:
> > I understand this is a crutch, but it's the best solution I could come
> > up with.  Appending dummy  NOPs will increase instance LSN by one,
> > which  also looks  ugly. The correct solution is, indeed, to collect  a tx
> > in relay and mangle with it in any means  we need before sending, however,
> > I faced some problems with this approach. See more in v1 of this patchset
> > (letter [PATCH 2/2] replication: make relay send txs in batches).
> > 
> 
> What were the issues changing xstream api apart from never trying?

Just for info -- I've filed a bug for this, didn't find time to look
deeper though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 12:07         ` Cyrill Gorcunov
@ 2020-05-29 12:07           ` Cyrill Gorcunov
  0 siblings, 0 replies; 26+ messages in thread
From: Cyrill Gorcunov @ 2020-05-29 12:07 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladislav Shpilevoy

On Fri, May 29, 2020 at 03:07:04PM +0300, Cyrill Gorcunov wrote:
> > 
> > What were the issues changing xstream api apart from never trying?
> 
> Just for info -- I've filed a bug for this, didn't find time to look
> deeper though.

https://github.com/tarantool/tarantool/issues/5021

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 11:51       ` Konstantin Osipov
  2020-05-29 12:07         ` Cyrill Gorcunov
@ 2020-05-29 12:15         ` Serge Petrenko
  2020-05-29 13:44           ` Konstantin Osipov
  1 sibling, 1 reply; 26+ messages in thread
From: Serge Petrenko @ 2020-05-29 12:15 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy


29.05.2020 14:51, Konstantin Osipov пишет:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/29 14:44]:
>> I understand this is a crutch, but it's the best solution I could come
>> up with.  Appending dummy  NOPs will increase instance LSN by one,
>> which  also looks  ugly. The correct solution is, indeed, to collect  a tx
>> in relay and mangle with it in any means  we need before sending, however,
>> I faced some problems with this approach. See more in v1 of this patchset
>> (letter [PATCH 2/2] replication: make relay send txs in batches).
>>
> What were the issues changing xstream api apart from never trying?

No issues, it's just a big change. At least it looks like there're no
issues until I try the approach.

>
>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 12:15         ` Serge Petrenko
@ 2020-05-29 13:44           ` Konstantin Osipov
  2020-05-29 15:55             ` Serge Petrenko
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Osipov @ 2020-05-29 13:44 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [20/05/29 15:20]:
> No issues, it's just a big change. At least it looks like there're no
> issues until I try the approach.

Why do you think it's a big change if you didn't try?

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 13:44           ` Konstantin Osipov
@ 2020-05-29 15:55             ` Serge Petrenko
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Petrenko @ 2020-05-29 15:55 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Vladislav Shpilevoy


29.05.2020 16:44, Konstantin Osipov пишет:
> * Serge Petrenko <sergepetrenko@tarantool.org> [20/05/29 15:20]:
>> No issues, it's just a big change. At least it looks like there're no
>> issues until I try the approach.
> Why do you think it's a big change if you didn't try?

Looks like it is. It's definitely bigger than the crutches we spoke of 
above.

Anyway, let's wait for Vlad's answer and see what we shall do next.

-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-05-29 11:42     ` Serge Petrenko
  2020-05-29 11:51       ` Konstantin Osipov
@ 2020-06-01 13:40       ` Vladislav Shpilevoy
  2020-06-01 16:02         ` Sergey Ostanevich
  1 sibling, 1 reply; 26+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-01 13:40 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches

On 29/05/2020 13:42, Serge Petrenko wrote:
> 
> 29.05.2020 01:54, Vladislav Shpilevoy пишет:
>> Thanks   for   the   patch,
>> But   I   have  a  comment!
>> It   is   a   nice  crutch,
>> Yet we need one more moment.
>>
>> The patch basically sacrifices transaction rows order
>> and WAL correctness for the sake of replication. It
>> does not look right. Why can't we leave WAL as is, and
>> tweak all these things in relay? It looks really wrong
>> to change statements order. Especially taking into
>> account this is needed *only* for replication. For example,
>> consider FKs.
>>
>> A local space has a FOREIGN KEY reference to a global
>> space. To make it work, we need to insert into the global
>> space first, and then into the local space. When you change
>> the order, the local insert goes first, and violates the
>> foreign key. So if we will check FKs on recovery (when we
>> will have FKs in box), this patch will break them.
>>
>> Alternative to relay - append a dummy NOP statement in the
>> end of the transaction, which would be global. But this is
>> also a crutch. I think the TSNs figuring out should be done
>> in relay. It could keep track of the current transaction,
>> change TSNs and is_commit when necessary.
> 
> Hi! Thanks for the review!
> 
> I understand this is a crutch, but it's the best solution I could come
> up with.  Appending dummy  NOPs will increase instance LSN by one,
> which  also looks  ugly. The correct solution is, indeed, to collect  a tx
> in relay and mangle with it in any means  we need before sending, however,
> I faced some problems with this approach. See more in v1 of this patchset
> (letter [PATCH 2/2] replication: make relay send txs in batches).
> 
> By the way, your txn_limbo implementation assigns the tx an lsn of the last
> tx row, which doesn't work in case the last row is a local one. So, looks like
> we either need to reorder the rows or insert a NOP at the tx end. Or just
> assign the tx an lsn of the last global row.
> 
> Still, it's up to you which solution you find a better one, and I'll
> implement it.

I guess the current solution is ok then. Feel free to add
Reviewed-by: Vladislav Shpilevoy <vshpilevoi@mail.ru>. First version
looks better, but seems it will take too much time to rework xlog
streaming API.

Although did you try? I don't know xlog code well, but perhaps you could
just forbid to reuse ibuf for sequential rows until explicit reset() or
something.

I also have a thought about is_commit necessity. Why do we need it? Why isn't
tsn enough? I saw in xrow code (xrow_header_encode()) that there are 2
combinations:

- No tsn, and no is_commit for single statement transactions. Is_commit
  is assumed implicitly;

- tsn is encoded, and the last statement has is_commit.

But in fact the is_commit flag always can be determined implicitly. Consider
all the possible cases:

- row1: tsn1, row2: tsn2. Then all rows [..., row1] belong to one
  transaction, all rows [row2, ...] belong to a next transaction;

- row1: no tsn, row2: tsn2. Then row1 is first single-statement
  transaction, [row2, ...] - next transaction.

- row1: tsn1, row2: no tsn. Then the same as previous. [..., row1] -
  first transaction, row2 - second single-statement transaction.

- row1: no tsn, row2: no tsn. Two single statement transactions.

It means, transaction border can be determined by tsn difference
between two sequential rows. This is allowed because transactions
are never mixed. Tsn difference is border. And since is_rollback
does not exist (it will in sync replication, but as a separate entry,
for multiple transactions), and is_commit *already* is assumed
implicitly for single statement transactions, why not to calculate it
implicitly for all transactions?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-06-01 13:40       ` Vladislav Shpilevoy
@ 2020-06-01 16:02         ` Sergey Ostanevich
  2020-06-01 17:06           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Ostanevich @ 2020-06-01 16:02 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Hi!


On 01 июн 15:40, Vladislav Shpilevoy wrote:
> On 29/05/2020 13:42, Serge Petrenko wrote:
> > 
> > 29.05.2020 01:54, Vladislav Shpilevoy пишет:
> >> Thanks   for   the   patch,
> >> But   I   have  a  comment!
> >> It   is   a   nice  crutch,
> >> Yet we need one more moment.
> >>
> >> The patch basically sacrifices transaction rows order
> >> and WAL correctness for the sake of replication. It
> >> does not look right. Why can't we leave WAL as is, and
> >> tweak all these things in relay? It looks really wrong
> >> to change statements order. Especially taking into
> >> account this is needed *only* for replication. For example,
> >> consider FKs.
> >>
> >> A local space has a FOREIGN KEY reference to a global
> >> space. To make it work, we need to insert into the global
> >> space first, and then into the local space. When you change
> >> the order, the local insert goes first, and violates the
> >> foreign key. So if we will check FKs on recovery (when we
> >> will have FKs in box), this patch will break them.
> >>
> >> Alternative to relay - append a dummy NOP statement in the
> >> end of the transaction, which would be global. But this is
> >> also a crutch. I think the TSNs figuring out should be done
> >> in relay. It could keep track of the current transaction,
> >> change TSNs and is_commit when necessary.
> > 
> > Hi! Thanks for the review!
> > 
> > I understand this is a crutch, but it's the best solution I could come
> > up with.  Appending dummy  NOPs will increase instance LSN by one,
> > which  also looks  ugly. The correct solution is, indeed, to collect  a tx
> > in relay and mangle with it in any means  we need before sending, however,
> > I faced some problems with this approach. See more in v1 of this patchset
> > (letter [PATCH 2/2] replication: make relay send txs in batches).
> > 
> > By the way, your txn_limbo implementation assigns the tx an lsn of the last
> > tx row, which doesn't work in case the last row is a local one. So, looks like
> > we either need to reorder the rows or insert a NOP at the tx end. Or just
> > assign the tx an lsn of the last global row.
> > 
> > Still, it's up to you which solution you find a better one, and I'll
> > implement it.
> 
> I guess the current solution is ok then. Feel free to add
> Reviewed-by: Vladislav Shpilevoy <vshpilevoi@mail.ru>. First version
> looks better, but seems it will take too much time to rework xlog
> streaming API.
> 
> Although did you try? I don't know xlog code well, but perhaps you could
> just forbid to reuse ibuf for sequential rows until explicit reset() or
> something.
> 
> I also have a thought about is_commit necessity. Why do we need it? Why isn't
> tsn enough? I saw in xrow code (xrow_header_encode()) that there are 2
> combinations:
> 
> - No tsn, and no is_commit for single statement transactions. Is_commit
>   is assumed implicitly;
> 
> - tsn is encoded, and the last statement has is_commit.
> 
> But in fact the is_commit flag always can be determined implicitly. Consider
> all the possible cases:
> 
> - row1: tsn1, row2: tsn2. Then all rows [..., row1] belong to one
>   transaction, all rows [row2, ...] belong to a next transaction;
> 
> - row1: no tsn, row2: tsn2. Then row1 is first single-statement
>   transaction, [row2, ...] - next transaction.
> 
> - row1: tsn1, row2: no tsn. Then the same as previous. [..., row1] -
>   first transaction, row2 - second single-statement transaction.
> 
> - row1: no tsn, row2: no tsn. Two single statement transactions.
> 
> It means, transaction border can be determined by tsn difference
> between two sequential rows. This is allowed because transactions
> are never mixed. Tsn difference is border. And since is_rollback
> does not exist (it will in sync replication, but as a separate entry,
> for multiple transactions), and is_commit *already* is assumed
> implicitly for single statement transactions, why not to calculate it
> implicitly for all transactions?

Shall we need a readahead then?
Consider you have a bactch arrived and you have to guess if the last op
in it actually is_commit? 
What if you have no following ops for a long time? Will you hold the txn
until the next one arrived?

Sergos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row
  2020-06-01 16:02         ` Sergey Ostanevich
@ 2020-06-01 17:06           ` Vladislav Shpilevoy
  0 siblings, 0 replies; 26+ messages in thread
From: Vladislav Shpilevoy @ 2020-06-01 17:06 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches

>>> 29.05.2020 01:54, Vladislav Shpilevoy пишет:
>>>> Thanks   for   the   patch,
>>>> But   I   have  a  comment!
>>>> It   is   a   nice  crutch,
>>>> Yet we need one more moment.
>>>>
>>>> The patch basically sacrifices transaction rows order
>>>> and WAL correctness for the sake of replication. It
>>>> does not look right. Why can't we leave WAL as is, and
>>>> tweak all these things in relay? It looks really wrong
>>>> to change statements order. Especially taking into
>>>> account this is needed *only* for replication. For example,
>>>> consider FKs.
>>>>
>>>> A local space has a FOREIGN KEY reference to a global
>>>> space. To make it work, we need to insert into the global
>>>> space first, and then into the local space. When you change
>>>> the order, the local insert goes first, and violates the
>>>> foreign key. So if we will check FKs on recovery (when we
>>>> will have FKs in box), this patch will break them.
>>>>
>>>> Alternative to relay - append a dummy NOP statement in the
>>>> end of the transaction, which would be global. But this is
>>>> also a crutch. I think the TSNs figuring out should be done
>>>> in relay. It could keep track of the current transaction,
>>>> change TSNs and is_commit when necessary.
>>>
>>> Hi! Thanks for the review!
>>>
>>> I understand this is a crutch, but it's the best solution I could come
>>> up with.  Appending dummy  NOPs will increase instance LSN by one,
>>> which  also looks  ugly. The correct solution is, indeed, to collect  a tx
>>> in relay and mangle with it in any means  we need before sending, however,
>>> I faced some problems with this approach. See more in v1 of this patchset
>>> (letter [PATCH 2/2] replication: make relay send txs in batches).
>>>
>>> By the way, your txn_limbo implementation assigns the tx an lsn of the last
>>> tx row, which doesn't work in case the last row is a local one. So, looks like
>>> we either need to reorder the rows or insert a NOP at the tx end. Or just
>>> assign the tx an lsn of the last global row.
>>>
>>> Still, it's up to you which solution you find a better one, and I'll
>>> implement it.
>>
>> I guess the current solution is ok then. Feel free to add
>> Reviewed-by: Vladislav Shpilevoy <vshpilevoi@mail.ru>. First version
>> looks better, but seems it will take too much time to rework xlog
>> streaming API.
>>
>> Although did you try? I don't know xlog code well, but perhaps you could
>> just forbid to reuse ibuf for sequential rows until explicit reset() or
>> something.
>>
>> I also have a thought about is_commit necessity. Why do we need it? Why isn't
>> tsn enough? I saw in xrow code (xrow_header_encode()) that there are 2
>> combinations:
>>
>> - No tsn, and no is_commit for single statement transactions. Is_commit
>>   is assumed implicitly;
>>
>> - tsn is encoded, and the last statement has is_commit.
>>
>> But in fact the is_commit flag always can be determined implicitly. Consider
>> all the possible cases:
>>
>> - row1: tsn1, row2: tsn2. Then all rows [..., row1] belong to one
>>   transaction, all rows [row2, ...] belong to a next transaction;
>>
>> - row1: no tsn, row2: tsn2. Then row1 is first single-statement
>>   transaction, [row2, ...] - next transaction.
>>
>> - row1: tsn1, row2: no tsn. Then the same as previous. [..., row1] -
>>   first transaction, row2 - second single-statement transaction.
>>
>> - row1: no tsn, row2: no tsn. Two single statement transactions.
>>
>> It means, transaction border can be determined by tsn difference
>> between two sequential rows. This is allowed because transactions
>> are never mixed. Tsn difference is border. And since is_rollback
>> does not exist (it will in sync replication, but as a separate entry,
>> for multiple transactions), and is_commit *already* is assumed
>> implicitly for single statement transactions, why not to calculate it
>> implicitly for all transactions?
> 
> Shall we need a readahead then?
> Consider you have a bactch arrived and you have to guess if the last op
> in it actually is_commit? 
> What if you have no following ops for a long time? Will you hold the txn
> until the next one arrived?
> 
> Sergos

I didn't think about it. Then the flag removal can't be done. At least
not in the way I described it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-06-01 17:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 10:58 [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Serge Petrenko
2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries Serge Petrenko
2020-05-28 22:53   ` Vladislav Shpilevoy
2020-05-29 11:09     ` Serge Petrenko
2020-05-25 10:58 ` [Tarantool-patches] [PATCH v2 2/2] wal: reorder tx rows so that a tx ends on a global row Serge Petrenko
2020-05-25 15:13   ` Cyrill Gorcunov
2020-05-25 16:34   ` Konstantin Osipov
2020-05-25 18:35     ` Cyrill Gorcunov
2020-05-25 20:42       ` Konstantin Osipov
2020-05-26  9:41     ` Serge Petrenko
2020-05-26 11:41       ` Konstantin Osipov
2020-05-26 12:08         ` Serge Petrenko
2020-05-28 22:54   ` Vladislav Shpilevoy
2020-05-29  8:13     ` Konstantin Osipov
2020-05-29 11:42     ` Serge Petrenko
2020-05-29 11:51       ` Konstantin Osipov
2020-05-29 12:07         ` Cyrill Gorcunov
2020-05-29 12:07           ` Cyrill Gorcunov
2020-05-29 12:15         ` Serge Petrenko
2020-05-29 13:44           ` Konstantin Osipov
2020-05-29 15:55             ` Serge Petrenko
2020-06-01 13:40       ` Vladislav Shpilevoy
2020-06-01 16:02         ` Sergey Ostanevich
2020-06-01 17:06           ` Vladislav Shpilevoy
2020-05-28 22:53 ` [Tarantool-patches] [PATCH v2 0/2] fix replication tx boundaries after local space rework Vladislav Shpilevoy
2020-05-29 11:03   ` Serge Petrenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox