Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join
@ 2019-10-28 12:25 Ilya Kosarev
  2019-10-28 21:09 ` Vladislav Shpilevoy
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Kosarev @ 2019-10-28 12:25 UTC (permalink / raw)
  To: tarantool-patches; +Cc: tarantool-patches, v.shpilevoy

join_vclock test could fail on huge load due to vclock advance
comparing to an actual WAL. Now we are clearly obtaining vclock on the
flushed state using wal_begin_checkpoint. It is also better to get max
index and index count in single request in join_vclock test. With fixes
mentioned above it is not fragile anymore.

Closes #4160
---
https://github.com/tarantool/tarantool/tree/i.kosarev/gh-4160-fix-join-vclock
https://github.com/tarantool/tarantool/issues/4160

 src/box/relay.cc                      | 9 +++++----
 test/replication/join_vclock.result   | 5 +----
 test/replication/join_vclock.test.lua | 3 +--
 test/replication/suite.ini            | 1 -
 4 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index e849fcf4f..d16cd1a56 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -306,13 +306,14 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
 	});
 
 	/*
-	 * Sync WAL to make sure that all changes visible from
-	 * the frozen read view are successfully committed.
+	 * Make sure that current database state is flushed to
+	 * the WAL and obtain corresponding vclock.
 	 */
-	if (wal_sync() != 0)
+	struct wal_checkpoint checkpoint;
+	if (wal_begin_checkpoint(&checkpoint) != 0)
 		diag_raise();
 
-	vclock_copy(vclock, &replicaset.vclock);
+	vclock_copy(vclock, &checkpoint.vclock);
 
 	/* Respond to the JOIN request with the current vclock. */
 	struct xrow_header row;
diff --git a/test/replication/join_vclock.result b/test/replication/join_vclock.result
index a9781073d..d6d9af783 100644
--- a/test/replication/join_vclock.result
+++ b/test/replication/join_vclock.result
@@ -67,10 +67,7 @@ test_run:cmd("switch replica1")
 ---
 - true
 ...
-cnt = box.space.test.index[0]:count()
----
-...
-box.space.test.index.primary:max()[1] == cnt - 1
+box.space.test.index.primary:max()[1] == box.space.test.index.primary:count() - 1
 ---
 - true
 ...
diff --git a/test/replication/join_vclock.test.lua b/test/replication/join_vclock.test.lua
index 0b60dffc2..a813ba31f 100644
--- a/test/replication/join_vclock.test.lua
+++ b/test/replication/join_vclock.test.lua
@@ -26,8 +26,7 @@ ch:get()
 
 errinj.set("ERRINJ_RELAY_FINAL_SLEEP", false)
 test_run:cmd("switch replica1")
-cnt = box.space.test.index[0]:count()
-box.space.test.index.primary:max()[1] == cnt - 1
+box.space.test.index.primary:max()[1] == box.space.test.index.primary:count() - 1
 test_run:cmd("switch default")
 
 replica_set.drop_all(test_run)
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 384dac677..ed1de3140 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -12,7 +12,6 @@ long_run = prune.test.lua
 is_parallel = True
 pretest_clean = True
 fragile = errinj.test.lua            ; gh-3870
-          join_vclock.test.lua       ; gh-4160
           long_row_timeout.test.lua  ; gh-4351
           skip_conflict_row.test.lua ; gh-4457
           sync.test.lua              ; gh-3835 gh-3877
-- 
2.17.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join
  2019-10-28 12:25 [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join Ilya Kosarev
@ 2019-10-28 21:09 ` Vladislav Shpilevoy
  2019-10-28 21:10   ` Konstantin Osipov
  0 siblings, 1 reply; 4+ messages in thread
From: Vladislav Shpilevoy @ 2019-10-28 21:09 UTC (permalink / raw)
  To: Ilya Kosarev, Konstantin Osipov; +Cc: tarantool-patches

Hi! Thanks for the fixes!

> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index e849fcf4f..d16cd1a56 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -306,13 +306,14 @@ relay_initial_join(int fd, uint64_t sync, struct vclock *vclock)
>  	});
>  
>  	/*
> -	 * Sync WAL to make sure that all changes visible from
> -	 * the frozen read view are successfully committed.
> +	 * Make sure that current database state is flushed to
> +	 * the WAL and obtain corresponding vclock.
>  	 */
> -	if (wal_sync() != 0)
> +	struct wal_checkpoint checkpoint;
> +	if (wal_begin_checkpoint(&checkpoint) != 0)
>  		diag_raise();
>  
> -	vclock_copy(vclock, &replicaset.vclock);
> +	vclock_copy(vclock, &checkpoint.vclock);

It looks wrong, that we do begin_checkpoint() but omit
commit_checkpoint(). I would call commit too. Probably now
begin() does not allocate any resources, but it may start
in future, and we will have a leak here.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join
  2019-10-28 21:09 ` Vladislav Shpilevoy
@ 2019-10-28 21:10   ` Konstantin Osipov
  2019-10-29 14:25     ` Ilya Kosarev
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Osipov @ 2019-10-28 21:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy <v.shpilevoy@tarantool.org> [19/10/29 00:08]:
> It looks wrong, that we do begin_checkpoint() but omit
> commit_checkpoint(). I would call commit too. Probably now
> begin() does not allocate any resources, but it may start
> in future, and we will have a leak here.

This would be wrong, too. Wal uses commit_checkpoint() as a singal
that it can recycle wal logs. This is the code Georgy is moving to
GC (but his patch is not in yet).

One way to fix it would be to add vclock out parameter to
wal_sync().

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join
  2019-10-28 21:10   ` Konstantin Osipov
@ 2019-10-29 14:25     ` Ilya Kosarev
  0 siblings, 0 replies; 4+ messages in thread
From: Ilya Kosarev @ 2019-10-29 14:25 UTC (permalink / raw)
  To: Konstantin Osipov
  Cc: tarantool-patches, tarantool-patches, Vladislav Shpilevoy

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]


Thanks!

It seems to be the best solution, solving the problem of ungrounded
wal_begin_checkpoint usage and in the same time providing direct vlock
obtainment. Sent v3 of the patch.

>Вторник, 29 октября 2019, 0:10 +03:00 от Konstantin Osipov < kostja.osipov@gmail.com >:
>
>* Vladislav Shpilevoy < v.shpilevoy@tarantool.org > [19/10/29 00:08]:
>> It looks wrong, that we do begin_checkpoint() but omit
>> commit_checkpoint(). I would call commit too. Probably now
>> begin() does not allocate any resources, but it may start
>> in future, and we will have a leak here.
>
>This would be wrong, too. Wal uses commit_checkpoint() as a singal
>that it can recycle wal logs. This is the code Georgy is moving to
>GC (but his patch is not in yet).
>
>One way to fix it would be to add vclock out parameter to
>wal_sync().
>
>-- 
>Konstantin Osipov, Moscow, Russia


-- 
Ilya Kosarev

[-- Attachment #2: Type: text/html, Size: 1905 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-10-29 14:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28 12:25 [Tarantool-patches] [PATCH v2] replication: fix join vclock obtainment in relay_initial_join Ilya Kosarev
2019-10-28 21:09 ` Vladislav Shpilevoy
2019-10-28 21:10   ` Konstantin Osipov
2019-10-29 14:25     ` Ilya Kosarev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox