From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp53.i.mail.ru (smtp53.i.mail.ru [94.100.177.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id EAEAE4696C3 for ; Tue, 7 Apr 2020 18:48:56 +0300 (MSK) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) From: Serge Petrenko In-Reply-To: Date: Tue, 7 Apr 2020 18:48:56 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: Subject: Re: [Tarantool-patches] [PATCH v5 4/4] box: start counting local space requests separately List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org > 4 =D0=B0=D0=BF=D1=80. 2020 =D0=B3., =D0=B2 23:51, Vladislav Shpilevoy = =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0= =B0): >=20 > Thanks for the patch! Thanks for the review! >=20 > See 3 comments below. >=20 >> diff --git a/src/box/box.cc b/src/box/box.cc >> index bf95d1b5e..0762266b0 100644 >> --- a/src/box/box.cc >> +++ b/src/box/box.cc >> @@ -1879,6 +1879,23 @@ box_process_subscribe(struct ev_io *io, struct = xrow_header *header) >> say_info("remote vclock %s local vclock %s", >> vclock_to_string(&replica_clock), = vclock_to_string(&vclock)); >>=20 >> + /* >> + * Replica clock is used in gc state and recovery >> + * initialization, so we need to replace the remote 0-th >> + * component with our own one. This doesn't break >> + * recovery: it finds the WAL with a vclock strictly less >> + * than replia clock in all components except the 0th one. >> + * This leads to finding the correct WAL, if it exists, >> + * since we do not need to recover local rows (the ones, >> + * that contribute to the 0-th vclock component). >> + * Note, that it would be bad to set 0-th vclock component >> + * to a smaller value, since it would unnecessarily >> + * require additional WALs, which may have already been >> + * deleted. >=20 > 1. But you just said above, that 0 component is ignored in all > comparators anyway. So what is a problem of setting 0 component > to 0? I didn=E2=80=99t. vclock[0] is ignored in gc_consumer objects now, but when initializing recovery vclock[0] cannot be ignored. Otherwise the instance itself would fail to recover local rows on local recovery. So relay recovery needs to be initialized with some valid vclock[0] value. Valid means =C2=ABbig enough to not require wals that have already been deleted=C2=BB >=20 >> + * Speaking of gc, remote instances' local vclock >> + * components are not used by consumers at all. >> + */ >> + vclock_reset(&replica_clock, 0, vclock_get(&replicaset.vclock, = 0)); >> diff --git a/src/box/wal.c b/src/box/wal.c >> index 3b094b0e8..a74bdecd9 100644 >> --- a/src/box/wal.c >> +++ b/src/box/wal.c >> @@ -953,13 +953,19 @@ wal_assign_lsn(struct vclock *vclock_diff, = struct vclock *base, >> /** Assign LSN to all local rows. */ >> for ( ; row < end; row++) { >> if ((*row)->replica_id =3D=3D 0) { >> - (*row)->lsn =3D vclock_inc(vclock_diff, = instance_id) + >> - vclock_get(base, instance_id); >> /* >> - * Note, an anonymous replica signs local >> - * rows whith a zero instance id. >> + * All rows representing local space data >> + * manipulations are signed wth a zero >> + * instance id. This is also true for >> + * anonymous replicas, since they are >> + * only capable of writing to local and >> + * temporary spaces. >> */ >> - (*row)->replica_id =3D instance_id; >> + if ((*row)->group_id !=3D GROUP_LOCAL) >> + (*row)->replica_id =3D instance_id; >> + >> + (*row)->lsn =3D vclock_inc(vclock_diff, = (*row)->replica_id) + >> + vclock_get(base, = (*row)->replica_id); >=20 > 2. Bad indentation. vlock_get() should be aligned under vclock_inc(). Sorry, my bad. Fixed. >=20 >> /* Use lsn of the first local row as transaction = id. */ >> tsn =3D tsn =3D=3D 0 ? (*row)->lsn : tsn; >> (*row)->tsn =3D tsn; >=20 > 3. I run the new test without your patch, and it just hangs. No > xloggaperror like in the ticket. Did it disappear thanks to the > previous commit, about GC? Hmm, that=E2=80=99s an interesting bug (or a peculiarity?) After deleting the old xlog replica is left with only one .snap file, and doesn=E2=80=99t produce an error regarding =C2=ABno wal between lsn = x and y=C2=BB. Master is left hanging until replica inserts some new data, and opens a WAL which cannot be recovered from. I worked it around. diff --git a/src/box/wal.c b/src/box/wal.c index a74bdecd9..1eb20272c 100644 --- a/src/box/wal.c +++ b/src/box/wal.c @@ -965,7 +965,7 @@ wal_assign_lsn(struct vclock *vclock_diff, struct = vclock *base, (*row)->replica_id =3D instance_id; =20 (*row)->lsn =3D vclock_inc(vclock_diff, = (*row)->replica_id) + - vclock_get(base, = (*row)->replica_id); + vclock_get(base, = (*row)->replica_id); /* Use lsn of the first local row as transaction = id. */ tsn =3D tsn =3D=3D 0 ? (*row)->lsn : tsn; (*row)->tsn =3D tsn; diff --git a/test/replication/gh-4114-local-space-replication.result = b/test/replication/gh-4114-local-space-replication.result index e524c9a1b..4a71bb6dc 100644 --- a/test/replication/gh-4114-local-space-replication.result +++ b/test/replication/gh-4114-local-space-replication.result @@ -76,10 +76,7 @@ box.space.test:insert{3} | --- | - [3] | ... -box.snapshot() - | --- - | - ok - | ... + =20 box.info.vclock[0] | --- diff --git a/test/replication/gh-4114-local-space-replication.test.lua = b/test/replication/gh-4114-local-space-replication.test.lua index 26dccee68..114c592c5 100644 --- a/test/replication/gh-4114-local-space-replication.test.lua +++ b/test/replication/gh-4114-local-space-replication.test.lua @@ -26,7 +26,7 @@ box.snapshot() box.space.test:insert{2} box.snapshot() box.space.test:insert{3} -box.snapshot() + =20 box.info.vclock[0] =20 -- Serge Petrenko sergepetrenko@tarantool.org