From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp33.i.mail.ru (smtp33.i.mail.ru [94.100.177.93]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 12368469719 for ; Tue, 18 Feb 2020 20:28:03 +0300 (MSK)
From: Serge Petrenko
Message-Id:
Content-Type: multipart/alternative; boundary="Apple-Mail=_D359E63A-1393-439E-9590-46A2C6FE1B2F"
Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.40.2.2.4\))
Date: Tue, 18 Feb 2020 20:28:02 +0300
In-Reply-To:
References: <97f37279955ebc29e899164cc1364e6a0aea0f9b.1581630406.git.sergepetrenko@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 3/4] wal: warn when trying to write a record with a broken lsn
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Vladislav Shpilevoy , Alexander Turenko , Konstantin Osipov
Cc: tarantool-patches@dev.tarantool.org

--Apple-Mail=_D359E63A-1393-439E-9590-46A2C6FE1B2F
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

Hi! Thanks for your review!

Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.

> On 16 Feb 2020, at 19:15, Vladislav Shpilevoy wrote:
>
> Hi! Thanks for the patch! I will review other commits when
> Kostja is fine with them.
>
> Since he finished with this one, here is my nit: let's keep
> the assertion for the debug build. If not for this assertion, we
> probably wouldn't have noticed this bug, and we may miss future bugs
> without it. While running the tests we won't notice a
> warning.
>
> Also we probably should not call vclock_follow() at all, if
> lsn is broken. Just keep it as is. It does not look right to

Ok

> decrease it. And make it panic() in vclock_follow() to catch
> other bugs related to it.

Kostja was against panicking in the previous review iteration;
the assertion is still in `vclock_follow()`.

>
> On 13/02/2020 22:52, sergepetrenko wrote:
>> From: Serge Petrenko
>>
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attempt to write a record with a duplicate or otherwise broken lsn.
>>
>> Follow-up #4739
>> ---
>> src/box/wal.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 0ae66ff32..f8ee2b7d8 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>  			(*row)->tsn = tsn;
>>  			(*row)->is_commit = row == end - 1;
>>  		} else {
>> -			vclock_follow(vclock_diff, (*row)->replica_id,
>> -				      (*row)->lsn - vclock_get(base,
>> -							       (*row)->replica_id));
>> +			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>> +			if (diff <= vclock_get(vclock_diff,
>> +					       (*row)->replica_id)) {
>> +				say_crit("Attempt to write a broken LSN to WAL:"
>> +					 " replica id: %d, committed lsn: %d,"
>> +					 " new lsn %d", (*row)->replica_id,
>> +					 vclock_get(base, (*row)->replica_id) +
>> +					 vclock_get(vclock_diff,
>> +						    (*row)->replica_id),
>> +					 (*row)->lsn);
>> +			}
>> +			vclock_follow(vclock_diff, (*row)->replica_id, diff);
>
> In summary, let's call follow() in the 'else' branch and add unreachable()
> after the crit log.

I believe `unreachable` doesn’t fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that «shouldn’t happen».
I’ve even seen a ticket in our repo regarding this misuse.
Let’s just leave an `assert(0)` in this branch; `unreachable()` is defined like that anyway.

Here are my changes:

diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			if (diff <= vclock_get(vclock_diff,
 					       (*row)->replica_id)) {
 				say_crit("Attempt to write a broken LSN to WAL:"
-					 " replica id: %d, committed lsn: %d,"
+					 " replica id: %d, confirmed lsn: %d,"
 					 " new lsn %d", (*row)->replica_id,
 					 vclock_get(base, (*row)->replica_id) +
 					 vclock_get(vclock_diff,
 						    (*row)->replica_id),
 					 (*row)->lsn);
+				assert(0);
+			} else {
+				vclock_follow(vclock_diff, (*row)->replica_id, diff);
 			}
-			vclock_follow(vclock_diff, (*row)->replica_id, diff);
 		}
 	}
 }

--
Serge Petrenko
sergepetrenko@tarantool.org

--Apple-Mail=_D359E63A-1393-439E-9590-46A2C6FE1B2F
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=utf-8

Hi! Thanks for your review!

Please find my answers below, together with an incremental diff.
I’ll send v3 shortly.

> On 16 Feb 2020, at 19:15, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> wrote:
>
> Hi! Thanks for the patch! I will review other commits when
> Kostja is fine with them.
>
> Since he finished with this one, here is my nit: let's keep
> the assertion for the debug build. If not for this assertion, we
> probably wouldn't have noticed this bug, and we may miss future bugs
> without it. While running the tests we won't notice a
> warning.
>
> Also we probably should not call vclock_follow() at all, if
> lsn is broken. Just keep it as is. It does not look right to

Ok

> decrease it. And make it panic() in vclock_follow() to catch
> other bugs related to it.

Kostja was against panicking in the previous review iteration;
the assertion is still in `vclock_follow()`.


> On 13/02/2020 22:52, sergepetrenko wrote:
>> From: Serge Petrenko <sergepetrenko@tarantool.org>
>>
>> There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
>> fire in release builds, of course. Let's at least warn the user on an
>> attempt to write a record with a duplicate or otherwise broken lsn.
>>
>> Follow-up #4739
>> ---
>> src/box/wal.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/box/wal.c b/src/box/wal.c
>> index 0ae66ff32..f8ee2b7d8 100644
>> --- a/src/box/wal.c
>> +++ b/src/box/wal.c
>> @@ -951,9 +951,18 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
>>  			(*row)->tsn = tsn;
>>  			(*row)->is_commit = row == end - 1;
>>  		} else {
>> -			vclock_follow(vclock_diff, (*row)->replica_id,
>> -				      (*row)->lsn - vclock_get(base,
>> -							       (*row)->replica_id));
>> +			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
>> +			if (diff <= vclock_get(vclock_diff,
>> +					       (*row)->replica_id)) {
>> +				say_crit("Attempt to write a broken LSN to WAL:"
>> +					 " replica id: %d, committed lsn: %d,"
>> +					 " new lsn %d", (*row)->replica_id,
>> +					 vclock_get(base, (*row)->replica_id) +
>> +					 vclock_get(vclock_diff,
>> +						    (*row)->replica_id),
>> +					 (*row)->lsn);
>> +			}
>> +			vclock_follow(vclock_diff, (*row)->replica_id, diff);
>
> In summary, let's call follow() in the 'else' branch and add unreachable()
> after the crit log.

I believe `unreachable` doesn’t fit here, since it implies that the code is
truly unreachable, while we are trying to catch something that «shouldn’t happen».
I’ve even seen a ticket in our repo regarding this misuse.
Let’s just leave an `assert(0)` in this branch; `unreachable()` is defined like that anyway.
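
To illustrate the distinction, here is a minimal sketch. All `sketch_` names are hypothetical, and `sketch_unreachable()` is assumed to expand to `assert(0)`, as described above; the real macro may differ. The first function marks code that is dead by construction, the second guards a branch that bad input can actually reach, so it reports first and only then asserts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in: assumed to be defined like assert(0). */
#define sketch_unreachable() assert(0)

enum sketch_state { SKETCH_IDLE, SKETCH_BUSY };

/*
 * Truly unreachable: every enumerator is handled in the switch,
 * so the tail of the function is dead code by construction.
 */
static const char *
sketch_state_name(enum sketch_state state)
{
	switch (state) {
	case SKETCH_IDLE:
		return "idle";
	case SKETCH_BUSY:
		return "busy";
	}
	sketch_unreachable();
	return "unknown";
}

/*
 * "Shouldn't happen": the branch is reachable on bad input.
 * Log first so that release builds (NDEBUG) still leave a trace,
 * then assert(0) so that debug builds stop right here.
 */
static int
sketch_check_order(long prev_lsn, long new_lsn)
{
	if (new_lsn <= prev_lsn) {
		fprintf(stderr, "broken lsn: prev %ld, new %ld\n",
			prev_lsn, new_lsn);
		assert(0);
		return -1;
	}
	return 0;
}
```

With `NDEBUG` defined both assertions compile to nothing, which is why the log line has to come before `assert(0)` in the second function.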

Here are my changes:

diff --git a/src/box/wal.c b/src/box/wal.c
index f8ee2b7d8..a87aedf1d 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -955,14 +955,16 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			if (diff <= vclock_get(vclock_diff,
 					       (*row)->replica_id)) {
 				say_crit("Attempt to write a broken LSN to WAL:"
-					 " replica id: %d, committed lsn: %d,"
+					 " replica id: %d, confirmed lsn: %d,"
 					 " new lsn %d", (*row)->replica_id,
 					 vclock_get(base, (*row)->replica_id) +
 					 vclock_get(vclock_diff,
 						    (*row)->replica_id),
 					 (*row)->lsn);
+				assert(0);
+			} else {
+				vclock_follow(vclock_diff, (*row)->replica_id, diff);
 			}
-			vclock_follow(vclock_diff, (*row)->replica_id, diff);
 		}
 	}
 }
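
The control flow after this change can be sketched as a standalone fragment. This is only an illustration under assumptions: `struct sketch_vclock` and the `sketch_` helpers are hypothetical stand-ins for `struct vclock`, `vclock_get()` and `vclock_follow()`, the message goes to stderr instead of `say_crit()`, and the broken branch returns `false` where the patch has `assert(0)`, so the fragment can be exercised without aborting. The values are printed with `PRId64` because they are `int64_t` here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { SKETCH_VCLOCK_MAX = 32 };

/* Hypothetical, simplified vclock: one LSN per replica id. */
struct sketch_vclock {
	int64_t lsn[SKETCH_VCLOCK_MAX];
};

static int64_t
sketch_vclock_get(const struct sketch_vclock *vclock, uint32_t replica_id)
{
	return vclock->lsn[replica_id];
}

/*
 * Advance diff[replica_id] only if row_lsn moves past the confirmed
 * LSN (base + diff). Otherwise report the broken LSN and leave diff
 * untouched. The caller must pass replica_id < SKETCH_VCLOCK_MAX.
 */
static bool
sketch_follow_checked(struct sketch_vclock *diff,
		      const struct sketch_vclock *base,
		      uint32_t replica_id, int64_t row_lsn)
{
	int64_t delta = row_lsn - sketch_vclock_get(base, replica_id);
	if (delta <= sketch_vclock_get(diff, replica_id)) {
		fprintf(stderr, "Attempt to write a broken LSN to WAL:"
			" replica id: %" PRIu32 ", confirmed lsn: %" PRId64 ","
			" new lsn %" PRId64 "\n", replica_id,
			sketch_vclock_get(base, replica_id) +
			sketch_vclock_get(diff, replica_id),
			row_lsn);
		/* The patch has assert(0) at this point. */
		return false;
	}
	diff->lsn[replica_id] = delta;
	return true;
}
```

For example, with `base.lsn[1] == 10` and an empty diff, a row with lsn 11 is accepted and sets `diff.lsn[1]` to 1; a second row with lsn 11 is then rejected and the diff stays at 1.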

--
Serge Petrenko

--Apple-Mail=_D359E63A-1393-439E-9590-46A2C6FE1B2F--